Tuesday, May 21, 2019

# Statistical Correlation Does Not Always Prove Cause

## Black Belt or not, common sense is quite rare

After my last column citing some really bizarre flaws in how our brains perceive reality, I thought I might cover some flaws in logic that are applicable in the world of quality. So, basically, even if our brains are working correctly, we can still send our Black Belts off on false trails trying to solve problems, thus offering more proof (as if we need it) for Voltaire’s observation that “common sense is quite rare.”

I ran across this very cool site organizing logical fallacies into a taxonomy. (OK, so the Internet empowers my nerdosity….) Now as you know, our work in quality is not pure logic (i.e., what is consistent) but science (i.e., what works). Ptolemy was logical when he said the Earth does not move because if it did:

“… all those things that were not at rest on the Earth would seem to have a movement contrary to it, and never would a cloud be seen to move toward the east nor anything else that flew or was thrown into the air. For the Earth would always outstrip them in its eastward motion, so that all other bodies would seem to be left behind and to move toward the west.” (Ptolemy in Almagest)

A statement of perfect logical consistency, but it is incorrect due to a number of mistaken hidden premises. However, it took about 1,600 years before somebody actually tested that statement.

If you think about it, a lot of what we do is try to create a model of a production process that is “close enough” to reality to be useful, which is all that applied science tries to do.

However, logic is a necessary (if not sufficient) requirement of science, so it is worthwhile to take a look at some common, and tricky, logical fallacies to avoid wasting time and money. Following is an insidious favorite.

### Cum hoc, ergo propter hoc—or correlation is not causation

The Latin translates to “with this, therefore because of this.” You probably heard this one in your first statistics class as “correlation is not causation,” and it is treated humorously in this comic strip in figure 1:

Figure 1: Randall Munroe’s webcomic at http://xkcd.com

Just because deaths due to drowning and ice-cream sales are strongly correlated does not mean that banning ice cream at the beach will prevent drowning. So even when we find a significant correlation or association in a statistical analysis, we can’t assume a causal link.

When we see a significant correlation, this could be due to:

• A true causal relationship between the two variables, in the direction we think. In our example, that would mean ice-cream sales really do cause swimming deaths.
• A true causal relationship exists, but in the opposite direction. For example, swimming deaths cause ice-cream sales. (This might be true in some horrifically voyeuristic society that buys snacks to better enjoy the spectacle of a drowning. Come to think of it, with the popularity of reality shows, maybe this isn’t as unlikely as I thought….)
• Another factor that the two hold in common is the real causal factor. When it is warm, more people go out swimming, and thus a higher number drown. When it is warm, more ice cream is sold. Thus, when it is warm, both drownings and ice cream sales go up, and when it is cold they both go down.
• Alpha error. The statistical test concluded a relationship exists where none really does due to chance and chance alone or, in the absence of a statistical process, just plain coincidence.

But even though we all know this, it is still a seductive error to make.
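The third explanation in the list above, a shared factor driving both variables, is easy to see in a quick simulation. What follows is a minimal sketch, not real data: the temperature range, the slopes, and the noise levels are all invented for illustration. Ice-cream sales and drownings each depend on temperature and on nothing else, yet they come out strongly correlated with each other. Once the straight-line effect of temperature is removed from both (a partial correlation), the link all but vanishes.

```python
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily high temperatures in degrees C (assumed range).
temperature = [rng.uniform(10, 35) for _ in range(2000)]

# Assumed model: each outcome depends on temperature plus its own noise.
# Neither outcome appears in the other's formula.
ice_cream = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

# Correlation as it would show up in a naive analysis of the two series.
raw_r = pearson(ice_cream, drownings)

# Correlation after removing the temperature effect from both series.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

The raw correlation is strong and the partial correlation is close to zero, even though banning ice cream in this simulated world would save nobody. Note that the trick only works because temperature was measured; a common factor nobody thought to record cannot be adjusted away.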

While helping a client, we were looking at the recent historical production rate of a steel mill, stratified by operator, and saw something like this in figure 2:

Figure 2: Mill speed by operator

Properly running the analysis of variance (ANOVA), we found that these operators are, in fact, significantly different from each other in the speed at which they run the mill. The supervisor I was working with was all ready to post these results on the daily management board, with the stated intent to “create a little competition,” which would presumably ratchet up the mill production. He was also preparing to kick Operator 3’s booty, saying that he ought to know better, since he had been around for a while. In fact, the supervisor was kind of surprised because Operator 3 was “the one everyone went to with questions on the mill. He probably is getting lazy and just needs a wake-up call.”

Although that hypothesis is consistent with the data, it is not the only hypothesis that is (keeping in mind that correlation is not causation), so I suggested that we go and have a nonconfrontational talk with the operator.

What we found was that when the other operators had a lot of material to run that was difficult, they set it aside for Operator 3 when he came in. Of all the operators, he could run the most difficult stuff the fastest. In this case, there was a true difference, but that difference was in the opposite direction: It was not that the mill was slower because of the operator, but that the operator was slower due to what he was given to run on the mill.
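Had the difficulty of the material been recorded alongside each run, splitting the data by difficulty would have told the same story as the conversation did. Here is a minimal sketch with invented run records (the operator labels, the "easy"/"hard" categories, and every speed are hypothetical, not the client's data). Operator 3 has the lowest overall average, yet is the fastest within each kind of material.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical run records: (operator, material difficulty, mill speed).
# All numbers are invented for illustration.
runs = [
    ("Op1", "easy", 100), ("Op1", "easy", 102), ("Op1", "easy", 98), ("Op1", "hard", 60),
    ("Op2", "easy", 101), ("Op2", "easy", 99), ("Op2", "easy", 103), ("Op2", "hard", 62),
    ("Op3", "easy", 108), ("Op3", "hard", 72), ("Op3", "hard", 70), ("Op3", "hard", 74),
]

def mean_speed(records, key):
    """Average speed, grouped by whatever `key` extracts from a record."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return {k: fmean(v) for k, v in groups.items()}

# The view the supervisor had: one average per operator.
overall = mean_speed(runs, key=lambda r: r[0])

# The view after stratifying: one average per operator and difficulty.
by_stratum = mean_speed(runs, key=lambda r: (r[0], r[1]))

print(overall)     # Op1 90.0, Op2 91.25, Op3 81.0
for (op, difficulty), speed in sorted(by_stratum.items()):
    print(op, difficulty, speed)
```

Operator 3 looks slowest overall only because three of his four runs are hard material, while the other two operators each ran hard material once.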

### Protecting against cum hoc, ergo propter hoc

You can avoid making this error. Never assume any type of relationship is causal until you have real proof that it is. The only way to do that is by designing a true experiment. Correlations of existing data during the early stages of problem solving provide one valuable input into our experimental factor-selection process, but are not sufficient alone to determine causality.

In our mill example above, we were just looking at existing data to get a picture of the process to identify factors for an experiment to increase production. If we had really intended to determine causality, we might have assigned the different operators similar things to run at various times and compared those speeds.
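The difference between looking at existing data and running a designed comparison can be sketched in a small simulation. Everything here is an assumption made for illustration: the speed model (material baseline plus operator skill plus noise), the skill values, and the job counts. In the "observational" case, hard jobs are set aside for Operator 3, which is a simplified version of what happened at the mill. In the "designed" case, every operator gets the same mix of material, in shuffled order.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the sketch is repeatable
OPERATORS = ["Op1", "Op2", "Op3"]

# Assumed model: speed = material baseline + operator skill + noise.
BASELINE = {"easy": 100.0, "hard": 60.0}
SKILL = {"Op1": 0.0, "Op2": 1.0, "Op3": 8.0}  # Op3 is truly the fastest

def run_speed(operator, difficulty):
    """Simulated speed of one run under the assumed model."""
    return BASELINE[difficulty] + SKILL[operator] + rng.gauss(0, 2)

def mean_by_operator(assignments):
    """Average simulated speed per operator for a list of (operator, difficulty)."""
    speeds = {op: [] for op in OPERATORS}
    for op, difficulty in assignments:
        speeds[op].append(run_speed(op, difficulty))
    return {op: fmean(v) for op, v in speeds.items()}

jobs = ["easy"] * 150 + ["hard"] * 150

# Observational data: hard jobs go to Op3, easy jobs alternate between Op1 and Op2.
observational = [("Op3" if d == "hard" else OPERATORS[i % 2], d)
                 for i, d in enumerate(jobs)]

# Designed comparison: each operator runs 50 easy and 50 hard jobs.
# Shuffling the run order guards against time effects in a real mill;
# this simple model has none, so here it only changes the noise draws.
designed = [(op, d) for op in OPERATORS for d in ["easy"] * 50 + ["hard"] * 50]
rng.shuffle(designed)

observed_means = mean_by_operator(observational)
experimental_means = mean_by_operator(designed)

print("existing data:", observed_means)
print("designed run: ", experimental_means)
```

In the existing data Operator 3 comes out slowest by a wide margin. In the designed run he comes out fastest, by roughly the 8 units of skill built into the model, because the operators now differ only in who is running the mill.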

I also want you to notice that the choice of statistical technique employed (e.g., measures of correlation, measures of association, linear or nonlinear regression, or ANOVA) does not protect you from this error. Only you as the human brain involved in the process can do so.
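To see why the technique cannot help, consider what a one-way ANOVA actually receives: a group label and a number for each run, and nothing else. The sketch below computes the F statistic by hand on invented mill speeds. The p-value comes from a permutation test rather than an F table, a substitution made only to keep the example inside the Python standard library. The data, the function names, and the number of permutations are all assumptions for illustration.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = fmean(values)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random relabelings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, relabeled = 0, []
        for size in sizes:
            relabeled.append(pooled[start:start + size])
            start += size
        if f_statistic(relabeled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical mill speeds per operator (invented numbers).
speeds = {
    "Op1": [91, 88, 93, 90, 89, 92],
    "Op2": [92, 90, 94, 89, 91, 93],
    "Op3": [80, 83, 79, 82, 81, 84],
}
groups = list(speeds.values())
f_obs = f_statistic(groups)
p_value = permutation_p_value(groups)

print(f"F = {f_obs:.1f}, permutation p = {p_value:.4f}")
```

With these numbers F works out to 52 and the p-value is tiny, so the operators are "significantly different." The same numbers would produce exactly the same F whether Operator 3 is lazy or is being handed the hardest material, because the reason for the gap never enters the calculation.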

Hopefully, you now feel better about the role of the human brain in the science of quality than you did after my last column.

## Six Sigma Heretic Article

These articles were originally published in Quality Digest, an online magazine.