Does this data make my shape look funny?
If you have been following my articles over the last few months, you have seen that even though statistical process control (SPC) charts are very powerful tools for examining a process, it turns out that there are a lot of ways to mess up SPC. This month, I am going to finish up with a few more things to watch out for as you use them, so you never have to ask, “Why doesn’t SPC work here?”
What shape are you in?
No, I am not asking whether you are working out. The basis for using control charts to help you make economical decisions comes from assumptions about the type and shape of the distribution you are dealing with. If a process is out of control, it is by definition not coming from a single distribution, so the distributional assumptions cannot be met. This does NOT mean that the control chart is useless. In fact, you use those distributional assumptions to help you identify what was unexpected, so that you can spend time investigating the particular events that were unusual to the underlying process and then identify and eliminate what caused them.
If you have a defect-count type chart (c chart or u chart), then to be in control, your data must be distributed as the Poisson would predict. Otherwise, the control limits defining when you react (and when you don’t), and any statements about the future defect rate you might make for the process, will be bogus. (There is no requirement that defect data be distributed as the Poisson, so make sure to check.) Similarly, if you have defective data (pass/fail), you must have a Bernoulli process (which must follow the binomial distribution) to be in control. If you are out of control, one or more of the requirements for a Bernoulli process have been violated and you need to investigate to see what happened.
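One quick way to “make sure to check” is to compare the variance of the counts with their mean, since the two are equal for a Poisson process. The sketch below is only an illustration: the defect counts are made up, the function names are hypothetical, and a ratio near 1 is a rough screen, not a formal goodness-of-fit test.

```python
import math
from statistics import mean, variance

def c_chart_limits(counts):
    """Return (LCL, center line, UCL) for a c chart, assuming Poisson counts."""
    c_bar = mean(counts)
    half_width = 3 * math.sqrt(c_bar)  # Poisson: standard deviation = sqrt(mean)
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width

def dispersion_ratio(counts):
    """Sample variance divided by the mean; near 1 when counts look Poisson."""
    return variance(counts) / mean(counts)

# Hypothetical defect counts per inspection unit (illustrative only).
defects = [4, 2, 5, 3, 6, 1, 4, 3, 5, 2, 4, 3, 7, 2, 4, 3, 5, 4, 2, 3]

lcl, center, ucl = c_chart_limits(defects)
print(f"c chart: LCL={lcl:.2f}, center={center:.2f}, UCL={ucl:.2f}")
print(f"variance/mean = {dispersion_ratio(defects):.2f}")
```

A ratio well above 1 (overdispersion) or well below 1 is a hint that the Poisson-based limits may not describe the process, and that a closer look at the distribution is warranted before trusting the chart.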
The default limits and rules for all of the control charts used for continuous data assume a normal distribution. Now for many processes, the normal distribution is not the “correct” or even desired output for a process. But what we are trying to use the control chart for is to tell us when we should spend the money to react and when to leave the process alone, and for that we can use the normal distribution on the averages.
For example, if I am filling cereal boxes, I want a distribution to get as many boxes as possible to meet the “Net Wt.” printed on the box without going under. The ideal shape for this is not going to be a normal distribution. Here are two histograms plotted on the same axis showing how I could achieve this with the same process standard deviation of one.
Figure 1: Filling Cereal Boxes with an Exponential Distribution, minimum = 12, mean = 13 (the standard deviation is mean – minimum = 1)
Figure 2: Filling Cereal Boxes with a Normal Distribution, mean = 15, standard deviation = 1
So if I am filling cereal boxes, a normal curve is going to cost me a lot of money since I am giving away a lot of free cereal. (By the way, I could make it worse by insisting that the cereal-filling process be “six sigma,” in which case I would have to move the mean to 16.5. Yet another reason why the “sigma” index drives bad behavior.)
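To put rough numbers on the giveaway, here is a small sketch that uses only the parameters from Figures 1 and 2. It assumes the label weight is 12, the minimum of the exponential in Figure 1; the variable names are illustrative.

```python
from statistics import NormalDist

LABEL_WEIGHT = 12.0  # assumed "Net Wt." on the box (the minimum in Figure 1)

# Figure 1: exponential fill shifted to start at 12, with mean 13.
exp_giveaway = 13.0 - LABEL_WEIGHT   # average overfill per box
exp_underfill = 0.0                  # no box can fall below the minimum

# Figure 2: normal fill with mean 15 and standard deviation 1.
normal_fill = NormalDist(mu=15.0, sigma=1.0)
normal_giveaway = normal_fill.mean - LABEL_WEIGHT
normal_underfill = normal_fill.cdf(LABEL_WEIGHT)  # area below the label weight

print(f"exponential: overfill {exp_giveaway:.1f}, underfilled {exp_underfill:.3%}")
print(f"normal:      overfill {normal_giveaway:.1f}, underfilled {normal_underfill:.3%}")
```

Under these assumptions the normal process gives away three times as much product per box and still underfills about 1.35 boxes per thousand.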
For the x-bar type charts, we have the Central Limit Theorem working for us: As the sample size gets bigger, regardless of the shape of the underlying distribution, the distribution of the sample averages tends to become more normal, so the limits and rules for the x-bar charts become applicable if the sample size is large enough. How large is “large enough”? Well, for a slightly skewed distribution, n = 5 may be all right; for a more skewed distribution, n = 8 might be enough. The worst case is an exponential distribution, whose sample averages should be close to normal with a sample size of 25.
Once the averages of the samples are approximately normally distributed, the control chart will correctly identify the economical points at which to react and not react to variation in the process. So you don’t need normally distributed data to use a control chart to identify and eliminate unusual sources of variation in the process, if you have an adequate sample size. We are still going to have to deal with that non-normality, though. More on that in a second.
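The effect of sample size can be seen in a quick simulation. This is a minimal sketch, assuming an exponential distribution with mean 1 and using skewness as a crude measure of how far the averages are from normal; the function names and the seed are arbitrary choices.

```python
import math
import random

def skewness(values):
    """Third standardized moment of the values (0 for a symmetric shape)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def subgroup_means(n, subgroups, rng):
    """Averages of `subgroups` samples of size n from an exponential with mean 1."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(subgroups)]

rng = random.Random(1)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(subgroup_means(n, 10_000, rng)) for n in (1, 5, 8, 25)}

for n, skew in skew_by_n.items():
    # For exponential data the skewness of the averages is 2 / sqrt(n).
    print(f"n={n:2d}  skewness of averages={skew:.2f}  theory={2 / math.sqrt(n):.2f}")
```

The skewness of the averages shrinks in proportion to one over the square root of n, so it gets small but never reaches zero. Whether a given sample size is “large enough” remains a judgment about how many false signals are tolerable.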
To monitor our cereal-filling process for control, I can take samples of 25 consecutive boxes and calculate the limits the usual way, based on the average standard deviation within the samples. The control chart of the means will look like this:
Figure 3: Mean chart—exponential distribution
(Note that the default calculations for the standard-deviation-chart limits assume normality of the individuals, so the default limits will be wrong on that chart. There is a way to handle this that is outside the scope of what I want to say today.) Using the usual control violation rules, I can use this chart of the means, with its limits based on the normal distribution, to detect unusual variation in the process and react to eliminate it.
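For readers who want the “usual way” spelled out, here is a minimal sketch of limits for the chart of means, computed from the average within-subgroup standard deviation with the standard c4 bias correction. The simulated data (40 subgroups of 25 from an exponential with minimum 12 and mean 13) and the function names are assumptions for illustration, not the data behind Figure 3.

```python
import math
import random
from statistics import mean, stdev

def c4(n):
    """Bias-correction constant for the sample standard deviation."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def xbar_limits(subgroups):
    """Return (LCL, center line, UCL) for the chart of subgroup means."""
    n = len(subgroups[0])
    center = mean(mean(g) for g in subgroups)
    s_bar = mean(stdev(g) for g in subgroups)
    half_width = 3 * s_bar / (c4(n) * math.sqrt(n))  # same as A3 * s_bar
    return center - half_width, center, center + half_width

rng = random.Random(2)  # fixed seed so the run is repeatable
subgroups = [[12 + rng.expovariate(1.0) for _ in range(25)] for _ in range(40)]

lcl, center, ucl = xbar_limits(subgroups)
outside = sum(1 for g in subgroups if not lcl <= mean(g) <= ucl)
print(f"x-bar chart: LCL={lcl:.2f}, center={center:.2f}, UCL={ucl:.2f}, points outside={outside}")
```

This covers only the chart of means. As noted above, the default limits for the standard deviation chart assume normal individuals and are not handled here.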
So what happens with a process that is in control, doesn’t produce normally distributed data, and is tracked on an individuals and moving range chart? (I would choose an individuals chart if the measurements were expensive or there were no rational subgroups, for example.) We don’t have the Central Limit Theorem on our side, but do we even need to deal with it? Well, we know for sure that the default limits will have us spending money to react when we don’t need to, and losing money by missing real process shifts.
Remember our perfectly-in-control cereal-filling process? Here is what the same data looks like on an individuals chart:
Figure 4: X and Moving Range Chart—Exponential Distribution
All those red dots are times that I would be reacting to the process when I should just be leaving it alone. (Since we have all 1,000 data points, we expect to see more false signals than with the previous chart, but this is crazy.) Recall what W. Edwards Deming told us about tweaking a process that is in control? It applies to non-normally distributed processes, too. Reacting to these signals would waste time, money, effort, and sanity, and the tweaking would also increase the variability of the process.
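A rough simulation shows why the default limits misfire. This sketch assumes 1,000 in-control fills from an exponential with minimum 12 and mean 13, and counts only points beyond the default limits (mean plus or minus 2.66 times the average moving range). The run rules that a real chart also applies are ignored, and the names and seed are illustrative.

```python
import random
from statistics import mean

def individuals_limits(values):
    """Default X-chart limits: mean +/- 2.66 * average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width

rng = random.Random(3)  # fixed seed so the run is repeatable
fills = [12 + rng.expovariate(1.0) for _ in range(1000)]

lcl, center, ucl = individuals_limits(fills)
false_alarms = sum(1 for x in fills if x < lcl or x > ucl)
print(f"LCL={lcl:.2f}, center={center:.2f}, UCL={ucl:.2f}, false alarms={false_alarms}")
```

For exponential data the upper limit sits about 2.66 standard deviations above the mean, where roughly 2.5 percent of in-control points still fall, compared with the 0.135 percent per side that the limits are designed for with normal data. The lower limit lands below the minimum fill, so it can never signal.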
In the bad old days of hand calculation, we would have a long slog of work ahead of us to get limits and rules that made sense, or we might have been tempted to just say, “Oh well, it’s close enough (and worth it to me not to have to do those calculations).” Thankfully, we live in the days where computers have chips, not tubes, and a portable computer means you can carry it in your pocket and not on a flatbed truck. Nowadays, tiny computers do that type of calculation really, really fast, so we can get the right answer with no pain.
So within a few seconds and clicks, I generated this individuals chart:
Figure 5: X Chart—Exponential Distribution
We still see some red dots, which we know are false signals (alpha errors) due to the 1,000 data points, but we see a lot fewer than with the default limits based on a normal distribution. That translates into less money and frustration.
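Software packages differ in how they compute limits for non-normal data. One common approach, assumed in this sketch, is to place the limits at the 0.135th and 99.865th percentiles of the fitted distribution, so that the false-alarm rate matches that of 3-sigma limits on normal data. The sketch treats the minimum of 12 as known and estimates the scale as the mean minus the minimum; the function name is hypothetical.

```python
import math
import random
from statistics import mean

def exponential_limits(values, minimum, tail=0.00135):
    """Probability limits for a shifted exponential with a known minimum."""
    scale = mean(values) - minimum              # exponential mean above the minimum
    lcl = minimum - scale * math.log(1 - tail)  # 0.135th percentile
    ucl = minimum - scale * math.log(tail)      # 99.865th percentile
    return lcl, ucl

rng = random.Random(4)  # fixed seed so the run is repeatable
fills = [12 + rng.expovariate(1.0) for _ in range(1000)]

lcl, ucl = exponential_limits(fills, minimum=12.0)
false_alarms = sum(1 for x in fills if x < lcl or x > ucl)
print(f"LCL={lcl:.3f}, UCL={ucl:.2f}, false alarms={false_alarms}")
```

With a scale of 1, the upper limit lands near 18.6 instead of the 15.7 or so that the default calculation gives, which is why far fewer in-control points are flagged.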
Remember when I said that even if we are using the x-bar charts, we still have to deal with the non-normality?
This is really important—you still must test the distribution of the individuals for shape if you are calculating the capability of the process to meet spec, regardless of which type of continuous data control chart you are using. If you don’t, and if you then calculate an estimate of the population standard deviation from, say, the average range, this number is not going to tell you about how much stuff is going to be in spec or the relative losses of the process. And wrongly predicting this stuff is what we call in the business, “bad.” (Interestingly, in this case, the loss per unit is about the same calculated assuming normality and summed using the actual data. More on this another time.)
So the exponential distribution saves us a lot of money, and treating it as such I get the correct prediction of how much will fall below specification.
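To see how far off a normality assumption can be, the sketch below compares the fraction below a lower specification of 12 that a normal model would predict (same mean and standard deviation as Figure 1) with what the exponential actually delivers. Treating 12 as the lower specification limit is an assumption made for illustration.

```python
from statistics import NormalDist

LOWER_SPEC = 12.0        # assumed lower specification (the label weight)
MEAN, SIGMA = 13.0, 1.0  # process mean and standard deviation from Figure 1

# Prediction if the individuals are (wrongly) assumed to be normal.
normal_below_spec = NormalDist(MEAN, SIGMA).cdf(LOWER_SPEC)
cpk_lower = (MEAN - LOWER_SPEC) / (3 * SIGMA)

# The exponential in Figure 1 starts at 12, so nothing falls below it.
exponential_below_spec = 0.0

print(f"normal assumption: {normal_below_spec:.1%} below spec (Cpk = {cpk_lower:.2f})")
print(f"exponential model: {exponential_below_spec:.1%} below spec")
```

The normal model predicts that nearly 16 percent of boxes are underweight for a process that, by construction, produces none.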
I really am sorry—I wish I could say that you don’t have to worry about distribution shape in SPC, but as you can see you do in both SPC and in capability calculations.
What was that sound? Was that the whole argument about normality in SPC finally being dropped?
(Take a five-minute break to laugh maniacally at this point while you wish it were so….)
Another way to make SPC useless is to not document and follow a reaction plan. This is a simple procedure that tells the people operating the equipment how they should react when the chart detects an out-of-control condition. As I have said before, SPC is a tool—but it changes nothing by itself. In the 1980s companies control charted everything that moved and some things that didn’t, papered the walls of conference rooms with control charts, and wondered why nothing changed. Nothing will change unless you have a common way to react to the signals the control chart is giving you.
A reaction plan consists of three elements. Once you have detected an out-of-control signal, you first need to correct the process, which is usually an adjustment to get it back into control, and create a record of the adjustment and its putative cause. How much to adjust is sometimes a nontrivial discussion, but once you settle on how, you had better have everyone do it the same way.
Next, you have to determine if you need to contain the product made since the previous sample, because it might be at risk of being out of specification. (Hint: With a process that is in control with a Cpk of 4, you probably do not need to contain; with a process with a Cpk of 1.1, you probably do.)
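The hint can be made concrete if, and only if, the individuals are close to normal: The fraction beyond the nearer specification limit is then the normal tail area at 3 × Cpk standard deviations. The helper name below is hypothetical, and for a non-normal process the fitted distribution should be used instead.

```python
import math

def fraction_beyond_nearer_spec(cpk):
    """Normal tail area beyond the nearer spec limit, 3 * Cpk sigmas away."""
    z = 3.0 * cpk
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # erfc stays accurate far in the tail

for cpk in (1.1, 4.0):
    print(f"Cpk={cpk}: fraction beyond nearer limit = {fraction_beyond_nearer_spec(cpk):.3g}")
```

Under that assumption, a Cpk of 1.1 leaves a bit under 500 parts per million beyond the nearer limit while the process sits at its usual level, so even a modest shift puts product at risk. At a Cpk of 4 the fraction is effectively zero.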
Finally, you need some time away from the process to prevent the special cause from happening again. You cannot do this while running a process. (This is the mistake made in the 1980s. Well, that and the haircut of the lead singer of A Flock of Seagulls, but that's another story.) Time away from the process allows you to prioritize the special cause for elimination and work to that end.
A few other mistakes show up from time to time, so just be aware of them.
- Choosing the wrong variable. If you are doing SPC on it, it should be important enough so that people react to it. If not, don’t waste the time and money.
- Not training process owners. I am a big fan of teaching operators about control theory and how it relates to their process. In fact, one of my first jobs was to do just that. In my experience, operators love this stuff, since it relates their hard-earned process knowledge back to the data that you are asking them to collect and use.
- Insufficiently defined data collection. If you collect “thickness” data but you don’t define where it is collected and how to measure it, you will probably be chasing out-of-control points that are really not there.
- Nonrandom data collection intervals. I can’t count the number of times I have seen an SPC chart where the data are collected every hour on the hour. If you collect one sample per hour, the time within the hour at which you collect needs to be randomly determined; otherwise, things that happen periodically will be missed.
- Not annotating charts when the process goes out of control. The best time to come up with an idea of why the process went out of control is right when it happens. Make it easy to record on the spot what the person thinks might have caused it.
- Not recalculating charts when necessary. I see this all the time and it confuses me. As you make changes to your process that reduce variability around the target, your control chart should be recalculated from the point of the change. This is a really easy way to show customers how much better you have gotten through the years.
- Recalculating limits all of the time. This is a stealthy one. Once you have a period where your process is in control, stop calculating the control limits and fix them where they are. Otherwise, long-term cycles can be hidden as you constantly recalculate the limits.
- Using SPC on a broken process. If everyone knows the process is not working as designed, or each operator runs it differently, or the equipment is broken, the first thing to do is fix the process. People rightfully get bored when repeatedly entering points that are out of control for reasons that everyone knows.
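For the nonrandom-interval problem in the list above, a sampling schedule can be generated ahead of time. This is a minimal sketch with a hypothetical function name; it picks one uniformly random minute within each hour of a shift.

```python
import random

def sampling_schedule(start_hour, hours, rng):
    """One (hour, minute) sampling time per hour, with the minute chosen at random."""
    return [((start_hour + h) % 24, rng.randrange(60)) for h in range(hours)]

rng = random.Random()  # unseeded, so each shift gets a fresh schedule
for hour, minute in sampling_schedule(start_hour=7, hours=8, rng=rng):
    print(f"{hour:02d}:{minute:02d}")
```

Printing the schedule at the start of the shift keeps the randomness out of the operator’s hands while still making it easy to follow.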
SPC is a powerful tool for diagnosing what is going on in your process. Its whole purpose in life is to signal you when an unexpected source of variability has infected your system, so that you can identify and eliminate that source. SPC also provides a heuristic for predicting the future performance of your process, which is why you must have statistical control before calculating your process’ capability indexes. SPC is not, however, immune from the errors I have listed throughout this four-part series of articles. It is also not a strategy.
The answer to “Why doesn’t SPC work?” is that, of course, it does. It seems not to work precisely when the control chart is trying its hardest to scream at you that there is something going on in the process that is unexpected, or that your assumptions about the process are flawed, and that further investigation is needed. Developing your ability to listen to what SPC is telling you about your process, and appropriately reacting to that, is what makes the use of SPC so incredibly valuable to businesses and organizations.