Measurement System Analysis - Potential Study with Steve Ouellette


Hi there! I'm Steve Ouellette, President of The ROI Alliance, and I thought I'd spend some time with you showing you one of the new modules that we added on to our free statistical analysis program called ROIstat.

This has to do with measurement system analysis, and I'd like to go through an example with you to show you some of the things that you can learn from using this module and from doing a measurement system analysis, or an "MSA".

So an MSA tries to answer part of the question of "How do we trust the numbers that we get from a measuring device?" Now any measuring device needs to show that it has control, precision, and accuracy. So control is stability through time, precision is how much variability there is in measuring the same thing over and over again, and accuracy is how close to the true number I'm actually getting.

So something like a calibration sticker isn't sufficient for that, right? That says that at some point five months in the past, this device could measure this gauge block, which probably isn't even like the thing that we're making. So various measurement system analyses investigate these three components.

Now the first one we're going to go through is called a potential study. It's also called a "gauge R&R" commonly, and it's going to test precision in particular, and it's going to be able to test accuracy if you happen to know the true value of the things that you're actually measuring.

Now, so the purpose of this is something very fast, very quick and dirty to find out if a measurement system is even going to be useful for you. So let's spin the following scenario: let's say you're considering buying a new hardness machine and you're measuring the hardness of heat-treated aluminum bars. So somebody's come in with this new measurement device and you want to quickly determine if it's even going to be useful for you or not.

So your specifications for this process are a lower spec limit of 65.5, a target of 70, and an upper spec limit of 74.5. We've got two appraisers who are ready to do this test: Jack and Jill. And what you do is you get ten parts of your heat-treated aluminum bars from normal production, and you're going to measure those two times each for each of the appraisers.

Okay? All right, so let's take a look at the data. So you've completed your test. You've got Jack, who went through and did the parts in random order: first time, second time. Jill went through and did the same thing. Here are the measurements. We do happen to have the true values here, although you don't need them to do a potential study.

And so what we're going to do is we're going to copy that data. And we're going to go ahead and paste that into our app. Now you can load it as an Excel file or text file or whatever, that's all pretty easy to do. This is just a little faster.

So I'm going to import the data and I'm going to go to the MSA. Make sure I've entered everything correctly: the appraisers are by name, parts by part, trial by trial, measurements by measure. We're going to be using the range calculation because we're only doing two measurements per part per person, and if I only have two then I need to use the range. I can use the standard deviation if I get up over eight.

All right, I'm going to go ahead and enter my specs. So my lower spec is 65.5, my upper spec is 74.5. Target is not needed for these calculations.

Now I've got some historical data, and historical data tell me that my process averages at a 69 and has a standard deviation without measurement error of one. Now if you look at the data that's in your database, that's your as-measured data, so that not only includes your production variability but any variability associated with whatever measurement system you're using. So you can estimate what the core process without the measurement system error is just by taking the variance of your as-measured data and then the variance of whatever the measurement system is that you're using to generate that data, and then you subtract the measurement system variance from the observed as-measured product variance and that leaves you a number. You take the square root of that number and that's going to be a standard deviation. That's a good way to estimate what your actual process is doing. So we've done that in this case and that is one.
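The variance subtraction just described can be sketched in a few lines. This is a minimal illustration, not ROIstat's code: the function name and the two input standard deviations are hypothetical, chosen only so that the result comes out to the value of one used in this example.

```python
import math

def process_sd_without_measurement_error(sd_as_measured, sd_measurement):
    """Subtract the measurement variance from the as-measured variance,
    then take the square root to get back to a standard deviation."""
    var_process = sd_as_measured ** 2 - sd_measurement ** 2
    if var_process < 0:
        raise ValueError("measurement variance exceeds as-measured variance")
    return math.sqrt(var_process)

# Hypothetical inputs (not from the study): historical as-measured data with
# a standard deviation of 1.25, taken with an old gauge whose standard
# deviation is 0.75.
sd_process = process_sd_without_measurement_error(1.25, 0.75)
print(sd_process)  # 1.0
```

Note that the subtraction is done on variances, not on standard deviations, because independent sources of variation add as variances.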

So if you think about our process, our process has an average of 69, a standard deviation of one, and a lower and upper spec of 65.5 and 74.5. This process is actually running pretty well.

If you calculate your Cp, which compares the width of the process to the width of your spec, it's about 1.5, which means that your spec is actually one and a half times the width of your process. Which is really good news, because you'd have to have a pretty big shift in the process to even run a chance of going outside of spec.

Cpk accounts for how much of your process actually goes outside of spec. A Cpk of one means that you're right on the edge of the natural tolerance of your process. Our Cpk is 1.167 which means that we're capable of meeting the spec, although it'd be better if we could get it centered more.

Cpm further penalizes you for being off-target because presumably, your customers want what the target is. And so our Cpm is 1.061. So clearly this is a process that's running pretty well. If I could just get it centered better, that'd be great.
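For reference, the three capability indices quoted above can be reproduced with the standard textbook formulas. This is a sketch under the assumption that ROIstat uses these common definitions (the transcript does not say so explicitly); the function name is made up for illustration.

```python
import math

def capability(lsl, usl, target, mean, sd):
    """Textbook capability indices for a normally distributed process."""
    cp = (usl - lsl) / (6 * sd)                    # spec width vs. process width
    cpk = min(usl - mean, mean - lsl) / (3 * sd)   # distance to the nearer spec limit
    # Cpm inflates the spread by the squared distance from target
    cpm = (usl - lsl) / (6 * math.sqrt(sd ** 2 + (mean - target) ** 2))
    return cp, cpk, cpm

# Values from the example: specs 65.5 to 74.5, target 70, mean 69, sd 1
cp, cpk, cpm = capability(65.5, 74.5, 70.0, 69.0, 1.0)
print(f"Cp={cp:.3f} Cpk={cpk:.3f} Cpm={cpm:.3f}")
# Cp=1.500 Cpk=1.167 Cpm=1.061
```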

So already I know the chance of making any product that's actually outside of spec is pretty low. So I don't expect to get a lot of material that's truly outside of spec, okay? But of course, my measurement system has variability of its own, and that may be a problem.

So let's take a look at the numbers that we are using, and so because we're using the range technique I'm going to be using the range within the appraiser and the range between the appraisers in order to estimate the standard deviation associated with my measurement system. So I'm measuring these same 10 parts two times for each person. And based on that, I've got a variability within part within appraiser that I'm going to be using to generate an average range within each of my appraisers, and then that's going to be generating an average range across both my appraisers.
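As a rough sketch of the range technique for the within-appraiser part: take the range of the two readings on each part, average those ranges within each appraiser, average across appraisers, and divide by the control-chart constant d2 (1.128 for ranges of two readings). The readings below are invented purely for illustration, since the study data is not reproduced in this transcript, and some gauge-study references use a slightly adjusted constant, so ROIstat's result may differ in the details.

```python
D2_FOR_TWO_READINGS = 1.128  # standard d2 constant for ranges of 2 readings

def average_range(trial1, trial2):
    """Average range within part for one appraiser (two trials per part)."""
    ranges = [abs(a - b) for a, b in zip(trial1, trial2)]
    return sum(ranges) / len(ranges)

def repeatability_sd(trials_by_appraiser):
    """Estimate the within-appraiser standard deviation from ranges."""
    avg_ranges = [average_range(t1, t2) for t1, t2 in trials_by_appraiser.values()]
    grand_average_range = sum(avg_ranges) / len(avg_ranges)
    return grand_average_range / D2_FOR_TWO_READINGS

# Hypothetical readings for five parts (NOT the data from the video)
readings = {
    "Jack": ([68.0, 70.5, 69.0, 71.0, 67.5], [69.0, 71.0, 70.5, 72.0, 69.0]),
    "Jill": ([70.0, 72.5, 71.0, 73.5, 70.0], [71.0, 73.0, 72.0, 74.0, 71.5]),
}
sd_repeat = repeatability_sd(readings)
print(round(sd_repeat, 4))
```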

And so the idea is if I measure the same thing within one person, I'm doing it pretty much the same way as I'm supposed to be doing it over and over again. Whatever that variability is might be kind of built into the gauge itself. Now, it's a little bit larger than that because clearly, I'm the one using the gauge, and maybe I'm using it a little differently than someone else, or maybe there's some interaction with certain parts. But generally, this is considered to be the EV, the %EV - the equipment variation - which is mainly the variability built right into the measurement system itself.

Okay so in this case we see that 1.149 is our standard deviation due to our within-person, or within-appraiser, variability. The problem is if I multiply that by 5.15 to get the natural tolerance of my gauge, and compare that to my spec width, I'm already using up 65 - 66% of my spec just due to variability built into the hardness measuring device itself. That's a big chunk, and that might be a problem, because if I use up more and more of my spec with measurement error then I'm not left a whole lot to make a good decision with. If I make something that's in spec but towards one side or the other, I could easily call it out of spec, or vice versa.

I look at the reproducibility. Reproducibility is the component of variance that has to do with the difference from one operator to another. Now I've done it my way, Jill's done it her way, and there's a difference in the average. That difference in the average contributes to the overall variability that I might see in my database with my as-measured data, right? And so that reproducibility component, the standard deviation for that is 2.7.

Here's another problem because if I compare that 2.7, multiply it by 5.15 to get the natural tolerance of the gauge, that takes 154 - 155% of my spec. It's actually larger than the spec itself.

And if I then combine those two estimates of the standard deviation into an overall estimate of the standard deviation, then I'm going to get 2.9387, which compared to the spec is 168% of my spec. That means if I have a part that's smack dab in the center of spec, and it's really there - that's the real value - then 99% of the time the readings I could get are spread over a band that's 168 percent of my spec. Which means I could be calling it out of spec even though it's perfectly okay.
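The percent-of-spec arithmetic quoted above can be checked with a short sketch. It assumes the two components combine as a root sum of squares, which is consistent with the quoted numbers; the function name is made up for illustration. Because the reproducibility value of 2.7 is rounded in the narration, the combined standard deviation computed here differs slightly from the 2.9387 quoted.

```python
import math

SPREAD_MULTIPLIER = 5.15  # multiplier used in the video for the gauge's natural tolerance

def percent_of_tolerance(sd, lsl, usl):
    """Gauge spread (5.15 * sd) as a percentage of the spec width."""
    return 100 * SPREAD_MULTIPLIER * sd / (usl - lsl)

LSL, USL = 65.5, 74.5
ev_sd = 1.149                       # repeatability (within appraiser), as quoted
av_sd = 2.7                         # reproducibility (between appraisers), rounded as quoted
grr_sd = math.hypot(ev_sd, av_sd)   # variances add, so combine as root sum of squares

pct_ev = percent_of_tolerance(ev_sd, LSL, USL)
pct_av = percent_of_tolerance(av_sd, LSL, USL)
pct_grr = percent_of_tolerance(grr_sd, LSL, USL)
print(f"%EV={pct_ev:.1f} %AV={pct_av:.1f} %R&R={pct_grr:.1f}")
# %EV=65.7 %AV=154.5 %R&R=167.9
```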

All right, so I already know I've got a problem with this particular gauge. It's got a lot of variability associated with it, and that's going to make it hard to determine even if it's in spec, much less control the process itself.

Okay, well let's see if we can get some clues as to what's going on as we look at some of the other output. All right, so this first summary graph has to do with the dispersion within part, and so looking at the variability, in this case, the range within part by appraiser, and putting that on just a normal range control chart. Now this isn't in time order, so the only thing I'm looking for here on this control chart is if a point goes outside of my control limits. Now you notice that Jack's control limits are significantly larger than Jill's control limits: these are on the same scale. So maybe Jack has a within his own measurements variability problem, but there's no single part that falls outside those limits. So these are provisional - I don't have 25 points yet, but I also don't see anything that says that the problems I'm seeing are due to a single part that's a problem for one or both of my appraisers. Whatever variability they have seems to be pretty equal across all the different parts that they're measuring.

So I'll take a look at my next one. Here's where it kind of gets interesting. So looking at uniformity of dispersion, I also want to know if what I'm measuring - the magnitude of what I'm measuring - as it increases or decreases, in this case, gets harder or softer, does the variability associated with that measurement change? If it does, I need to use that in my calculation to understand my %R&R, my variability compared to my spec, because if towards one end I have more variability and then towards the other end I have lower variability, then that's going to affect my ability to meet spec if the spec is the same for each of those two different hardnesses, right? If I have more variability as it gets harder, so my measurement error increases as I measure harder and harder materials, I'm going to have to build that into my understanding of how easy it is to determine if something is within spec or not.

Okay, so Jack doesn't show up to be significant but Jill does, and you'll notice that Jill has measurements that are way out here on average. So her average measurements are pretty high compared to Jack's. Now we're going to see evidence of that later on too, but I already know that I've got a problem here with my uniformity of dispersion. My dispersion apparently does change based on magnitude, and further, looking at the graph I can see that the numbers that Jill is getting are quite different than the numbers Jack's getting.

All right, so now if this were the real world, at this point I would stop and I would begin my investigation and try and figure out what's going on. But let me show you the rest of the charts because they'll give us more clues as to what's going on as well.

Okay, this chart here shows the average of my parts, that's the black line, and it shows this kind of dashed line here which is my specification limit - my upper and lower specification limit, and then this dotted line here is my measurement error. So again, remember my measurement error, the variability due to my within appraiser and between appraiser components of variance, is larger than the spec itself. And you can see how that almost got us here. So this guy, let's say his real value was here. Well I could measure anything from the top of that line to the bottom of that line shifted around that and still be within the random expected variability that I would expect from this particular measurement system. Which means it could be out of spec simply due to measurement error itself. If I had one that was right on average, let's say this guy right here is right on 70, right on my target, the measurement error alone could have called it as high as that or as low as that which is outside my spec. So even if I'm making a part that's really smack dab in the center, I'm running a chance of calling it out. We're going to come back to that in a second. This chart also shows where the individual measurements are as well.

Okay, here's another way to look at kind of the same data, which is a box plot. And you can kind of get a visual idea of how much variability, how much difference there might be measuring the same part from one appraiser to the other. And again, we can kind of see that Jack has a little bit more variability than Jill does, which was kind of reflected in the range chart that we saw above.

So this chart here looks at the components of variance that go into the overall measurement error. So we've got the repeatability variability, we've got the reproducibility variability, and what this chart shows is that the vast majority is due to the reproducibility, the difference from one appraiser to another appraiser, which we saw back up at the very top as well. But this kind of gives us a graphic piece to hang our hat on. And so the idea might be that I might be able to eliminate about 84 to 85% of my variance by getting my two different appraisers just to agree with each other.
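The share of variance attributed to each component follows directly from the two standard deviations quoted earlier, since the shares are computed on variances (squared standard deviations). A minimal check, using the rounded values from the narration:

```python
ev_sd = 1.149  # repeatability standard deviation, as quoted
av_sd = 2.7    # reproducibility standard deviation, rounded as quoted

ev_var = ev_sd ** 2
av_var = av_sd ** 2
total_var = ev_var + av_var

reproducibility_share = av_var / total_var
repeatability_share = ev_var / total_var
print(f"reproducibility: {reproducibility_share:.1%}, "
      f"repeatability: {repeatability_share:.1%}")
# reproducibility: 84.7%, repeatability: 15.3%
```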

Okay, so that gives me a hint as to what I might need to work on. Now as you try and figure out if a gauge is usable or not, the calculations we've done so far are statistical calculations, right? They're a determination of the overall variability and how that variability compares to spec - simple calculations. But there's a business decision here too which is, "Is this actually usable, is it sufficient for my purposes?" Partly that relies on how expensive it is, how expensive it is to measure. In this particular case, if it's in control over time I may be able to take multiple measurements and reduce that variability, and base my decision on an average as opposed to a single measurement.

But one thing that I might want to know, and this is a novel way of showing that, is "What is my chance of misclassifying something?" So remember we entered in the actual process average and the process variability without measurement error, so that's represented by the center curve here. So that's what my process actually produces, and again it's all within spec, right? It's kind of mushed over to one side, kind of close to my lower spec. If I could get that centered it'd be better, but I'm really not making very much product that's actually outside of spec. Whereas around each of these specifications, I have a gradient and this gradient around the lower spec limit shows the width of the variability that I would expect due to my measurement error, which is much larger than my spec itself. And the same thing over here: this is the danger zone associated with my upper spec limit, and you notice they actually overlap. That means that there's no place that I can hide - there's no place I can run my process without a noticeable probability of misclassifying it as outside of spec. So virtually everything is going to be made inside of spec, but about 16.63% of the time, I'm going to actually be calling it out of spec and scrapping stuff.

So this is the consequence of having a highly variable gauge. In this case, it's got even more variability than the width of my spec itself, and so about 16-17% of the time I throw away perfectly good stuff.

All right so we don't like that, right? So we want to figure out what's going on. We've had one clue already, which was this chart right here which showed that Jill's numbers are quite a bit different than Jack's numbers.
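To get a feel for where a misclassification figure like this comes from, here is a rough sketch under a simple assumed model: the true hardness is normal with mean 69 and standard deviation 1, and each reading adds independent normal measurement error with the combined standard deviation of 2.9387. How ROIstat actually computes its number is not described in the video, so this result should land in the same neighborhood as the figure quoted above without necessarily matching it.

```python
from statistics import NormalDist

def prob_good_called_bad(mean, sd_process, sd_measure, lsl, usl, steps=2000):
    """Probability that a part truly in spec is read as out of spec.

    Integrates over true values inside the spec (midpoint rule), weighting
    each by the chance that measurement error pushes the reading outside.
    """
    true_dist = NormalDist(mean, sd_process)
    error = NormalDist(0.0, sd_measure)
    width = (usl - lsl) / steps
    total = 0.0
    for i in range(steps):
        x = lsl + (i + 0.5) * width                     # a true value inside spec
        p_read_low = error.cdf(lsl - x)                 # reading falls below LSL
        p_read_high = 1.0 - error.cdf(usl - x)          # reading falls above USL
        total += true_dist.pdf(x) * (p_read_low + p_read_high) * width
    return total

p_false_reject = prob_good_called_bad(69.0, 1.0, 2.9387, 65.5, 74.5)
print(f"{p_false_reject:.1%} of production is good but read as out of spec")
```

Setting `sd_measure` to a much smaller value shows how quickly the false-reject rate drops once the gauge spread is small relative to the spec.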

So when you're doing this type of investigation there's additional diagnostics that you can take a look at by clicking here. Okay, so we see a recapitulation of what we've already talked about up here, just to remind us what's going on. And then we see certain statistical tests that are testing various things to see what might be going on. Now there's a lot here, I understand, but it's testing everything so that you can then come back and do some diagnosis if, as in this case, we don't have a gauge that we think is going to be very usable.

So the first one is looking at repeatability within appraiser, and so it's actually seeing if Jack agrees with Jack - his first trial versus his second trial. Now you would like to see that there's a high correlation in the numbers that came up the first time and the second time because they're the same parts. And we do in fact see a high correlation, so that's good. This next one tests whether the variance is different from his first to his second trial. They should be the same, and that's what we see: we don't reject there. Here's an interesting one though. This is the average difference, right? So if I look at all the ones he did the first time and all the ones he did the second time, if I average them all together they differ by 1.65 - they've actually gone up by 1.65. And so I test that and that's significant. And so that's a really big second clue: Jack, within his own measurements, ran through in a random order, ran through in a random order again, and the second time, the readings were significantly harder than the first time.

All right, what's interesting is we see the exact same pattern with Jill, and so again there's high agreement within Jill from her first to second measurements, and that's what we expect to see. The variability here is actually starting to get significantly different - it's getting worse, and we see that on average she's gone up as well and that's significant.

Now at this point, we've got three pieces of information telling us what's going on. We saw that Jill's measurements tend to be higher than Jack's measurements, here we see that even within Jack we had an increase in measurement from the first to the second round, and for Jill from the first to the second round as well, and that's enough information for us to go back in and figure out what's going on.
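A test of the average difference between two trials on the same parts is commonly done as a paired t-test; whether ROIstat uses exactly this test is an assumption. The readings below are invented for illustration (the study data is not reproduced here), constructed only so that the average shift equals the 1.65 mentioned above.

```python
import math
from statistics import mean, stdev

T_CRITICAL_DF9 = 2.262  # two-sided 5% critical value, Student's t, 9 degrees of freedom

def paired_t(first, second):
    """Mean of the part-by-part differences and its paired t statistic."""
    diffs = [b - a for a, b in zip(first, second)]
    d_bar = mean(diffs)
    t_stat = d_bar / (stdev(diffs) / math.sqrt(len(diffs)))
    return d_bar, t_stat

# Hypothetical readings for 10 parts (NOT the data from the video)
trial_1 = [66.0, 67.5, 68.0, 69.0, 70.0, 66.5, 68.5, 69.5, 67.0, 70.5]
trial_2 = [67.5, 69.5, 69.0, 71.0, 71.5, 68.5, 70.0, 71.0, 69.0, 72.0]

d_bar, t_stat = paired_t(trial_1, trial_2)
significant = abs(t_stat) > T_CRITICAL_DF9
print(f"mean shift={d_bar:.2f}, t={t_stat:.2f}, significant={significant}")
```

Pairing by part matters here: the parts themselves differ in hardness, and differencing the two readings on each part removes that part-to-part variation from the comparison.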

And in this case, it turns out that aluminum actually hardens over time, and so what was happening was that the thing we were actually trying to measure, the hardness of the material, was increasing through time. And because we didn't account for that, because we didn't either build that into our model or let the material age for a little while so it settles down, we actually caused a problem for the gauge itself. Because the measurement might have been in control except for the fact that this hardness is changing through time.

So you can see where we've used the information that we generated from our measurement system analysis to find out that A) this gauge by itself isn't going to work but B) the thing to start working on is understanding how the aluminum changes through time, maybe building a model to subtract that out and bring things together. We also saw some differences in dispersion between Jack and Jill. So there are some things that we can work on going forward to see if we can actually end up using this new machine in our production process.

Okay well, I hope that that's introduced you to the power that's associated with doing measurement system analyses and how easy it is to do in ROIstat. Every device that you use to make decisions should have some sort of a measurement system analysis done on it, and ideally, if it's important, you'll be doing a long-term one. And I'll show you one of those at another time.

Okay thank you very much! I hope you enjoyed. Take care!


