**Statistical Thinking**

This philosophy relates to how people take in and process information (learning) as well as how they respond to it (action). It is based on the following:

Variation exists in all processes.

That is always true, a fact of life!

Dr. Walter A. Shewhart, of Bell Telephone Laboratories, developed a theory in the 1920s in which he identified two components of variation: a steady component and an intermittent component. The first, called common or random variation, is caused by chance or undiscovered causes; it occurs randomly. Intermittent variation, the second type, results from assignable causes (causes we can discover).

Statistics: A philosophy of learning and action. The term philosophy separates statistical thinking (a set of thought processes) from number crunching (the use of particular statistical tools). That philosophy is based on the following:

All variation is the result of two separate causal systems:

A steady component and an intermittent component. The first, called common variation, is caused by chance. The second, intermittent variation, results from assignable causes, those we can identify. We cannot eliminate common cause variation without changing the system itself; attempts to do so (tampering) will actually increase the overall variation.
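Shewhart’s two components can be sketched in a small simulation. This is a hypothetical illustration, not Shewhart’s own procedure: the `measure` function, target value, and shift size are all invented for the example. Common-cause variation is the random noise present on every reading; the assignable cause is a deliberate step shift introduced partway through.

```python
import random

random.seed(1)

def measure(n, shift_at=None, shift=0.0):
    """Simulate n readings of a process (hypothetical values).

    Common-cause variation: gaussian noise on every reading.
    Assignable-cause variation: an optional step shift that begins
    at index shift_at.
    """
    target = 10.0
    data = []
    for i in range(n):
        x = target + random.gauss(0, 0.1)        # common (chance) causes
        if shift_at is not None and i >= shift_at:
            x += shift                            # assignable cause
        data.append(x)
    return data

stable = measure(100)                             # common causes only
shifted = measure(100, shift_at=50, shift=0.5)    # assignable cause appears

mean_stable = sum(stable) / len(stable)
mean_shifted = sum(shifted) / len(shifted)
print(round(mean_stable, 2), round(mean_shifted, 2))
```

The stable run hovers near its target; the shifted run drifts because an assignable cause has entered the system, and that cause can, in principle, be found and removed.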

So where, and how, can we use this reality? Volatility is with us! Using statistics gives us control, providing the means to deal with volatility and to improve and control industrial processes. A savvy technician looks at volatility and sees opportunity.

**A. Why Statistics?**

In every walk of life we find variation. How do we deal rationally with this variation between things? As young technicians, we were taught that when we needed to know how big something was, we just went out and measured it. Then we knew! But we then learned that if we made the same measurement later, the measured value was never exactly the same. Then we weren’t so sure. The fact is, no matter how hard we try to hold all variables constant, and no matter how carefully we repeat all measurements in precisely the same way, there will nonetheless be variability in the measurements. If we don’t find a difference, we just aren’t measuring closely enough. That is reality.

In the face of variation, and the resulting uncertainty, we must develop ways to deal with data. For instance, what do we use for a product dimension in our literature when the results differ for two units from the same production lot? How do we define the “average” of measured data? Further, in one case the variation may be small, while in another it may be quite extreme. How do we quantify the level of variation?
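Both questions have standard answers: the mean quantifies the “average,” and the standard deviation quantifies the level of variation. A minimal sketch, using hypothetical repeat measurements of the same part dimension:

```python
import math

# Hypothetical repeat measurements of one part dimension, in mm
readings = [24.98, 25.01, 25.00, 24.99, 25.03, 24.97, 25.02]

n = len(readings)
mean = sum(readings) / n                                   # the "average"
variance = sum((x - mean) ** 2 for x in readings) / (n - 1)
std_dev = math.sqrt(variance)                              # level of variation

print(f"mean = {mean:.3f} mm, s = {std_dev:.3f} mm")
```

Two lots could share the same mean yet have very different standard deviations, which is exactly why both numbers are reported.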

Lord Kelvin said it best over a hundred years ago: “When you can measure what you are speaking about, and can express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” Unless we can measure what we are dealing with, we are poor technologists. Quantifying data is the basis of descriptive statistics: describing how data is to be viewed, and deriving information from that data.

Consider with me for a moment what mental images are invoked just by hearing or reading different words. What comes to your mind as you read the words:

**MELLOW SATIN AUTUMN CRÈME**

Don’t they just make you feel kind of gooooood?

How about:

**EXAMINATION CROAK STATISTICS**

“Statistics” has never been a popular word. It just does not conjure up warm fuzzies in most people’s minds, and there are a few good reasons for such a negative image. Taxation and military service were the earliest uses of statistics, and statistics still let governments know just how far they can go into our pocketbooks. Statistics have also been abused in advertising, causing us to distrust much of what we see. When an ad suggests “Nine out of ten….,” it really wants us to believe that “Nine out of every ten…”

Our approach to the science of statistics will be very positive. We will start with the definition:

Statistics: A set of rules that allow us to get, and then to present, information from data.

We find ourselves overwhelmed with numbers (data). But data is not information. It has been said that data lives forever, but information has a very short half-life. We must turn data into information efficiently and promptly so decisions can be made. Since variation is always present when we observe things in the real world, we MUST practice statistics. The question is not whether we should, but how. Ignorance of statistical techniques in today’s competitive marketplace can be catastrophic!

**B. Descriptive Statistics**

Descriptive statistics describe the characteristics of a POPULATION with numerical measurements called PARAMETERS. There is no uncertainty here, since ALL elements in the population are considered. The first task is to define the population. Although descriptive statistics are basic, they can create difficulties.
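The distinction is easy to show in code. In this sketch the population is hypothetical (invented ages for a five-person committee), but the point is real: because EVERY element is included, the results are parameters, not estimates, and carry no sampling uncertainty.

```python
import statistics

# A complete, tiny population (hypothetical): the age of every member
# of a five-person committee. All elements are included.
population = [34, 41, 29, 50, 46]

mu = statistics.mean(population)       # population mean: a PARAMETER
sigma = statistics.pstdev(population)  # population std dev (pstdev, not the
                                       # sample version stdev, since this is
                                       # the whole population)

print(mu, round(sigma, 2))
```

Note the use of `pstdev` rather than `stdev`: the latter divides by n − 1 and is meant for samples, where uncertainty does exist.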

For example, if the population is to be the AVERAGE FAMILY, how do we define FAMILY, and how would we define AVERAGE? Further, even when the population is clearly defined, gathering the data can add more difficulties. For example, ILLEGAL ALIEN is a well-defined population, but a tough one to gather information from. The gathering of data has caused many projects to go down the tube. Yet the importance and value of data gathering and correct statistical analysis cannot be overemphasized. We all have bad memories; therefore we need data! We forget things, especially that which is bad. We can’t say we have experience, that we know how things are going to behave… we forget!

Information contained in descriptive statistical data is easiest to see and understand when it can be shown in a picture, graphically. A simple graph will show the important characteristics:

CENTRAL TENDENCY (or location, measured as the average, or mean)

DISPERSION (spread, or how much variability)

SHAPE (how the dispersion in the data is distributed)

The importance of “shape” can be shown with an example from the Denver housing market. Several years ago the demographics (statistics of the area) showed the average family size to be 1.8 children. Three-bedroom homes were built accordingly, but failed to sell as planned. Millions of dollars were tied up in unoccupied houses. Studies found the population was really made up of two distributions: a large segment of young working couples who didn’t have or want children, and another segment with large families, four or more children. The average was right, the variability not too bad, but the shape had been neglected.
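A small simulation makes the trap concrete. The mix below is hypothetical (the segment sizes and family sizes are invented to mirror the Denver story, not taken from the actual study): two subpopulations produce a respectable-looking average even though almost no family is anywhere near it.

```python
import random
import statistics

random.seed(0)

# Hypothetical mix echoing the Denver example: many child-free couples
# (0 children) plus a smaller segment of large families (4+ children).
couples = [0] * 60
large_families = [random.choice([4, 5, 6]) for _ in range(40)]
children = couples + large_families

mean = statistics.mean(children)
print(f"average family size: {mean:.1f} children")

# The average looks moderate, yet hardly any family has 1-3 children:
near_average = sum(1 for c in children if c in (1, 2, 3))
print(f"families with 1-3 children: {near_average}")
```

The mean lands near 2 children, yet the count of families close to that mean is zero; only a look at the shape (a two-humped histogram) reveals it.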

There are other cases where even one number can help give us a picture of the population. Do you know what a Sydney Duck is? Might you have a better idea about that population if you knew the average weight is 165 pounds? Data may contain information just crying to be let out, but we must learn how to listen and how to ask the right questions. We must gain both an appreciation for and abilities in “listening skills.”

**C. Frequency Distributions**

In the science of statistics a picture is worth a thousand words (or numbers). The picture most often used to look at how data is distributed is called a Frequency Histogram or Frequency Distribution. Data is arranged according to size, with the picture spotlighting where most of the data are grouped and the pattern of variation. Had a histogram been used in the Denver housing example mentioned earlier, those millions of dollars could have been saved. The histogram is the best way to see the third characteristic in descriptive statistics (the shape of the distribution).
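Building a frequency histogram is just arranging data by size and counting. A minimal text-only sketch, using hypothetical measurements binned to one decimal place:

```python
from collections import Counter

# Hypothetical measurements to tally into a frequency histogram
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 10.1, 9.9, 10.0, 10.3]

# Bin each value to one decimal place and count the frequency per bin
counts = Counter(round(x, 1) for x in data)

# Print the bins in size order, with a bar of '#' marks per count
for value in sorted(counts):
    print(f"{value:5.1f} | {'#' * counts[value]}")
```

Even this crude picture shows location (the bars cluster near 10.0), dispersion (how far the bars spread), and shape (a single central hump) at a glance.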

Place a large spinner with a pointer on a board and let the random variable X be the position of the pointer, as shown below. If we place small marks with the numbers 1 through 10 evenly spaced around the board, what is the chance (probability) that the pointer will stop exactly on, say, the number 5? …ZERO! The pointer can get terribly close to the 5 (or to any number, 1 through 10), but stopping exactly on the point 5 is extremely unlikely.

The object lesson here is that you can only assign probability to RANGES. If we divide that same circle into ten equal segments as shown in the second circle, with the number 5 representing one-tenth of the total circle, the probability of the pointer landing on the number 5 is now 1 spin out of 10 = 1/10 = 0.1.
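The spinner lesson can be checked by simulation. A sketch under simple assumptions: the pointer position is modeled as a uniform draw on [0, 10), and the segment labelled “5” is taken to be the interval [4, 5), one-tenth of the circle.

```python
import random

random.seed(42)

# Model the spinner: X is uniform on [0, 10).
trials = 100_000
spins = [random.uniform(0, 10) for _ in range(trials)]

# Probability of stopping EXACTLY on the point 5: essentially zero.
exactly_five = sum(1 for x in spins if x == 5.0)

# Probability of landing in the RANGE [4, 5) -- one-tenth of the circle.
in_segment = sum(1 for x in spins if 4.0 <= x < 5.0)

print(exactly_five, in_segment / trials)
```

A hundred thousand spins never hit the exact point, while roughly one spin in ten lands inside the segment: probability belongs to ranges, not points.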

The frequency distribution (probability density) for the circle would look like this, with the height of each bar equal to 0.1. Notice that the sum of all bars, or the area of the total figure, is ONE! That is 100% of all outcomes: the area under the diagram must include all trials, and there is no possible outcome that is not included in the figure.

Consider the spin of the pointer in Fig. 9.1 above. The likelihood in Fig. 9.1(a) of the pointer stopping exactly on 5.0 is next to zero. In Fig. 9.1(b), each segment is equal to 1/10 of the total circle, so the chance of landing on any segment is 1/10, or 0.1. The histogram in Fig. 9.1(c) sums up the likelihood of all events nicely.

Dr. Shewhart’s definition of normal looks like this. In terms of sigma, a 6 sigma (+/- 3 sigma) spread contains all but about 0.27% of the population.
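That coverage figure follows from the normal distribution itself and can be computed directly. A short check (my addition, using the standard error-function formula for normal coverage, P(|X − μ| ≤ kσ) = erf(k/√2)):

```python
import math

def within_sigma(k):
    """Fraction of a normal population within +/- k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

# The classic coverage figures for 1, 2, and 3 sigma
for k in (1, 2, 3):
    print(f"+/- {k} sigma: {within_sigma(k) * 100:.2f}%")
```

At +/- 3 sigma the coverage is about 99.73%, leaving roughly 0.27% of the population outside, which is the figure usually rounded to 0.3%.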

Fig. 9.2 Normal Distribution