In every walk of life we will find variability. How do we deal with this variation between things in a rational manner? As young technicians, we were taught that when we needed to know how big something was, we just went out and measured it. Then we knew! But we then learned if we made the same measurement later, the measured value was never exactly the same. Then we weren’t so sure.
But the fact is, no matter how hard we try to hold all variables constant, and how carefully we repeat all measurements in precisely the same way, there will nonetheless be some variability in the measurements. If we don’t find a difference, we just aren’t measuring closely enough. That is reality.
In the face of variation, and the resulting uncertainty, we must develop some ways to deal with data. For instance, what do we use for a product dimension in our literature when the results are different for two units from the same production lot? How do we define the “average” of measured data? Further, in one case the variation may be small, while in another it may be quite extreme. How do we quantify the level of variation?
Lord Kelvin said it best over a hundred years ago, “When you can measure what you are speaking about, and can express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” Unless we can measure what we are dealing with, we are poor technologists. Quantifying data is the basis of descriptive statistics, which describes how data is to be viewed and how information is derived from that data.
Consider with me for a moment what mental images are evoked by just hearing or reading different words. What comes to your mind as you read the words:
MELLOW SATIN AUTUMN CRÈME
Don’t they just make you feel kind of gooooood?
EXAMINATION CROAK STATISTICS
“Statistics” has never been popular. It just doesn’t conjure up warm fuzzies in our minds. There are good reasons for such a negative image. Taxation and military service were the earliest uses of statistics, and statistics still let governments know just how far they can go into our pocketbooks. Statistics have also been abused in advertising, causing us to distrust much of what we see. When an ad suggests “Nine out of ten….,” the advertiser really wants us to believe that “Nine out of every ten…”
Our approach to the science of statistics is very positive. We will start with the definition:
A set of rules that allow us to get and
then to present information from data.
We find ourselves overwhelmed with numbers (data). But data is not information. It has been said that data lives forever, but information has a very short half-life. We must efficiently and promptly turn data into information so decisions can be made. Since variation is always present when we observe things in the real world, we MUST practice statistics. The question is not whether we should, but how. Ignorance of statistical techniques in today’s competitive marketplace can be catastrophic!
A. Descriptive Statistics
Descriptive statistics describe the characteristics of a POPULATION with numerical measurements called PARAMETERS. There is no uncertainty here, because ALL elements in the population are considered. The first task is to define the population. Although descriptive statistics are basic, they can create difficulties. For example, if the population is to be the AVERAGE FAMILY, how do we define FAMILY, or how would we define AVERAGE? Further, even if the population is clearly defined, the gathering of data can add more difficulties. For example, ILLEGAL ALIEN is a well-defined population, but a tough one from which to gather information. The gathering of data has caused many projects to go down the tube. Yet the importance of data gathering and correct statistical analysis cannot be overemphasized. We all have bad memories; therefore we need data! We forget things, especially the bad ones. We can’t say we have experience, that we know how things are going to behave… we forget!
Information contained in descriptive statistical data is easiest to see and understand when it can be shown in a picture, graphically. A simple graph will show the important characteristics:
CENTRAL TENDENCY (or location, measured by the average, or mean)
DISPERSION (spread, or how much variability)
SHAPE (how the data are distributed across that spread)
The importance of “shape” can be shown with an example from the Denver housing market. Several years ago the demographics (statistics of the area) showed the average family to have 1.8 children. Three-bedroom homes were built accordingly, but failed to sell as planned. Millions of dollars were tied up in unoccupied houses. Studies found the population was really made up of two distributions: a large segment of young working couples who didn’t have or want children, and another segment that had large families of four or more children. The average was right, the variability not too bad, but the shape had been neglected.
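A small sketch can make the point concrete. The family counts below are invented for illustration (they are not the Denver figures); they are chosen only so that the average comes out to 1.8 children while almost no family actually looks like the "average" one.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical population of 20 families (NOT the Denver data):
# 11 couples with no children and 9 families with four children.
children = [0] * 11 + [4] * 9

average = mean(children)     # central tendency
spread = pstdev(children)    # dispersion (population standard deviation)
shape = Counter(children)    # shape: how many families have each size

print(f"average = {average:.1f} children, spread = {spread:.2f}")

# A crude picture: one '#' per family at each family size.
for size in range(0, 5):
    print(f"{size} children | {'#' * shape[size]}")
```

The single number 1.8 hides the two separate groups; the picture shows them at once, with nothing at all in the middle.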
There are other cases where even one number can help give us a picture of a population. Do you know what a Sidney Duck is? Might you have a better idea about that population if you knew the average weight is 165 pounds? Data may contain information just crying to be let out, but we must learn how to listen and how to ask the right questions. We must gain both an appreciation for and abilities in “listening skills.”
B. Frequency Distributions
In the science of statistics a picture is worth a thousand words (or numbers). The picture most often used to look at how data are distributed is called a Frequency Histogram or Frequency Distribution. The data are arranged according to size, with the picture spotlighting where most of the data are grouped and the pattern of variation. Had a histogram been used in the Denver housing example mentioned earlier, those millions of dollars could have been saved. The histogram is the best way to measure the third characteristic in descriptive statistics (the shape of the distribution).
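As a rough illustration of how such a picture is built, the sketch below sorts measurements into equal-width class intervals and counts how many fall in each. The function name, the interval limits, and the ten measurements are all assumptions made up for this example.

```python
def frequency_histogram(data, low, width, n_bins):
    """Count the values falling in each interval [low + i*width, low + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - low) // width)   # index of the interval containing x
        if 0 <= i < n_bins:           # values outside the range are ignored
            counts[i] += 1
    return counts

# Hypothetical repeated measurements of one dimension, in millimetres.
measurements = [10.1, 10.3, 9.8, 10.0, 10.2, 10.1, 9.9, 10.4, 10.0, 10.1]
counts = frequency_histogram(measurements, low=9.75, width=0.25, n_bins=3)

# Print one bar per interval, one '#' per measurement.
for i, c in enumerate(counts):
    start = 9.75 + i * 0.25
    print(f"{start:5.2f} to {start + 0.25:5.2f} | {'#' * c}")
```

Most of the measurements pile up in the middle interval, with a few on either side. The choice of interval width is a judgment call: too wide hides the shape, too narrow scatters the data into a ragged picture.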
Place a large spinner with a pointer on a board and let the random variable X be the position of the pointer as shown in Fig. 35.1 (a) below. If we place small marks with the numbers 1 through 8 evenly spaced around the board, what is the chance (probability) that the pointer will stop exactly on, say, the number 5? …ZERO! The pointer can get terribly close to the 5, but a single mark has no width, so the chance of stopping precisely on that point is effectively nil.
The object lesson here is that you can only assign probability to RANGES. If we divide that same circle into eight equal segments as shown in Fig. 35.1 (b), with the number 5 representing one-eighth of the total circle, the probability of the pointer landing on the number 5 is now 1 spin out of 8 = 1/8 = 0.125. Notice that the sum of all segments, or the area of the total figure, is ONE (or unity), which is 100% of all outcomes. In other words, every trial must land somewhere in the diagram: all possible outcomes are included in Fig. 35.1 (b).
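The spinner can be imitated with a short simulation. This is only a sketch under stated assumptions: the function name, the number of spins, and the random seed are made up for illustration, and the pointer position is modelled as a number between 0 and 8, so the segment labelled 5 is the range from 4 up to 5.

```python
import random

def spin_fractions(n_spins, n_segments=8, seed=0):
    """Spin an idealized pointer n_spins times; return the fraction of spins
    landing in each segment and the number of spins hitting one exact point."""
    rng = random.Random(seed)
    counts = [0] * n_segments
    exact_hits = 0
    for _ in range(n_spins):
        position = rng.random() * n_segments   # pointer position in [0, 8)
        counts[int(position)] += 1             # which segment (a RANGE) it fell in
        if position == 4.5:                    # one exact point inside segment 5
            exact_hits += 1                    # practically never happens
    return [c / n_spins for c in counts], exact_hits

fractions, exact_hits = spin_fractions(100_000)

for label, f in enumerate(fractions, start=1):
    print(f"segment {label}: {f:.3f}")
print("total of all segments:", round(sum(fractions), 6))
print("spins landing exactly on the chosen point:", exact_hits)
```

Each segment collects roughly one-eighth of the spins, and the fractions add up to one because every spin lands in some segment. A computer stores positions with finite precision, so an exact hit is not strictly impossible there, only so rare that it will almost never be seen.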