Section 9.6 Normal Distribution as a Limiting Distribution
Over the past several chapters you should have noticed that many distributions have skewness and kurtosis formulae whose limiting values are 0 and 3, respectively. These are exactly the skewness and kurtosis of the normal distribution, which suggests that each of those distributions can be approximated by the normal distribution for "large" parameter values.
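As a reminder of what these limits look like, here are the standard skewness and kurtosis formulas for two familiar cases. The choice of the binomial (parameters n and p) and the Poisson (mean \mu) is an illustrative assumption; other distributions from the earlier chapters may behave similarly.

```latex
% Binomial(n, p), with 0 < p < 1 held fixed and n growing
\gamma_1 = \frac{1 - 2p}{\sqrt{np(1-p)}} \longrightarrow 0,
\qquad
\gamma_2 = 3 + \frac{1 - 6p(1-p)}{np(1-p)} \longrightarrow 3
\qquad (n \to \infty)

% Poisson with mean \mu
\gamma_1 = \frac{1}{\sqrt{\mu}} \longrightarrow 0,
\qquad
\gamma_2 = 3 + \frac{1}{\mu} \longrightarrow 3
\qquad (\mu \to \infty)
```

In both cases the correction terms shrink as the parameter grows, so the shape measures approach those of the normal distribution.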
To see how this works, consider a "random" distribution in the following two interactive experiments. For the first graph below, a sequence of N random samples, each of size r, ranging from 0 to "Range" is generated and graphed as small data points. As the number of samples N and the sample size r increase, notice that the data seems to cover the entire range of possible values relatively uniformly. (For this scatter plot note that each row represents the data for one sample of size r. The larger the N, the greater the number of rows.) Each row is averaged and that mean value is plotted on the graph as a red circle. If you check the "Show_Mean" box, the mean of these circles is indicated by the green line in the middle of the plot.
For the second graph below, the means are collected and the relative frequency of each is plotted. As N increases, you should see that the results begin to show an interesting tendency. As you increase the data range, you may notice that this graph has a larger number of data values. Smoothing groups this data into intervals of length two, which may give a graph with less variability.
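If the interactive graphs are not available, the two experiments described above can be approximated with a short script. This is only a minimal sketch under stated assumptions: the names `sample_means`, `relative_frequencies`, `num_samples`, `sample_size`, and `data_range` are hypothetical stand-ins for the number of samples, the sample size, and the "Range" setting, the data are assumed to be uniformly distributed integers, and a text histogram replaces the plot.

```python
import random
from collections import Counter

def sample_means(num_samples, sample_size, data_range, seed=0):
    """Mean of each row, where a row is sample_size uniform integers in [0, data_range]."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable (an assumption)
    means = []
    for _ in range(num_samples):
        row = [rng.randint(0, data_range) for _ in range(sample_size)]
        means.append(sum(row) / sample_size)
    return means

def relative_frequencies(means, bin_width=2):
    """Group the means into intervals of length bin_width ("smoothing").

    Returns a dict mapping each interval's left endpoint to its relative frequency.
    """
    counts = Counter(int(m // bin_width) * bin_width for m in means)
    total = len(means)
    return {left: counts[left] / total for left in sorted(counts)}

means = sample_means(num_samples=2000, sample_size=10, data_range=100)
freqs = relative_frequencies(means, bin_width=2)
grand_mean = sum(means) / len(means)  # plays the role of the green line

print(f"mean of the sample means: {grand_mean:.2f}")
for left, freq in freqs.items():
    # One '#' per 0.5% of the means falling in this interval.
    print(f"[{left:3d}, {left + 2:3d}) {'#' * round(freq * 200)}")
```

Changing `num_samples`, `sample_size`, and `data_range` and rerunning is a rough substitute for moving the sliders.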
Consider each of the following:
- As M increases with single digit values of N, what appears to happen to the mean and range of the means? How does increasing the data range from 1-100 to 1-200 or 1-300 affect these results?
- As M increases (say, for a middle value of N), what appears to happen to the means? How does increasing the data range from 1-100 to 1-200 or 1-300 affect these results?
- As N increases (say, for a middle value of M), what appears to happen to the range of the averages? Does your conclusion actually depend upon the value of M? (Look at the graph and don’t worry about the actual numerical values.) How does increasing M for the second graph affect the skewness and kurtosis of that graph? Do things change significantly as N is increased?
So, even with random data, if you consider the distribution of the collected means rather than the distribution of the actual data, then the means appear to have a bell-shaped distribution as well.
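One rough way to check this numerically is to compare the skewness and kurtosis of the raw data with those of the collected means. The sketch below assumes uniformly distributed integer data and uses hypothetical names (`skewness_and_kurtosis`, `data_range`, `sample_size`); it computes the moment-based measures directly rather than relying on any particular statistics package.

```python
import random
import statistics

def skewness_and_kurtosis(values):
    """Moment-based skewness (m3 / m2^1.5) and kurtosis (m4 / m2^2) of a list of numbers."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(1)   # fixed seed so the run is repeatable (an assumption)
data_range = 100         # hypothetical stand-in for the "Range" setting
sample_size = 30         # size of each sample that gets averaged
num_samples = 20000      # number of samples, and so the number of means

# The actual data: individual uniform values.
raw = [rng.randint(0, data_range) for _ in range(num_samples)]

# The collected means: one average per sample.
means = [
    statistics.fmean([rng.randint(0, data_range) for _ in range(sample_size)])
    for _ in range(num_samples)
]

raw_skew, raw_kurt = skewness_and_kurtosis(raw)
mean_skew, mean_kurt = skewness_and_kurtosis(means)
print(f"raw data: skewness {raw_skew:.3f}, kurtosis {raw_kurt:.3f}")
print(f"means:    skewness {mean_skew:.3f}, kurtosis {mean_kurt:.3f}")
```

For uniform data the kurtosis of the raw values should come out near 1.8, well below 3, while the kurtosis of the means should land much closer to 3 and the skewness close to 0.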