Other Statistical Point Measures

Section 1.7 Other Statistical Point Measures

Above, we have investigated statistical measures that help determine the middle and the spread of a given data set. There are, however, other metrics that help describe the distribution of the data. Skewness is one such metric: it describes any lack of symmetry in the data set's distribution, that is, whether the data is stretched out to one side or the other.

Definition 1.7.1 Skewness

For population data, the Skewness of $x_1, x_2, ..., x_n$ is given by

$\begin{equation*} \frac{1}{\sigma^3} \frac{\sum_{k=1}^n ( x_k-\mu )^3}{n}. \end{equation*}$

For sample data, the Skewness of $x_1, x_2, ..., x_n$ is given by

$\begin{equation*} \frac{1}{s^3} \frac{\sum_{k=1}^n ( x_k-\overline{x} )^3}{n}. \end{equation*}$

A positive skewness indicates that the positive $(x_k - \mu)^3$ terms (likewise $(x_k - \overline{x})^3$ terms) overwhelm the negative terms. So, a positive skewness indicates that the data set is strung out to the right. Likewise, a negative skewness indicates a data set that is strung out to the left.
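As an illustration of the definition and the sign interpretation above, here is a minimal Python sketch. The function name `skewness`, the `sample` flag, and both data sets are assumptions made for this example and are not part of the text. The sample version follows the definition literally: the cubed deviations are averaged over $n$, while $s$ uses the $n-1$ divisor.

```python
from math import sqrt

def skewness(data, sample=False):
    """Skewness as in Definition 1.7.1 (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    # Standard deviation: divide by n (population) or n - 1 (sample).
    squared_dev = sum((x - mean) ** 2 for x in data)
    sd = sqrt(squared_dev / (n - 1 if sample else n))
    # The cubed deviations are averaged over n in both versions.
    third_moment = sum((x - mean) ** 3 for x in data) / n
    return third_moment / sd ** 3

symmetric = [1, 2, 3, 4, 5]        # made-up data, symmetric about 3
right_tailed = [1, 1, 2, 2, 10]    # made-up data, strung out to the right

print(skewness(symmetric))         # 0.0, the cubed deviations cancel
print(skewness(right_tailed) > 0)  # True
```

Negating every value in `right_tailed` mirrors the data set and flips the sign of the result.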

Data might tend to be clustered around the mean. The "kurtosis" can be used to measure how closely data resembles a "bell-shaped" collection.

Definition 1.7.2 Kurtosis

For population data, the Kurtosis of $x_1, x_2, ..., x_n$ is given by

$\begin{equation*} \frac{1}{\sigma^4} \frac{\sum_{k=1}^n ( x_k-\mu )^4}{n}. \end{equation*}$

For sample data, the Kurtosis of $x_1, x_2, ..., x_n$ is given by

$\begin{equation*} \frac{1}{s^4} \frac{\sum_{k=1}^n ( x_k-\overline{x} )^4}{n}. \end{equation*}$

A perfectly bell-shaped (that is, "normal") distribution has a kurtosis of 3. So, a kurtosis near 3 is consistent with bell-shaped data, whereas a kurtosis further away from 3 indicates data that is less bell shaped.
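The kurtosis definition can be sketched in Python in the same way. The function name `kurtosis`, the `sample` flag, and the two small data sets below are assumptions chosen only for illustration.

```python
def kurtosis(data, sample=False):
    """Kurtosis as in Definition 1.7.2 (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    # Variance: divide by n (population) or n - 1 (sample).
    squared_dev = sum((x - mean) ** 2 for x in data)
    variance = squared_dev / (n - 1 if sample else n)
    # The fourth powers of the deviations are averaged over n in both versions.
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    # Dividing by variance squared is dividing by (standard deviation)^4.
    return fourth_moment / variance ** 2

flat = [1, 2, 3, 4, 5]             # made-up, evenly spread data
peaked = [0, 5, 5, 5, 5, 5, 10]    # made-up, tightly clustered with two outliers

print(kurtosis(flat))    # about 1.7, below 3
print(kurtosis(peaked))  # about 3.5, above 3
```

Neither data set is bell shaped. The values only show in which direction the measure moves for flat versus clustered data.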

Theorem 1.7.3 Alternate Formulas for Skewness and Kurtosis

With $v$ denoting the population variance (computed with divisor $n$),

$\begin{equation*} \text{skewness} = \frac{1}{s^3} \left [ \frac{\sum_{k=1}^n x_k^3}{n} - 3 v \overline{x} - \overline{x}^3 \right ] \end{equation*}$

and

$\begin{equation*} \text{kurtosis} = \frac{1}{s^4} \left [ \frac{\sum_{k=1}^n x_k^4}{n} - 4 \overline{x} \frac{\sum_{k=1}^n x_k^3 }{n} + 6 \overline{x}^2 v + 3 \overline{x}^4 \right ] \end{equation*}$

Proof

For skewness, expand the cubic and break up the sum. Factoring out constants (such as $\overline{x}$) gives

\begin{align*} & \frac{\sum_{k=1}^n ( x_k-\overline{x} )^3}{n}\\ & = \frac{\sum_{k=1}^n x_k^3}{n} - 3 \overline{x} \frac{\sum_{k=1}^n x_k^2 }{n} + 3 \overline{x}^2 \frac{\sum_{k=1}^n x_k}{n} - \frac{\sum_{k=1}^n \overline{x}^3}{n}\\ & = \frac{\sum_{k=1}^n x_k^3}{n} - 3 \overline{x}(v + \overline{x}^2) + 3 \overline{x}^3 - \overline{x}^3\\ & = \frac{\sum_{k=1}^n x_k^3}{n} - 3 \overline{x}v - \overline{x}^3 \end{align*}

and divide by the cube of the standard deviation to finish. Note that the first expansion in the derivation above can be used directly when the data is collected in a table and the powers are easily computed.

For kurtosis, similarly expand the quartic and break up the sum as before. Note that you can extract the value of the cubic term by solving for that term in the skewness formula above. Then,

\begin{align*} & \frac{\sum_{k=1}^n ( x_k-\overline{x} )^4}{n}\\ & = \frac{\sum_{k=1}^n x_k^4}{n} - 4 \overline{x} \frac{\sum_{k=1}^n x_k^3 }{n} + 6 \overline{x}^2 \frac{\sum_{k=1}^n x_k^2}{n} - 4 \overline{x}^3 \frac{\sum_{k=1}^n x_k}{n} + \frac{\sum_{k=1}^n \overline{x}^4}{n}\\ & = \frac{\sum_{k=1}^n x_k^4}{n} - 4 \overline{x} \frac{\sum_{k=1}^n x_k^3 }{n} + 6 \overline{x}^2 (v+\overline{x}^2) - 4 \overline{x}^4 + \overline{x}^4\\ & = \frac{\sum_{k=1}^n x_k^4}{n} - 4 \overline{x} \frac{\sum_{k=1}^n x_k^3 }{n} + 6 \overline{x}^2 v + 3 \overline{x}^4 \end{align*}

and then divide by the fourth power of the standard deviation. Note again that the first expansion in the derivation above might also be a useful shortcut.
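The first expansion in each derivation can be checked numerically. The sketch below uses exact rational arithmetic (Python's `fractions.Fraction`) so that the comparison is not affected by rounding. The helper names and the data values are hypothetical and chosen only for this check.

```python
from fractions import Fraction

def raw_moment(data, p):
    """Average of the p-th powers: (sum of x_k^p) / n."""
    return sum(x ** p for x in data) / len(data)

def central_moment(data, p):
    """Average of the p-th powers of the deviations from the mean."""
    mean = raw_moment(data, 1)
    return sum((x - mean) ** p for x in data) / len(data)

def third_from_raw(data):
    # First expansion in the skewness derivation; m1 is the mean.
    m1, m2, m3 = (raw_moment(data, p) for p in (1, 2, 3))
    return m3 - 3 * m1 * m2 + 3 * m1 ** 2 * m1 - m1 ** 3

def fourth_from_raw(data):
    # First expansion in the kurtosis derivation; m1 is the mean.
    m1, m2, m3, m4 = (raw_moment(data, p) for p in (1, 2, 3, 4))
    return m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 4 * m1 ** 3 * m1 + m1 ** 4

data = [Fraction(x) for x in (2, 3, 3, 7, 10)]  # made-up data for the check

print(third_from_raw(data) == central_moment(data, 3))   # True
print(fourth_from_raw(data) == central_moment(data, 4))  # True
```

Dividing either result by $s^3$ or $s^4$ then gives the skewness or kurtosis as in the definitions.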

Going back to a previous example...

Computing skewness and kurtosis by hand is often easier to organize in a table. Below, the $x_k$ column holds the given data values, and the remaining columns hold powers that are again easy to compute.

$x_k$	$x_k^2$	$x_k^3$	$x_k^4$
1	1	1	1
-1	1	-1	1
0	0	0	0
2	4	8	16
2	4	8	16
5	25	125	625

Table 1.7.4 Computing data statistics by hand

So,

$\Sigma x_k = 9$ and

$\Sigma x_k^2 = 35$ as before and so

$\overline{x} = \frac{9}{6} = \frac{3}{2}\text{,}$

$v = \frac{35}{6} - \left ( \frac{3}{2} \right )^2 = \frac{43}{12}\text{,}$

$s^2 = \frac{6}{5} \times v = \frac{43}{10}\text{,}$ and so

$s = \sqrt{\frac{43}{10}} = \sqrt{4.3}\text{.}$ But also,

$\Sigma x_k^3 = 141$ and

$\Sigma x_k^4 = 659\text{.}$ Use these in the formulas above, with the population variance $v$ (not $s^2$) inside the brackets, to obtain skewness of

$\begin{equation*} \left [ \frac{141}{6} - 3 \cdot \frac{43}{12} \cdot \frac{3}{2} - \left ( \frac{3}{2} \right )^3 \right ] / s^3 = \frac{4}{s^3} \approx 0.449 \end{equation*}$

and kurtosis of

$\begin{equation*} \left [ \frac{659}{6} - 4 \cdot \frac{3}{2} \cdot \frac{141}{6} + 6 \left ( \frac{3}{2} \right )^2 \cdot \frac{43}{12} + 3 \cdot \left ( \frac{3}{2} \right )^4 \right ] / s^4 = \frac{1555}{48} / s^4 \approx 1.752. \end{equation*}$
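As a cross-check on the hand computation, the following Python sketch recomputes the column totals of Table 1.7.4 and then evaluates the sample skewness and kurtosis directly from Definitions 1.7.1 and 1.7.2. The variable names are assumptions made for this example.

```python
from math import sqrt

data = [1, -1, 0, 2, 2, 5]   # the x_k column of Table 1.7.4
n = len(data)

# Column totals: sums of x_k, x_k^2, x_k^3 and x_k^4.
power_sums = {p: sum(x ** p for x in data) for p in (1, 2, 3, 4)}
print(power_sums)            # {1: 9, 2: 35, 3: 141, 4: 659}

mean = power_sums[1] / n
# Sample variance uses the n - 1 divisor.
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
# The third and fourth powers of the deviations are averaged over n.
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n

skew = m3 / sqrt(s2) ** 3
kurt = m4 / s2 ** 2
print(round(skew, 4), round(kurt, 4))   # 0.4486 1.7521
```

The positive skewness reflects the single large value 5 pulling the data to the right, and the kurtosis well below 3 suggests that this small data set is not close to bell shaped.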