## Neopythagorean Approaches to Measures of Central Tendency and Dispersion

In a recent article, Michael Kourkoulos and Constantinos Tzanakis reported a difficulty in explaining the use of the root mean squared deviation from the mean of a set of observations as a measure of dispersion to groups of prospective primary school teachers. These students seem to have had little difficulty in accepting the arithmetic mean and the median as measures of central tendency and the mean absolute deviation of the observed values from their arithmetic mean and the (semi-) interquartile range as measures of dispersion. However, while willing to admit the algebraic and computational convenience of the root mean squared deviation of the observed values from their arithmetic mean, they seem to have been reluctant to endorse this function for use in practical work.

The ultimate source of the students’ problem would seem to be their prompt—perhaps overprompt—acceptance of the arithmetic mean and the median as measures of central tendency. This was not the situation in the middle of the 18th century, when Thomas Simpson (1710–61) sought to explain to his contemporaries “the advantage of taking the mean of a number of observations” rather than a single observation as a measure of central tendency.

As a possible remedy for this situation, students should be encouraged to take a more-leisurely approach to the selection of one or more means to use in their statistical work. The approach recommended here involves a family of neopythagorean means that have been known since ancient times.

#### Neopythagorean Means

Given two positive numbers *x*_{1} = *x* and *x*_{2} = *z* with *x* < *z*, a problem studied by the early followers of Pythagoras was to find a third number *x̃* = *y* satisfying *x* ≤ *y* ≤ *z* and certain other properties suggested by the theory of proportions. To this end, we define the three simple ratios *p* = *z* / *x*, *q* = *z* / *y*, *r* = *y* / *x* and the four ratios of differences *s* = 1 / *t* = (*z* – *y*) / (*y* – *x*), *t* = 1 / *s* = (*y* – *x*) / (*z* – *y*), *u* = *t* + 1 = (*z* – *x*) / (*z* – *y*), *v* = *s* + 1 = (*z* – *x*) / (*y* – *x*), all of which take values ≥ 1, provided that *s* is employed when *y* ≤ (*x* + *z*) / 2 and *t* when *y* ≥ (*x* + *z*) / 2.

In this context, the triple {*x, y, z*} may be said to form a neopythagorean triple if any two of these seven ratios take the same value. Further, the middle-most element *y* of such a triple may be called a neopythagorean mean (even if it does not satisfy what we might regard as the essential properties of a statistical mean).

These seven functions can be set equal to unity, yielding several realizations of the lower bound *y* = *x* and the upper bound *y* = *z*, and a single useful result for *s* = *t* = 1. They can also be set equal to each other, yielding a total of 21 different equalities, of which only the 13 combinations with distinct solutions need be listed, as indicated in Figure 1.

Now, the 13 rows of this figure cover all but five of the 21 equalities, because the first row relates to the combinations *s* = *t* = 1 and *u* = *v* = 2, while the second row relates to the combinations *s* = *q*, *s* = *r*, and *r* = *q*. The remaining five combinations were omitted for various reasons: the assignments *s* = *v* and *t* = *u* represent impossible combinations, the assignment *q* = *p* (like *u* = *r*) yields the lower bound *y* = *x*, and *r* = *p* and *v* = *q* yield the upper bound *y* = *z*.

The principal source, Nichomachus of Gerasa’s *Introduction to Arithmetic*, Book 2, Chapters 22–28, was only concerned with the first 10 combinations. He presumably omitted the means defined in the last two rows because they involve the golden ratio *Φ* = (√5 + 1) / 2 ≈ 1.618, whereas he was only interested in expressions that could be solved in positive integers.

On the other hand, the mean defined in the 11th row was presumably omitted by mistake, since it may also be obtained by applying the seventh mean to the neopythagorean triple 1 / *z*, 1 / *y*, 1 / *x*. By contrast, the function *y* = [*z* ± (*z* – 2*x*)] / 2 defined by *u* = *r* in the 10th row should itself have been excluded from further consideration, as it takes either the value *y* = *x*, which is the lower limit for *y*, or the value *y* = *z* – *x*, which is not a mean but the range of possible values for *y*.

Finally, because of the constraints on the value of *y* built into the definitions of *s* and *t*, the functions in rows 5, 6, 12, and 13 of Figure 1 may yield values of *y* that do not lie between *x* and *z*. Of course, the expressions in rows 5 and 6 can still yield valid neopythagorean means for suitable choices of *x* and *z*, but this is not possible for the functions in the last two rows, since they feature the irrational number *Φ*.

#### Statistical Means

Four of the results tabulated here are well known: The assignment *s* = *t* defines the arithmetic mean *y*(*AM*) = (*x* + *z*) / 2, *s* = *r* defines the geometric mean *y*(*GM*) = √*xz*, *s* = *p* defines the harmonic mean *y*(*HM*) = 2*xz* / (*x* + *z*), while *t* = *p* defines the self-weighted arithmetic mean *y*(*SWAM*) = (*x*^{2} + *z*^{2})/(*x* + *z*).
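These four definitions are easy to check numerically. The following sketch (illustrative values and variable names of our own choosing) verifies that each ratio condition recovers the corresponding mean for the pair *x* = 4, *z* = 12:

```python
# Verify that the four ratio conditions recover the four classical means
# for a sample pair x < z. (Illustrative values; names are our own.)
x, z = 4.0, 12.0

am = (x + z) / 2                 # s = t : arithmetic mean
gm = (x * z) ** 0.5              # s = r : geometric mean
hm = 2 * x * z / (x + z)         # s = p : harmonic mean
swam = (x**2 + z**2) / (x + z)   # t = p : self-weighted arithmetic mean

def ratios(y):
    """Return the ratios p, q, r, s, t for the triple {x, y, z}."""
    return dict(p=z / x, q=z / y, r=y / x,
                s=(z - y) / (y - x), t=(y - x) / (z - y))

assert abs(ratios(am)["s"] - ratios(am)["t"]) < 1e-12      # s = t
assert abs(ratios(gm)["s"] - ratios(gm)["r"]) < 1e-12      # s = r
assert abs(ratios(hm)["s"] - ratios(hm)["p"]) < 1e-12      # s = p
assert abs(ratios(swam)["t"] - ratios(swam)["p"]) < 1e-12  # t = p
```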

These four ancient definitions of a mean are of considerable interest to statisticians, since they may readily be generalized to sets of *n* ≥ 2 observations *x*_{1}, *x*_{2}, …, *x*_{n} on *X*:

(1) The arithmetic mean: *x̃*(*AM*) = (1 / *n*) Σ *x*_{i}

(2) The geometric mean: *x̃*(*GM*) = (*x*_{1} *x*_{2} ⋯ *x*_{n})^{1 / *n*}

(3) The harmonic mean: *x̃*(*HM*) = *n* / Σ (1 / *x*_{i})

(4) The (self-) weighted arithmetic mean with *i*th weight *x*_{i}: *x̃*(*SWAM*) = Σ *x*_{i}^{2} / Σ *x*_{i}

It is not clear whether any of the functions defined in rows 5 to 13 of Figure 1 can be generalized similarly. However,

(5) the root mean square mean may be formed as the geometric mean of the (unweighted) arithmetic mean (1) and the self-weighted arithmetic mean (4): *x̃*(*RMS*) = √(Σ *x*_{i}^{2} / *n*)
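These five generalized means can be computed directly. The sketch below assumes the standard textbook formulas and checks that the root mean square mean is indeed the geometric mean of the arithmetic mean (1) and the self-weighted arithmetic mean (4):

```python
import math

# Compute the five generalized means for a set of positive observations.
# (Standard formulas; the sample data are illustrative.)
xs = [2.0, 4.0, 8.0]
n = len(xs)

am = sum(xs) / n                              # (1) arithmetic mean
gm = math.prod(xs) ** (1 / n)                 # (2) geometric mean
hm = n / sum(1 / x for x in xs)               # (3) harmonic mean
swam = sum(x * x for x in xs) / sum(xs)       # (4) self-weighted arithmetic mean
rms = math.sqrt(sum(x * x for x in xs) / n)   # (5) root mean square mean

# (5) is the geometric mean of (1) and (4):
# sqrt((sum x / n) * (sum x^2 / sum x)) = sqrt(sum x^2 / n).
assert abs(rms - math.sqrt(am * swam)) < 1e-12
# The classical ordering also holds: HM <= GM <= AM <= RMS.
assert hm <= gm <= am <= rms
```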

By construction, all five means are equivariant to changes of scale (when, for *i* = 1, 2, …, *n*, *x*_{i} is replaced by *cx*_{i} for some *c* ≠ 0) and invariant to permutations of the order of the observations. However, with the sole exception of the arithmetic mean (1), the remaining four means are not equivariant to changes of location (when, for *i* = 1, 2, …, *n*, *x*_{i} is replaced by *x*_{i} + *d* for some *d* ≠ 0).

Moreover, restricting attention to the four expressions (1), (2), (3), and (5) provides a general formula for a class of means:

*x̃*(*ƒ*) = *ƒ*^{–1}[(1 / *n*) Σ *ƒ*(*x*_{i})]

where *ƒ*(*x*) is an increasing function of *x* (assumed positive). Setting *ƒ* equal to the identity function *ƒ*(*x*) = *x* defines the arithmetic mean (1); setting *ƒ* equal to the logarithmic function *ƒ*(*x*) = log(*x*) defines the geometric mean (2); setting *ƒ* equal to the negated reciprocal function *ƒ*(*x*) = –1 / *x* defines the harmonic mean (3); and setting *ƒ* equal to the square function *ƒ*(*x*) = *x*^{2} defines the root mean square mean (5).
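This general formula can be sketched as a small function; the four choices of *ƒ* named above then reproduce means (1), (2), (3), and (5) (the inverse functions supplied below are ours):

```python
import math

# The generalized f-mean: f_inv((1/n) * sum of f(x_i)).
def f_mean(xs, f, f_inv):
    return f_inv(sum(f(x) for x in xs) / len(xs))

xs = [2.0, 4.0, 8.0]

am  = f_mean(xs, lambda x: x,      lambda m: m)       # identity -> arithmetic (1)
gm  = f_mean(xs, math.log,         math.exp)          # logarithm -> geometric (2)
hm  = f_mean(xs, lambda x: -1 / x, lambda m: -1 / m)  # negated reciprocal -> harmonic (3)
rms = f_mean(xs, lambda x: x * x,  math.sqrt)         # square -> root mean square (5)
```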

Clearly, in this context, the root mean square mean (5) is no longer a mystery, but a natural (if somewhat exotic) combination of three neopythagorean means that has been known since ancient times.

#### Measures of Dispersion

If, instead of applying these means to the observations *x*_{i} themselves, they are applied to the absolute values of the deviations of the observations from the chosen mean, *d*_{i} = |*x*_{i} – *x̃*|, then the arithmetic mean (1) leads naturally to the mean absolute deviation from the arithmetic mean:

(1 / *n*) Σ |*x*_{i} – *x̃*(*AM*)|

while applying the square of the root mean square mean (5) to the same deviations from the arithmetic mean leads naturally to the estimated variance:

(1 / *n*) Σ (*x*_{i} – *x̃*(*AM*))^{2}
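Both measures of dispersion take only a few lines; this sketch (illustrative data, divisor *n* as in the text) computes them side by side:

```python
# Mean absolute deviation and estimated variance about the arithmetic mean.
xs = [2.0, 4.0, 8.0]
n = len(xs)
mean = sum(xs) / n

mad = sum(abs(x - mean) for x in xs) / n     # mean absolute deviation
var = sum((x - mean) ** 2 for x in xs) / n   # estimated variance (divisor n)

# The root mean square of the deviations is never smaller than their mean.
assert var ** 0.5 >= mad
```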

#### Median and Interquartile Range

To extend this scheme to include the median, assume that *n* = 2*k* – 1 is odd and that the *x*_{i} values have been arranged in increasing order *x*_{[1]} ≤ *x*_{[2]} ≤ … ≤ *x*_{[n]}. In this context, setting *ƒ* equal to the rank index function *ƒ*(*x*_{[i]}) = *i* identifies the median as the middlemost or *k* = (*n* + 1) / 2 th value of *x*.

In much the same way as the arithmetic mean (1) leads naturally to the mean absolute deviation from the arithmetic mean as a measure of dispersion, the median leads naturally to the median absolute deviation from the median. Indeed, these last two expressions were the measures of central tendency and dispersion recommended by Francis Galton (1822–1911) for use in practice.
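Galton’s pair of measures is readily computed with the Python standard library; the sample data below are illustrative:

```python
import statistics

# Median and median absolute deviation from the median (Galton's pair).
xs = [1, 3, 5, 7, 20]
med = statistics.median(xs)
mad_median = statistics.median(abs(x - med) for x in xs)
```

Note that, unlike the variance, neither measure is inflated by the single outlying value 20.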

For an alternative measure of dispersion based on the median, suppose that *n* + 1 = 4*m* is divisible by 4, so (*n* + 1) / 2 = 2*m* and the median is again defined by *Q*_{2} = *x*_{[2m]}. Further, the lower quartile (the median of the 2*m* – 1 observations lying below the median) is defined by *Q*_{1} = *x*_{[m]} and, similarly, the upper quartile (the median of the 2*m* – 1 observations lying above the median) is defined by *Q*_{3} = *x*_{[3m]}. This provides two estimates of deviations from the median, namely *Q*_{3} – *Q*_{2} and *Q*_{2} – *Q*_{1}, which we may either sum to obtain the interquartile range *Q*_{3} – *Q*_{1} or average to obtain the semi-interquartile range (*Q*_{3} – *Q*_{1}) / 2.
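For example, with *n* = 7 (so *m* = 2), the quartiles fall at ranks 2, 4, and 6; a minimal sketch with illustrative data:

```python
# Quartiles from order statistics when n + 1 = 4m (1-based ranks m, 2m, 3m).
xs = sorted([7, 1, 5, 3, 9, 11, 2])   # n = 7, so m = 2
n = len(xs)
m = (n + 1) // 4

q1, q2, q3 = xs[m - 1], xs[2 * m - 1], xs[3 * m - 1]
iqr = q3 - q1          # interquartile range
semi_iqr = iqr / 2     # semi-interquartile range
```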

#### Median and Midrange

For simplicity in that definition of the median, we have assumed that *n* is odd. If, instead, *n* = 2*k* is even, then the median is usually defined as the average of the two middlemost values *x̃*(*Med*) = (*x*_{[k]} + *x*_{[k+1]}) / 2. This expression takes the same form *x̃* = (*x* + *z*) / 2 as the first of the neopythagorean means.

The same is true of the midrange *x̃*(*MR*) = (*x*_{[1]} + *x*_{[n]}) / 2 and the (*h* – 1)-level symmetrically trimmed midrange *x̃*(*STMR*) = (*x*_{[h]} + *x*_{[n + 1 – h]}) / 2. Clearly, the median is an extreme example of the symmetrically trimmed midrange with *h* set equal to the integral part of (*n* + 1) / 2, so *h* = *k* whether *n* = 2*k* is even or *n* = 2*k* – 1 is odd. However, we shall not pause to compare the merits of the midrange and its symmetrically trimmed variants with those of the arithmetic mean, because to do so would take us too far out of our way.
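The relationship between the midrange, the trimmed midrange, and the median can be checked directly; the helper function below is our own:

```python
# (h-1)-level symmetrically trimmed midrange of sorted data (1-based ranks).
def trimmed_midrange(xs, h):
    return (xs[h - 1] + xs[len(xs) - h]) / 2

xs = sorted([4, 1, 7, 3, 9, 2])             # n = 6 (even)
n = len(xs)

midrange = trimmed_midrange(xs, 1)           # (x_[1] + x_[n]) / 2
median = trimmed_midrange(xs, (n + 1) // 2)  # h = k collapses to the median
```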

#### Mechanical and Physical Models

Of course, nothing so far obviates the need for teachers of elementary statistics to offer their students appropriate analogies in support of their chosen measures of central tendency and dispersion, such as the mechanical and physical models favored by Farebrother, Kourkoulos, and Tzanakis (see below). However, it should be noted that some of these analogies are capable of being developed as models for more-advanced statistical techniques, such as fitting a line or a plane to a set of observations in two or more dimensions, while others are not.

#### Two-Variable Orthogonal Least Squares

Finally, we offer a more-exotic application of the generalized means introduced above. Suppose that, for *i* = 1, 2, …, *n*, we have observations *x*_{i} and *y*_{i} on the variables *X* and *Y*. Then we may define *x̄* = Σ *x*_{i} / *n*, *ȳ* = Σ *y*_{i} / *n*, *S*_{xy} = Σ (*x*_{i} – *x̄*)(*y*_{i} – *ȳ*), *S*_{xx} = Σ (*x*_{i} – *x̄*)^{2}, and *S*_{yy} = Σ (*y*_{i} – *ȳ*)^{2}.

Suppose we are interested in estimating the slope *b* (relative to the *x*-axis) of the relationship (supposed linear) between *y* and *x*. This provides two ordinary least-squares estimators of *b*, namely *a* = *S*_{yy} / *S*_{yx} and *c* = *S*_{xy} / *S*_{xx}, which we may attempt to reconcile by substituting the mean *b* determined by applying the same increasing function to each of these two expressions before summing terms.

In particular, applying the identity function *ƒ*(*p*) = *p* to both expressions yields the arithmetic mean defined by the equation 2*b* = *a* + *c*. Similarly, applying the negated reciprocal function *g*(*q*) = –1 / *q* to both expressions yields the harmonic mean defined by the equation 2 / *b* = 1 / *a* + 1 / *c*.

However, in the present context, we prefer to employ a less-traditional choice of mean by applying *ƒ* to the first ratio and *g* to the second, thereby obtaining the mean defined by the equation *b* – 1 / *b* = *a* – 1 / *c*, or

*b* – 1 / *b* = (*S*_{yy} – *S*_{xx}) / *S*_{xy}

which simultaneously defines both the slope *b* = *k* of the orthogonal least-squares line relative to the *x*-axis and the slope *b* = –1 / *k* of the line orthogonal to it (the two roots of the implied quadratic multiply to –1). These are clearly also the slopes of the two principal components of the given data set.

Suppose instead that we are interested in estimating the slope *d* = 1 / *b* (relative to the *y*-axis) of the relationship (supposed linear) between *x* and *y*. Again, there are two ordinary least-squares estimators of *d*, namely 1 / *c* = *S*_{xx} / *S*_{xy} and 1 / *a* = *S*_{yx} / *S*_{yy}. Again, applying the function *ƒ* to the first ratio and *g* to the second yields the equation

*d* – 1 / *d* = (*S*_{xx} – *S*_{yy}) / *S*_{xy}

which defines both the slope *d* = 1 / *k* of the orthogonal least-squares line (relative to the *y*-axis) and the slope *d* = –*k* of the line orthogonal to it.

We have given a heuristic derivation of the equations defining the two-variable orthogonal least-squares procedure developed independently by Julius Ludwig Weisbach (1806–71) in 1840 and by Robert Jackson Adcock (1826–95) in 1877–78. However, practitioners do not seem to have employed the method of orthogonal least squares in their applied work until after the later discovery of a more-general procedure by Karl Pearson (1857–1936) in 1901.

#### Further Reading

Boyer, C.B. 1968; 2nd ed. 1991. *A History of Mathematics*. New York: Wiley, p. 56.

Farebrother, R.W. 2002. *Visualizing Statistical Models and Concepts*. New York: Marcel Dekker.

Kourkoulos, M., and Tzanakis, C. 2010. History, and students’ understanding of variance in statistics. *BSHM Bulletin: Journal of the British Society for the History of Mathematics* 25: 168–178.

Nichomachus of Gerasa. 1952. *Introduction to Arithmetic*. Chicago, IL: Britannica Great Books of the Western World.

Stoyan, D., and Morel, T. 2018. Julius Weisbach’s pioneering contribution to orthogonal linear regression (1840). *Historia Mathematica* 45: 75–84.

#### About the Author

Richard William Farebrother was a member of the teaching staff of the Department of Econometrics and Social Statistics in the [Victoria] University of Manchester from 1970 until 1993, when he took early retirement on medical grounds. From 1993–2001, he was an honorary reader in econometrics in the Department of Economic Studies of the same university. He has published three books: *Linear Least Squares Computations* (1988), *Fitting Linear Relationships: A History of the Calculus of Observations 1750–1900* (1999), and *Visualizing Statistical Models and Concepts* (2002), followed by a monograph on L1-norm and L∞-norm estimation (2013). He has also published more than 150 research papers on a wide range of subject areas, including econometric theory, computer algorithms, statistical distributions, statistical inference, and the history of statistics. He was awarded his PhD in 1975 and DSc in 1992.
