Deriving median household income

prepared for the Lewis Mumford Center by
Dr. Brian J. Stults
Center for Studies in Criminology and Law
University of Florida

Though our analysis is conducted for metropolitan areas, the method requires that we collect data for smaller geographic levels (e.g., census tract and FIPS place) and then aggregate up to the metropolitan level. This presents a complication when we are interested in a median characteristic such as the median household income of an area. Unlike other characteristics, which can be obtained directly by adding up the counts for the component geographic parts, a median cannot be computed by combining the medians of the component areas. The medians presented in our analysis were derived using one of two methods, depending on the structure of the geographic area.

Acquired directly from census records
It is possible to use the medians taken directly from census data when it is not necessary to aggregate multiple geographic areas into a single larger area. This method is used in our analysis of the full metropolitan area when the area does not cross state boundaries. It is used in our analysis of the central city portion of the metropolis when there is only one central city in the particular area. It is not possible to use the medians provided by the census for suburban portions of the metropolis.

Derived using linear or Pareto interpolation
When it is necessary to aggregate central cities, suburbs, or portions of a metropolitan area from different states, we follow the methods of interpolation used by the Census Bureau based on grouped income data. If the median falls within a category that is not wider than $2,500, linear interpolation is used. Pareto interpolation is used for wider categories. Linear interpolation is described in most textbooks on statistical methods, so we will not discuss it further. However, because Pareto interpolation is less widely covered, we provide a brief description of the method.
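For readers who want a concrete picture of the aggregation and linear interpolation steps, the following is a minimal sketch and not the Census Bureau's or the authors' implementation. The function names, the category limits, and the household counts are all hypothetical. It assumes that households are spread evenly within the category containing the median.

```python
def aggregate_counts(areas):
    """Sum household counts category by category across component areas."""
    return [sum(column) for column in zip(*areas)]


def linear_median(limits, counts):
    """Median of grouped data by linear interpolation.

    limits: list of (lower, upper) income limits, one pair per category
    counts: number of households in each category
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("no households in the distribution")
    half = total / 2.0
    below = 0  # households in categories below the current one
    for (lower, upper), n in zip(limits, counts):
        if n > 0 and below + n >= half:
            # Assumption: households are spread evenly within the category.
            return lower + (half - below) / n * (upper - lower)
        below += n
    raise ValueError("median category not found")


# Hypothetical counts for two component areas over four $2,500-wide categories.
limits = [(0, 2500), (2500, 5000), (5000, 7500), (7500, 10000)]
area_a = [10, 20, 30, 40]
area_b = [20, 30, 30, 20]
combined = aggregate_counts([area_a, area_b])
median = linear_median(limits, combined)
```

In this made-up example the combined distribution has 200 households, so the median is the income below which 100 households fall. That point lies in the $5,000 to $7,500 category, one third of the way through it.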

Pareto interpolation

The standard formula for calculating a median (shown in Figure 1 below) requires values for two parameters, k and theta. These parameters can be derived using the formula for the area under a Pareto curve shown in Figures 2 and 3, where:

a   = income value at the lower limit of the category containing the median

b   = income value at the upper limit of the category containing the median

Pa = proportion of the distribution that lies below the lower limit

Pb = proportion of the distribution that lies below the upper limit

Through basic algebraic transformations, we use these known values to solve for k and theta as shown in Figures 4 and 5. The median can then be calculated by inserting the values of k and theta into the formula in Figure 1.
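As a reading aid, the relationships referred to above can be written out as follows, assuming the standard Pareto distribution function, in which the proportion of the distribution below an income x is 1 - (k/x)^theta. This is one equivalent arrangement of the algebra and may differ in form from the expressions in Figures 4 and 5.

```latex
% Proportions below the lower and upper limits (assumed Pareto form)
P_a = 1 - \left(\frac{k}{a}\right)^{\theta}, \qquad
P_b = 1 - \left(\frac{k}{b}\right)^{\theta}

% Dividing (1 - P_a) by (1 - P_b) eliminates k and gives theta
\theta = \frac{\ln(1 - P_a) - \ln(1 - P_b)}{\ln b - \ln a}

% Substituting theta back into the expression for P_a gives k
k = a \, (1 - P_a)^{1/\theta}

% The median m is the income at which the proportion below equals 1/2
\frac{1}{2} = 1 - \left(\frac{k}{m}\right)^{\theta}
\quad\Longrightarrow\quad
m = k \cdot 2^{1/\theta}
```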

Figure 1: Median

Figure 2: Proportion below lower limit

Figure 3: Proportion below upper limit

Figure 4: Solving for k

Figure 5: Solving for theta
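The calculation described in this section can be sketched in a few lines of code. This is an illustrative sketch, not the Census Bureau's or the authors' implementation. It assumes the standard Pareto distribution function, 1 - (k/x)^theta, and the function names and input values are hypothetical.

```python
import math


def pareto_params(a, b, pa, pb):
    """Solve for k and theta from the limits and cumulative proportions
    of the category containing the median.

    a, b   : lower and upper income limits of the category (0 < a < b)
    pa, pb : proportions of the distribution below a and below b
    """
    if not (0 < a < b):
        raise ValueError("limits must satisfy 0 < a < b")
    if not (0 <= pa < pb < 1):
        raise ValueError("proportions must satisfy 0 <= pa < pb < 1")
    # Assumption: proportion below x is 1 - (k / x) ** theta.
    theta = (math.log(1 - pa) - math.log(1 - pb)) / (math.log(b) - math.log(a))
    k = a * (1 - pa) ** (1 / theta)
    return k, theta


def pareto_median(a, b, pa, pb):
    """Median by Pareto interpolation within the category [a, b)."""
    if not (pa <= 0.5 <= pb):
        raise ValueError("the category must contain the median")
    k, theta = pareto_params(a, b, pa, pb)
    # The median is the income at which the proportion below equals 1/2.
    return k * 2 ** (1 / theta)


# Hypothetical category: $50,000 to $60,000, with 45% of households
# below the lower limit and 55% below the upper limit.
k, theta = pareto_params(50000, 60000, 0.45, 0.55)
median = pareto_median(50000, 60000, 0.45, 0.55)
```

The input checks reflect two practical limits of this form: the lower limit must be positive, and the proportion below the upper limit must be less than 1, so an open-ended top category cannot be handled this way.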