Overview

Population Demographics data is created by applying a cohort model to historical, published demographics data in order to project that data forward. A cohort refers to a particular segment of the population, such as 29-year-old Asian females. The cohort model works by aging a population forward a year, applying birth, death, and migration rates, and repeating that process for each cohort in each projected year. The process of creating demographics data therefore has four steps: break out existing data to the desired geographical and cohort granularity; construct the birth, death, and migration rates to be used by the cohort model; apply the cohort model to the data; and finally, adjust the projected data to published rates of growth.

Disaggregate existing population counts to tract and single ages: There are several population demographics datasets available at varying geography and age-group levels. We combine these to create a tract-level dataset for all historical years.
Estimate births, deaths, migration: We estimate the birth, death, and migration rates that will need to be applied to the above historical dataset.
Apply cohort model to data: The cohort model creates demographics data for projection years by aging the population forward one year and applying birth, death, and migration rates to the population. This process is repeated for each year for which projections are created.
Adjust to Census projections: After the cohort model is run, projected years of data are compared to national estimates from the Census Bureau and adjusted to match the Census Bureau's published growth rate for the population.

Process

1. Disaggregate Data

The output of the demographics process is historical and projected population counts by tract, race, ethnicity, gender, and single-age. For historical Census data, we have annual single-age demographic data at the state level, tract data available in age-groups, and annual county-level data available in age-groups. We use tract and age breakout processes to get historical population data from the Census down to the tract and single-age level.

Area Breakout to Tract

For the area breakouts, we use the 5-year tract data from ACS. First we normalize the tract data: that is, we take the population for each tract and calculate the share of the county population that it represents. For example, suppose county A has a population of 100 and three tracts, A1, A2, and A3, with populations of 10, 30, and 60, respectively. The normalization process calculates the population share for each tract: 0.1 for A1, 0.3 for A2, and 0.6 for A3. These shares are multiplied by the annual county-level populations to determine each tract population. This is a slightly simplified view. In reality, we do this breakout for each race/ethnicity, gender, and age group.

It is worth noting that ACS data, although available at the tract level, is published only as 5-year estimates. We elect to use the 5-year tract data to model the more current Census Bureau data down to the tract level, creating a more recent dataset.

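
The county A example above can be worked through in a few lines of Python. This is only a minimal sketch of the share-and-multiply idea: the function names and the annual county estimate of 120 are hypothetical, and a real run would repeat the calculation for each race/ethnicity, gender, and age group.

```python
def tract_shares(acs_tract_pops):
    """Return each tract's share of the county total from ACS 5-year counts."""
    county_total = sum(acs_tract_pops.values())
    if county_total == 0:
        # Assumption: an empty county gets zero shares rather than an error.
        return {tract: 0.0 for tract in acs_tract_pops}
    return {tract: pop / county_total for tract, pop in acs_tract_pops.items()}


def break_out_county(shares, county_pop):
    """Distribute an annual county-level count across tracts by share."""
    return {tract: share * county_pop for tract, share in shares.items()}


# County A from the example: tracts of 10, 30, and 60 people.
shares = tract_shares({"A1": 10, "A2": 30, "A3": 60})

# Hypothetical annual county estimate of 120 for a more recent year.
tract_estimates = break_out_county(shares, 120)
```

Because the shares sum to one, the tract estimates always add back up to the county figure they were derived from.
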
Age Breakout to Single Age

Age breakout is almost identical to the area breakout process. We take the state-level single-age data and calculate the share of the state population that each single age represents. As with area breakout, this is done for each race, ethnicity, and gender. We multiply these shares by the county- or tract-level population data to break out single ages at the county or tract level.

The end result of this step is tract-level demographics data by single-age, race, ethnicity, and gender for all historical years.
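
A minimal sketch of the age breakout follows. It assumes that single-age shares are computed within each published age group (the text does not spell out the denominator), and all counts and function names are hypothetical.

```python
def single_age_shares(state_single_age, first_age, last_age):
    """Share of each single age within an age group, from state-level counts."""
    ages = range(first_age, last_age + 1)
    group_total = sum(state_single_age[age] for age in ages)
    return {age: state_single_age[age] / group_total for age in ages}


def break_out_age_group(group_pop, shares):
    """Split one age-group count into single ages using state-level shares."""
    return {age: group_pop * share for age, share in shares.items()}


# Hypothetical state-level counts for one race/ethnicity/gender combination.
state_counts = {25: 500, 26: 400, 27: 400, 28: 400, 29: 300}
age_shares = single_age_shares(state_counts, 25, 29)

# Hypothetical tract with 40 people in the 25-29 age group.
tract_single_ages = break_out_age_group(40, age_shares)
```

In this example the tract's 40 people are split in the same proportions as the state, giving roughly 10, 8, 8, 8, and 6 people for ages 25 through 29.
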

2. Create Cohort Model Components

Migration Estimation
- The basic idea behind the cohort model is to project future populations by aging each cohort forward one year and then accounting for births and deaths. However, people do not always remain in the same tract, county, state, or country from year to year. Migration may be especially pronounced in some cases, such as a college town where a large number of students migrate out of the town after they graduate.
  
  The migration estimate is created by aging the population forward a year, calculating deaths (using CDC data), and recalculating the population. This new population is then compared to the published population data for that year from the Census. Because the projected population accounts only for aging and deaths, any difference from the published figure is attributed to migration. If the projected population is lower than the published estimate, this indicates net in-migration for the region for the year. If the projected population is higher than the published estimate, this indicates net out-migration for the region for the year.
  
  This process is repeated for each consecutive pair of historical years for which published data exists. Migration estimates are created for all historical years, and these will be used in the cohort model to estimate migration for future years. To calculate future migration rates, we work through all possible county, race/ethnicity, gender, and single-year age combinations and project future migration rates using linear regression.
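
The comparison at the heart of the migration estimate can be sketched as a residual calculation for one pair of consecutive years. The function name, the keying of death rates by age at the start of the year, and all numbers are assumptions made for illustration.

```python
def migration_residuals(pop_t, death_rates, published_t1):
    """Residual per age: published count minus the count expected
    from aging and deaths alone."""
    residuals = {}
    for age, count in pop_t.items():
        next_age = age + 1  # the cohort is one year older in year t+1
        if next_age not in published_t1:
            continue  # no published figure to compare against
        expected = count * (1.0 - death_rates.get(age, 0.0))
        residuals[next_age] = published_t1[next_age] - expected
    return residuals


# Hypothetical single-age counts for one county and demographic group.
pop_t = {29: 1000, 30: 800}
death_rates = {29: 0.01, 30: 0.02}
published_t1 = {30: 1010, 31: 760}

residuals = migration_residuals(pop_t, death_rates, published_t1)
```

A positive residual means the published count is larger than aging and deaths alone would produce, and a negative residual means it is smaller. Here the 29-year-olds yield a residual of about +20 at age 30, and the 30-year-olds yield about -24 at age 31.
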
Birth/Death Estimation
- This process produces projected birth and death rates for every demographic cohort and every age at the county level. First, for each year of historical data, we convert the birth or death count from the CDC to a rate by dividing the count for a particular demographic cohort by the Census population of that cohort. Second, given several years of historical rates, we project the next year's rate.
  
  We loop through all possible nodes (i.e., every county, race, gender, and single-age combination), calculate the rate, and get the projection. For births, we limit our search to nodes of females ages 12-50 (i.e., capable of giving birth). The rate is technically a fertility rate, since we base it on the population of potential mothers instead of the entire population.
  
  After calculating rates for all historical years, we then project the next year's rate using linear regression. If the projected rate goes below 0, we set the rate to half the most recent year's rate (i.e., asymptotically approach zero). Similarly, if the projected rate goes above 1 we set the rate to the mid-point between the most recent year's rate and 1 (i.e., asymptotically approach one).
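
The projection rule described above can be sketched as an ordinary least-squares line fitted to the historical rates and extended one year, followed by the two bound checks. The function name, the use of 0, 1, 2, ... as the year index, and the sample rates are assumptions.

```python
def project_next_rate(rates):
    """Extrapolate the next year's rate from a list of historical rates."""
    n = len(rates)
    last = rates[-1]
    if n < 2:
        return last  # assumption: with one data point, carry the rate forward

    # Ordinary least squares of rate on year index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    projected = intercept + slope * n  # index n is the next year

    if projected < 0:
        return last / 2          # halve the most recent rate
    if projected > 1:
        return (last + 1) / 2    # midpoint between the most recent rate and 1
    return projected


rising = project_next_rate([0.10, 0.12, 0.14])      # stays within bounds
falling = project_next_rate([0.05, 0.03, 0.01])     # regression goes below 0
saturating = project_next_rate([0.80, 0.90, 0.98])  # regression goes above 1
```

In the second case the fitted line would give a negative rate, so the result is half of 0.01. In the third the line would exceed 1, so the result is the midpoint of 0.98 and 1.
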

3. Apply Cohort Model

The cohort model brings together all the preparatory calculations described above. The output of the cohort model is the final demographics data. The cohort model uses a methodology very similar to that employed in the migration estimation process. We age each cohort forward a year, account for deaths, apply in- and out-migration, and add births.

Age each demographic cohort forward one year.
For each demographic cohort, calculate deaths using the rates that were calculated earlier.
Calculate in-migration, then calculate out-migration based on new population counts (after in-migration).
Add births by multiplying each cohort of potential mothers (females ages 12-50) by the fertility rate of that cohort.
Apply the newborn cohort survival rate to the newly-created newborn population.
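
The steps above can be sketched as a single one-year update for one area. This is an illustration under several assumptions that the text does not specify: rates are fractions keyed by the cohort after aging, migration rates are applied multiplicatively, females are labeled "F", the oldest age is not capped, and newborns are returned as one total rather than assigned to sex or race cohorts.

```python
FERTILE_AGES = range(12, 51)  # females ages 12-50, per the text


def step_cohorts(pop, death_rate, in_mig_rate, out_mig_rate,
                 fertility_rate, newborn_survival):
    """Advance {(sex, age): count} by one year.

    Returns (new_pop, surviving_newborns).
    """
    new_pop = {}
    for (sex, age), count in pop.items():
        key = (sex, age + 1)                         # 1. age forward one year
        count -= count * death_rate.get(key, 0.0)    # 2. remove deaths
        count += count * in_mig_rate.get(key, 0.0)   # 3a. add in-migration
        count -= count * out_mig_rate.get(key, 0.0)  # 3b. out-migration on the new count
        new_pop[key] = count

    # 4. births from potential mothers, using the fertility rate of each age
    births = sum(new_pop.get(("F", age), 0.0) * fertility_rate.get(age, 0.0)
                 for age in FERTILE_AGES)

    # 5. apply the newborn survival rate
    return new_pop, births * newborn_survival


# Hypothetical inputs for one tract.
pop = {("F", 29): 1000.0, ("M", 29): 1000.0}
new_pop, newborns = step_cohorts(
    pop,
    death_rate={("F", 30): 0.01},
    in_mig_rate={("F", 30): 0.02},
    out_mig_rate={("F", 30): 0.01},
    fertility_rate={30: 0.1},
    newborn_survival=0.99,
)
```

Running the function repeatedly, feeding each year's output back in as the next year's input, gives one projected year per call.
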

4. Adjust to Published Rates of Growth

Because the Census publishes national demographic projections (50 years out from the last published Census), we want to adjust our projections to their results. We do so by making our national growth rate match that of the Census projections. We then proportionally adjust the populations of each child area (i.e., state, county, tract) so they sum to the adjusted totals.

It is important to note that the projection adjustment step does not make our totals match those of the Census projections. Instead, our projected growth rate at the national level, from year to year, will match the Census Bureau's projected growth rate.
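
The distinction between matching growth rates and matching totals can be sketched as follows. The function name and all figures are hypothetical, and the sketch scales a single level of child areas rather than the full state, county, and tract hierarchy.

```python
def adjust_to_growth(prev_national_total, census_prev, census_next,
                     area_projections):
    """Scale area projections so national growth matches the Census growth rate."""
    census_growth = census_next / census_prev
    target_total = prev_national_total * census_growth
    projected_total = sum(area_projections.values())
    factor = target_total / projected_total
    # Every child area is scaled by the same factor, preserving proportions.
    return {area: pop * factor for area, pop in area_projections.items()}


# Hypothetical figures: our previous-year national total is 1000, while the
# Census projection moves from 2000 to 2020 (1% growth).
adjusted = adjust_to_growth(
    prev_national_total=1000,
    census_prev=2000,
    census_next=2020,
    area_projections={"X": 600, "Y": 450},  # cohort model output, sums to 1050
)
adjusted_total = sum(adjusted.values())
```

Here the adjusted total is about 1010, which is 1% growth over our own previous total of 1000 and not the Census figure of 2020.
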

Once every tract for each demographic cohort has been adjusted accordingly, we save the output as our final demographic cohort model projection.
