The website allows the public full access to the 1940 Census images, census maps, and descriptions.
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The IPUMS microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual-level and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration, and family variables. Within households, it is possible to create relational data because all relations between household members are known. For example, having data on a mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
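The mother's-age-at-birth calculation mentioned above can be sketched in a few lines. This is a minimal illustration, not the IPUMS procedure: the field names (household_id, person_no, mother_no) are hypothetical stand-ins for whatever household and mother-pointer variables a given extract provides, and the result is only approximate because census ages are recorded in whole years.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    household_id: int          # hypothetical household identifier
    person_no: int             # hypothetical position of the person within the household
    age: int                   # age in whole years at enumeration
    mother_no: Optional[int]   # person_no of the mother in the same household, if present


def mothers_age_at_birth(people):
    """Return {(household_id, person_no): mother's approximate age at that child's birth}."""
    by_key = {(p.household_id, p.person_no): p for p in people}
    result = {}
    for child in people:
        if child.mother_no is None:
            continue  # no co-resident mother recorded
        mother = by_key.get((child.household_id, child.mother_no))
        if mother is None:
            continue  # pointer does not resolve to a record
        result[(child.household_id, child.person_no)] = mother.age - child.age
    return result


# Toy records made up for illustration only.
people = [
    Person(1, 1, 34, None),
    Person(1, 2, 10, 1),
    Person(1, 3, 4, 1),
    Person(2, 1, 50, None),
]
ages = mothers_age_at_birth(people)
```

Only children who live in the same household as their mother can be handled this way, which is a limitation of any household-based approach.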
The historic US 1940 census data were collected in April 1940. Enumerators collected the data by traveling to households and counting the residents who regularly slept there. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
Notes
1940 US Census contains records from Montpelier, Washington, Vermont, USA by Ancestry.com. 1940 United States Federal Census [database on-line]. Provo, UT, USA: Ancestry.com Operations, Inc., 2012. Year: 1940; Census Place: Montpelier, Washington, Vermont; Roll: m-t0627-04238; Page: 3B; Enumeration District: 12-31. Original data: United States of America, Bureau of the Census. Sixteenth Census of the United States, 1940. Washington, D.C.: National Archives and Records Administration, 1940. T627, 4,643 rolls.
The 1940 Census Public Use Microdata Sample Project was assembled through a collaborative effort between the United States Bureau of the Census and the Center for Demography and Ecology at the University of Wisconsin. The collection contains a stratified 1-percent sample of households, with separate records for each household, for each "sample line" respondent, and for each person in the household. These records were encoded from microfilm copies of original handwritten enumeration schedules from the 1940 Census of Population. Geographic identification of the location of the sampled households includes Census regions and divisions, states (except Alaska and Hawaii), standard metropolitan areas (SMAs), and state economic areas (SEAs). Accompanying the data collection is a codebook that includes an abstract, descriptions of sample design, processing procedures and file structure, a data dictionary (record layout), category code lists, and a glossary. Also included is a procedural history of the 1940 Census. Each of the 20 subsamples contains three record types: household, sample line, and person. Household variables describe the location and condition of the household. The sample line records contain variables describing demographic characteristics such as nativity, marital status, number of children, veteran status, wage deductions for Social Security, and occupation. Person records also contain variables describing demographic characteristics including nativity, marital status, family membership, education, employment status, income, and occupation. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR08236.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
1940 United States Federal Census contains records from Philadelphia, Pennsylvania, USA by United States of America, Bureau of the Census. Sixteenth Census of the United States, 1940. Washington, D.C.: National Archives and Records Administration, 1940. T627, 4,643 rolls. Year: 1940; Census Place: Upper Dublin, Montgomery, Pennsylvania; Roll: m-t0627-03585; Page: 20B; Enumeration District: 46-208.
The 1940 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1940, although some persons were missed. The 1940 census population schedules were digitized by the National Archives and Records Administration (NARA) and released publicly on April 2, 2012. The 1940 Census enumeration district maps contain maps of counties, cities, and other minor civil divisions that show enumeration districts, census tracts, and related boundaries and numbers used for each census. The coverage is nationwide and includes territorial areas. The 1940 Census enumeration district descriptions contain written descriptions of census districts, subdivisions, and enumeration districts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. These links allow researchers to construct a longitudinal dataset that is highly representative of the population, and that includes women, Black Americans, and other under-represented populations at unprecedented rates. Each .csv file consists of a crosswalk between the two years indicated in the filename, using the IPUMS histids. For more information, consult the included Read Me file, and visit https://censustree.org.
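As a rough illustration of how such a crosswalk might be used, the sketch below joins records from two census years through a crosswalk file. The column names (histid_1930, histid_1940), the toy identifiers, and the in-memory record dictionaries are all assumptions made for this example; the actual column names are documented in the Read Me file that accompanies the data.

```python
import csv
import io

# Toy crosswalk in CSV form; the column names are hypothetical.
crosswalk_csv = """histid_1930,histid_1940
A1,B7
A2,B9
"""

# Toy person records keyed by histid for each census year (made up for illustration).
records_1930 = {"A1": {"age": 12}, "A2": {"age": 40}, "A3": {"age": 5}}
records_1940 = {"B7": {"age": 22}, "B9": {"age": 50}}


def link_records(crosswalk_file, earlier, later, col_earlier, col_later):
    """Pair up records from two census years using the crosswalk rows."""
    linked = []
    for row in csv.DictReader(crosswalk_file):
        rec_a = earlier.get(row[col_earlier])
        rec_b = later.get(row[col_later])
        if rec_a is not None and rec_b is not None:
            linked.append((row[col_earlier], row[col_later], rec_a, rec_b))
    return linked


linked = link_records(
    io.StringIO(crosswalk_csv), records_1930, records_1940,
    "histid_1930", "histid_1940",
)
```

Records without a link (A3 in this toy data) simply drop out of the longitudinal file, so the linked sample should be checked for representativeness before analysis.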
This dataset includes all households from the 1940 US census.
This dataset includes all individuals from the 1940 US census.
1940 United States Federal Census contains records from Caribou, Maine, USA. Year: 1940; Census Place: Caribou, Aroostook, Maine; Roll: m-t0627-01471; Page: 13A; Enumeration District: 2-12.
This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1940 datasets.
1940 United States Federal Census contains records from Montpelier, Washington, Vermont, USA. Year: 1940; Census Place: Montpelier, Washington, Vermont; Roll: m-t0627-04238; Page: 3B; Enumeration District: 12-31.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States Population: All Ages was reported at 325,719.000 thousand persons in 2017, an increase from 323,406.000 thousand persons in 2016. The series is updated yearly, with a median of 176,356.000 thousand persons from Jun 1900 to 2017 across 118 observations. The data reached an all-time high of 325,719.000 thousand persons in 2017 and a record low of 76,094.000 thousand persons in 1900. The series remains active in CEIC and is reported by the US Census Bureau. The data is categorized under Global Database’s United States – Table US.G002: Population by Age. Series remarks: population data for the years 1900 to 1949 exclude the population residing in Alaska and Hawaii. Population data for the years 1940 to 1979 cover the resident population plus Armed Forces overseas. Population data for all other years cover only the resident population.
https://www.icpsr.umich.edu/web/ICPSR/studies/2877/terms
This data collection, Aging of Veterans of the Union Army: Surgeons' Certificates, United States, 1862-1940, constitutes a portion of the historical data collected by the project "Early Indicators of Later Work Levels, Disease, and Death." With the goal of constructing datasets suitable for longitudinal analyses of factors affecting the aging process, the project collects military, medical, and socioeconomic data on a sample of white males mustered into the Union Army during the Civil War. The surgeons' certificates contain information from examining physicians to determine eligibility for pension benefits. Also included are questions regarding the age, occupation, residence, and military experience of the veterans. These data can be linked to "Aging of Veterans of the Union Army: Military, Pension, and Medical Records, 1820-1940" (ICPSR 6837) and "Aging of Veterans of the Union Army: United States Federal Census Records, 1850, 1860, 1900, 1910" (ICPSR 6836) using the variable "recidnum."
This data collection and its 1940 counterpart were assembled through a collaborative effort between the United States Bureau of the Census and the Center for Demography and Ecology of the University of Wisconsin. The 1940 and 1950 Census Public Use Sample Project was supported by The National Science Foundation under Grant SES-7704135. The collections contain a stratified 1-percent sample of households, with separate records for each household, for each 'sample line' respondent, and for each person in the household. These records were encoded from microfilm copies of original handwritten enumeration schedules from the 1940 and 1950 Censuses of Population. The universe for the sample included all persons and households within the United States. Geographic identification of the location of the sampled households includes Census regions and divisions, States (except Alaska and Hawaii), Standard Metropolitan Areas (SMAs), and State Economic Areas (SEAs). The SMAs and SEAs are comparable for both the 1940 and 1950 Public Use Microdata Samples (PUMS). The data collections were constructed from and consist of 20 independently-drawn subsamples stored in 20 discrete physical files. Each of the 20 subsamples contains three record types (household, 'sample line', and person). Both collections had both a complete-count and a sample component. Individuals selected for the sample component were asked a set of additional questions. Only households with a 'sample line' person were included in the public use microdata sample. The collections also contain records of group quarters members who were also on the Census 'sample line'. For the 1940 and 1950 collections, each household record contains variables describing the location and composition of the household. The 'sample line' records for 1950 contain variables describing demographic characteristics such as nativity, marital status, number of children, veteran status, education, income, and occupation. 
The person records for 1950 contain such demographic variables as nativity, marital status, family membership, and occupation. Accompanying the data collections are code books which include an abstract, descriptions of sample design, processing procedures and file structure, a data dictionary (record layout), category code lists, and a glossary. The data collections are arranged by subsample with each subsample stored as a separate physical file of information. The 20 subsamples were selected randomly. Within each of the 20 subsamples, records are sequenced by State. Extracting all of the records for one State entails reading through all of the 20 physical files and selecting that State's records from each of the 20 subsamples. Record types are ordered within household (household characteristics first, 'sample line' next, and person records last). The 1950 collection consists of a total of 2,844,458 data records: 461,130 household records, 461,130 'sample line' records, and 1,922,198 person records. Each record type has a logical record length of 133.
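The state-extraction procedure described above (scan every subsample file and keep one State's records) could look roughly like the sketch below. The record layout used here is entirely hypothetical: it assumes a record-type character in the first column ('H' for household, 'S' for sample line, 'P' for person) and a two-character state code in the next two columns of the household record. The real column positions are given in the codebook's data dictionary and should be substituted before use.

```python
import io


def extract_state(files, state_code):
    """Collect every record belonging to households in one state, across all subsample files."""
    selected = []
    for f in files:
        keep = False  # tracks whether the current household is in the target state
        for line in f:
            line = line.rstrip("\n")
            if line[:1] == "H":
                # Hypothetical layout: state code sits in columns 2-3 of the household record.
                keep = line[1:3] == state_code
            if keep:
                # Sample-line and person records follow their household record,
                # so they inherit the household's keep/skip decision.
                selected.append(line)
    return selected


# Two toy "subsample files" (the real collection has 20); contents are made up.
subsample_1 = io.StringIO(
    "H50 household-a\nS sample-line-a\nP person-a1\nP person-a2\n"
    "H33 household-b\nS sample-line-b\nP person-b1\n"
)
subsample_2 = io.StringIO(
    "H33 household-c\nS sample-line-c\nP person-c1\n"
    "H50 household-d\nS sample-line-d\nP person-d1\n"
)
state_records = extract_state([subsample_1, subsample_2], "50")
```

The sketch relies on the ordering stated in the description, with household characteristics first and the sample line and person records after them.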
https://www.icpsr.umich.edu/web/ICPSR/studies/8353/terms
This is an extract of the decennial Public Use Microdata Sample (PUMS) released by the Bureau of the Census. Because the complete PUMS files contain several hundred thousand records, ICPSR has constructed this subset to allow for easier and less costly analysis. The collection of data at ten-year increments allows the user to follow various age cohorts through the life cycle. Data include information on the household and its occupants, such as size and value of dwelling, utility costs, number of people in the household, and their relationship to the respondent. More detailed information was collected on the respondent, the head of household, and the spouse, if present. Variables include education, marital status, occupation, and income.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises physician-level entries from the 1906 American Medical Directory, the first in a series of semi-annual directories of all practicing physicians published by the American Medical Association [1]. Physicians are consistently listed by city, county, and state. Most records also include details about the place and date of medical training. From 1906-1940, Directories also identified the race of black physicians [2].
This dataset comprises physician entries for a subset of US states and the District of Columbia, including all of the South and several adjacent states (Alabama, Arkansas, Delaware, Florida, Georgia, Kansas, Kentucky, Louisiana, Maryland, Mississippi, Missouri, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, West Virginia). Records were extracted via manual double-entry by a professional data management company [3], and place names were matched to latitude/longitude coordinates. The main source for geolocating physician entries was the US Census. Historical Census records were sourced from IPUMS National Historical Geographic Information System [4]. Additionally, a public database of historical US Post Office locations was used to match locations that could not be found using Census records [5]. Fuzzy matching algorithms were also used to match misspelled place or county names [6].
The source of each geocoding match is described in the “match.source” field (type of spatial match): census_YEAR = match to NHGIS census place-county-state for given year; census_fuzzy_YEAR = matched to NHGIS place-county-state with fuzzy matching algorithm; dc = matched to centroid for Washington, DC; post_places = place-county-state matched to Blevins & Helbock's post office dataset; post_fuzzy = matched to post office dataset with fuzzy matching algorithm; post_simp = place/state matched to post office dataset; post_confimed_missing = post office dataset confirms place and county, but could not find coordinates; osm = matched using Open Street Map geocoder; hand-match = matched by research assistants reviewing web archival sources; unmatched/hand_match_missing = place coordinates could not be found. For records where place names could not be matched, but county names could, coordinates for county centroids were used. Overall, 40,964 records were matched to places (match.type=place_point) and 931 to county centroids (match.type=county_centroid); 76 records could not be matched (match.type=NA).
Most records include information about the physician’s medical training, including the year of graduation and a code linking to a school. A key to these codes is given on Directory pages 26-27, and at the beginning of each state’s section [1]. The OSM geocoder was used to assign coordinates to each school by its listed location. Straight-line distances between physicians’ place of training and practice were calculated using the sf package in R [7], and are given in the “school.dist.km” field. Additionally, the Directory identified a handful of schools that were “fraudulent” (school.fraudulent=1), and institutions set up to train black physicians (school.black=1).
AMA identified black physicians in the directory with the signifier “(col.)” following the physician’s name (race.black=1). Additionally, a number of physicians attended schools identified by AMA as serving black students, but were not otherwise identified as black; thus an expanded racial identifier was generated to identify black physicians (race.black.prob=1), including physicians who attended these schools and those directly identified (race.black=1).
Approximately 10% of dataset entries were audited by trained research assistants, in addition to 100% of black physician entries. These audits demonstrated a high degree of accuracy between the original Directory and extracted records. Still, given the complexity of matching across multiple archival sources, it is possible that some errors remain; any identified errors will be periodically rectified in the dataset, with a log kept of these updates.
For further information about this dataset, or to report errors, please contact Dr Ben Chrisinger (Benjamin.Chrisinger@tufts.edu). Future updates to this dataset, including additional states and Directory years, will be posted here: https://dataverse.harvard.edu/dataverse/amd.
References:
1. American Medical Association, 1906. American Medical Directory. American Medical Association, Chicago. Retrieved from: https://catalog.hathitrust.org/Record/000543547.
2. Baker, Robert B., Harriet A. Washington, Ololade Olakanmi, Todd L. Savitt, Elizabeth A. Jacobs, Eddie Hoover, and Matthew K. Wynia. "African American physicians and organized medicine, 1846-1968: origins of a racial divide." JAMA 300, no. 3 (2008): 306-313. doi:10.1001/jama.300.3.306.
3. GABS Research Consult Limited Company, https://www.gabsrcl.com.
4. Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 17.0 [GNIS, TIGER/Line & Census Maps for US Places and Counties: 1900, 1910, 1920, 1930, 1940, 1950; 1910_cPHA: ds37]. Minneapolis, MN: IPUMS. 2022. http://doi.org/10.18128/D050.V17.0
5. Blevins, Cameron; Helbock, Richard W., 2021, "US Post Offices", https://doi.org/10.7910/DVN/NUKCNA, Harvard Dataverse, V1, UNF:6:8ROmiI5/4qA8jHrt62PpyA== [fileUNF]
6. fedmatch: Fast, Flexible, and User-Friendly Record Linkage Methods. https://cran.r-project.org/web/packages/fedmatch/index.html
7. sf: Simple Features for R. https://cran.r-project.org/web/packages/sf/index.html
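For readers who want to sanity-check values in the school.dist.km field of the physician dataset described above, a straight-line (great-circle) distance can be approximated with the haversine formula. This is a swapped-in technique for illustration, not the authors' method: they used the sf package in R, whose geodesic calculations may differ slightly from this spherical approximation. The Earth radius constant and the function name are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator, used here as a simple check.
one_degree_at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
```

For records geocoded to a county centroid (match.type=county_centroid), any distance computed this way inherits the imprecision of the centroid.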
These data comprise Census records relating to the Alaskan people's population demographics for the State of Alaskan Salmon and People (SASAP) Project. Decennial census data were originally extracted from IPUMS National Historic Geographic Information Systems website: https://data2.nhgis.org/main (Citation: Steven Manson, Jonathan Schroeder, David Van Riper, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 12.0 [Database]. Minneapolis: University of Minnesota. 2017. http://doi.org/10.18128/D050.V12.0). A number of relevant tables of basic demographics on age and race, household income and poverty levels, and labor force participation were extracted. These particular variables were selected as part of an effort to understand and potentially quantify various dimensions of well-being in Alaskan communities. The file "censusdata_master.csv" is a consolidation of all 21 other data files in the package. For detailed information on how the datasets vary over different years, view the file "readme.docx" available in this data package. The included .Rmd file is a script which combines the 21 files by year into a single file (censusdata_master.csv). It also cleans up place names (including typographical errors) and uses the USGS place names dataset and the SASAP regions dataset to assign latitude and longitude values and region values to each place in the dataset. Note that some places were not assigned a region or location because they do not fit well into the regional framework. Considerable heterogeneity exists between census surveys each year. While we have attempted to combine these datasets in a way that makes sense, there may be some discrepancies or unexpected values. The RMarkdown document SASAPWebsiteGraphicsCensus.Rmd is used to generate a variety of figures using these data, including the additional file Chignik_population.png. 
An additional set of 25 figures showing regional trends in population and income metrics are also included.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Addressing contemporary anti-Asian racism and its impacts on health requires understanding its historical roots, including discriminatory restrictions on immigration, citizenship, and land ownership. Archival secondary data such as historical census records provide opportunities to quantitatively analyze structural dynamics that affect the health of Asian immigrants and Asian Americans. Census data overcome weaknesses of other data sources, such as small sample size and aggregation of Asian subgroups. This article explores the strengths and limitations of early twentieth-century census data for understanding Asian Americans and structural racism.
Methods: We used California census data from three decennial censuses spanning 1920–1940 to compare two criteria for identifying Asian Americans: census racial categories and Asian surname lists (Chinese, Indian, Japanese, Korean, and Filipino) that have been validated in contemporary population data. This paper examines the sensitivity and specificity of surname classification compared to census-designated “color or race” at the population level.
Results: Surname criteria were found to be highly specific, with each of the five surname lists having a specificity of over 99% for all three census years. The Chinese surname list had the highest sensitivity (ranging from 0.60–0.67 across census years), followed by the Indian (0.54–0.61) and Japanese (0.51–0.62) surname lists. Sensitivity was much lower for Korean (0.40–0.45) and Filipino (0.10–0.21) surnames. With the exception of Indian surnames, the sensitivity values of surname criteria were lower for the 1920–1940 census data than those reported for the 1990 census. The extent of the difference in sensitivity and trends across census years vary by subgroup.
Discussion: Surname criteria may have lower sensitivity in detecting Asian subgroups in historical data as opposed to contemporary data, as enumeration procedures for Asians have changed across time. We examine how the conflation of race, ethnicity, and nationality in the census could contribute to low sensitivity of surname classification compared to census-designated “color or race.” These results can guide decisions when operationalizing race in the context of specific research questions, thus promoting historical quantitative study of Asian American experiences. Furthermore, these results stress the need to situate measures of race and racism in their specific historical context.
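The sensitivity and specificity measures used in this comparison can be computed as in the sketch below, which treats the census-designated “color or race” as the reference and the surname list as the classifier under evaluation. The function name and the toy flags are assumptions made for illustration; they are not the study's data or results.

```python
def sensitivity_specificity(surname_flags, census_flags):
    """Compare surname-based classification against census race as the reference.

    sensitivity = share of census-identified group members whose surname is on the list
    specificity = share of non-members whose surname is not on the list
    """
    tp = fp = tn = fn = 0
    for by_surname, by_census in zip(surname_flags, census_flags):
        if by_census and by_surname:
            tp += 1
        elif by_census:
            fn += 1
        elif by_surname:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy flags for six people (illustrative only).
on_surname_list = [True, True, False, False, False, True]
census_race_match = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(on_surname_list, census_race_match)
```

When the group of interest is a small share of the population, specificity is computed over a very large set of non-members, so it can sit close to 1 even while sensitivity is low.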