Website alows the public full access to the 1940 Census images, census maps and descriptions.
The 1940 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1940, although some persons were missed. The 1940 census population schedules were digitized by the National Archives and Records Administration (NARA) and released publicly on April 2, 2012. The 1940 Census enumeration district maps contain maps of counties, cities, and other minor civil divisions that show enumeration districts, census tracts, and related boundaries and numbers used for each census. The coverage is nation wide and includes territorial areas. The 1940 Census enumeration district descriptions contain written descriptions of census districts, subdivisions, and enumeration districts.
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The IPUMS microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations—Ancestry.com and FamilySearch—and provides the largest and richest source of individual level and household data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https:/phsdocs.developerhub.io/need-help/citing-phs-data-core
Historic data are scarce and often only exists in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefits their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.Historic data are scarce and often only exists in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefits their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier. In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
The historic US 1940 census data was collected in April 1940. Enumerators collected data traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data was collected. Household members absent on the day of data collected were either listed to the household with the help of other household members or were scheduled for the last census subdivision.
Notes
The 1940 Census Public Use Microdata Sample Project was assembled through a collaborative effort between the United States Bureau of the Census and the Center for Demography and Ecology at the University of Wisconsin. The collection contains a stratified 1-percent sample of households, with separate records for each household, for each "sample line" respondent, and for each person in the household. These records were encoded from microfilm copies of original handwritten enumeration schedules from the 1940 Census of Population. Geographic identification of the location of the sampled households includes Census regions and divisions, states (except Alaska and Hawaii), standard metropolitan areas (SMAs), and state economic areas (SEAs). Accompanying the data collection is a codebook that includes an abstract, descriptions of sample design, processing procedures and file structure, a data dictionary (record layout), category code lists, and a glossary. Also included is a procedural history of the 1940 Census. Each of the 20 subsamples contains three record types: household, sample line, and person. Household variables describe the location and condition of the household. The sample line records contain variables describing demographic characteristics such as nativity, marital status, number of children, veteran status, wage deductions for Social Security, and occupation. Person records also contain variables describing demographic characteristics including nativity, marital status, family membership, education, employment status, income, and occupation. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR08236.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
This dataset includes all individuals from the 1940 US census.
This dataset includes all households from the 1940 US census.
This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1940 datasets.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/ZFVVNAhttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/ZFVVNA
The CenSoc WWII Army Enlistment Dataset is a cleaned and harmonized version of the National Archives and Records Administration’s Electronic Army Serial Number Merged File, ca. 1938 - 1946 (2002). It contains enlistment records for over 9 million men and women who served in the United States Army, including the Army Air Corps, Women's Army Auxiliary Corps, and Enlisted Reserve Corps. We publish links between men in the CenSoc WWII Army Enlistment Dataset, Social Security Administration mortality data, and the 1940 Census. The CenSoc Enlistment-Census-1940 file links these enlistment records to the complete 1940 Census, and may be merged with IPUMS-USA census data using the HISTID identifier variable. The CenSoc Enlistment-Numident file links enlistment records to the Berkley Unified Numident Mortality Database (BUNMD), and the CenSoc Enlistment-DMF file links enlistment records to the Social Security Death Master File. For enlistment records in the Enlistment-Numident and Enlistment-DMF datasets that have been independently and additionally linked to the 1940 Census, we include the HISTID identifier variable that can be used to merge the data with IPUMS census data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Veterans’ Grandchildren Mortality Plus sample consists of the records of more than 35,700 total grandchildrenboth male and female in nearly equal numbers,about 28,000 of which survived to age 45,who were born after the war to 16,791 children of 2,825 veterans,and contains an oversample of ex-POW veterans.The primary purpose of the project was to explore how grandfathers’ trauma affects the longevity and overweight of descendants. The dataset contains birth and death dates of grandchildren, census information on their parents' household, select socioeconomic and education information from the 1930 and 1940 census, and height and weight information from WWII draft cards for the grandsons. This multigenerational dataset can be used for researching the intergenerational transmission of longevity, overweight and socioeconomic status and the sex-specific pathways of this transmission and for testing mechanical linkage algorithms. Researchers built on a previously collected NIA-funded project containing census and death information of children of ex-POW and non-POW veterans (“Early Indicators, Intergenerational Processes, and Aging,” NIA grant P01AG10120, PI: Costa). The Veterans’ Grandchildren Mortality Plus data set contains the newly collected records of the veterans’ grandchildren, as well as the previously collected data of the veterans and their children.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BackgroundAddressing contemporary anti-Asian racism and its impacts on health requires understanding its historical roots, including discriminatory restrictions on immigration, citizenship, and land ownership. Archival secondary data such as historical census records provide opportunities to quantitatively analyze structural dynamics that affect the health of Asian immigrants and Asian Americans. Census data overcome weaknesses of other data sources, such as small sample size and aggregation of Asian subgroups. This article explores the strengths and limitations of early twentieth-century census data for understanding Asian Americans and structural racism.MethodsWe used California census data from three decennial census spanning 1920–1940 to compare two criteria for identifying Asian Americans: census racial categories and Asian surname lists (Chinese, Indian, Japanese, Korean, and Filipino) that have been validated in contemporary population data. This paper examines the sensitivity and specificity of surname classification compared to census-designated “color or race” at the population level.ResultsSurname criteria were found to be highly specific, with each of the five surname lists having a specificity of over 99% for all three census years. The Chinese surname list had the highest sensitivity (ranging from 0.60–0.67 across census years), followed by the Indian (0.54–0.61) and Japanese (0.51–0.62) surname lists. Sensitivity was much lower for Korean (0.40–0.45) and Filipino (0.10–0.21) surnames. With the exception of Indian surnames, the sensitivity values of surname criteria were lower for the 1920–1940 census data than those reported for the 1990 census. The extent of the difference in sensitivity and trends across census years vary by subgroup.DiscussionSurname criteria may have lower sensitivity in detecting Asian subgroups in historical data as opposed to contemporary data as enumeration procedures for Asians have changed across time. We examine how the conflation of race, ethnicity, and nationality in the census could contribute to low sensitivity of surname classification compared to census-designated “color or race.” These results can guide decisions when operationalizing race in the context of specific research questions, thus promoting historical quantitative study of Asian American experiences. Furthermore, these results stress the need to situate measures of race and racism in their specific historical context.
This is an extract of the decennial Public Use Microdata Sample (PUMS) released by the Bureau of the Census. Because the complete PUMS files contain several hundred thousand records, ICPSR has constructed this subset to allow for easier and less costly analysis. The collection of data at ten year increments allows the user to follow various age cohorts through the life-cycle. Data include information on the household and its occupants such as size and value of dwelling, utility costs, number of people in the household, and their relationship to the respondent. More detailed information was collected on the respondent, the head of household, and the spouse, if present. Variables include education, marital status, occupation and income. The stratified sample has unequal sampling rates across strata and requires the use of weights for analyses using more than one stratum. The epsem sample was selected in a second stage from the stratified sample and used compensating sampling rates within each stratum so that the overall probability of selection for each person is equal. The person level weight for use with the stratified sample and the household weight to be used with the epsem sample are included in the data file.Conducted by the United States Department of Commerce, Bureau of the Census. Stratified sample of adults contained in the Public Use Microdata Sample. Approximately 500 records were drawn from each of 28 sex/age/race strata. Additionally, an equal probability (epsem) sample was drawn from the stratified sample. Datasets: DS0: Study-Level Files DS1: United States Microdata Samples Extract File, 1940-1980: Demographics of Aging DS2: Frequencies, 1940-1980 For 1960-1980, all PUMS records for persons 18 and over. For 1940 and 1950, all sample line records.
This crosswalk consists of individuals matched between the 1850 and 1940 complete-count US Censuses. Within the crosswalk, users have the option to select the linking method with which these matches were created. This version of the crosswalk contains links made by the ABE-exact (conservative and standard) method, the ABE-NYSIIS (conservative and standard) method and the ABE-NYSIIS (conservative and standard) method where race is used as a matching variable. Users can then merge into this crosswalk a wide set of individual- and household-level variables provided publicly by IPUMS, thereby creating a historical longitudinal dataset for analysis.
https://www.icpsr.umich.edu/web/ICPSR/studies/2877/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/2877/terms
This data collection, Aging of Veterans of the Union Army: Surgeons' Certificates, United States, 1862-1940, constitutes a portion of the historical data collected by the project "Early Indicators of Later Work Levels, Disease, and Death." With the goal of constructing datasets suitable for longitudinal analyses of factors affecting the aging process, the project collects military, medical, and socioeconomic data on a sample of white males mustered into the Union Army during the Civil War. The surgeons' certificates contain information from examining physicians to determine eligibility for pension benefits. Also included are questions regarding the age, occupation, residence, and military experience of the veterans. These data can be linked to "Aging of Veterans of the Union Army: Military, Pension, and Medical Records, 1820-1940" (ICPSR 6837) and "Aging of Veterans of the Union Army: United States Federal Census Records, 1850, 1860, 1900, 1910" (ICPSR 6836) using the variable "recidnum."
https://www.icpsr.umich.edu/web/ICPSR/studies/28501/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/28501/terms
The 1915 Iowa State Census is a unique document. It was the first census in the United States to include information on education and income prior to the United States Federal Census of 1940. It contains considerable detail on other aspects of individuals and households, e.g., religion, wealth and years in the United States and Iowa. The Iowa State Census of 1915 was a complete sample of the residents of the state and the returns were written by census takers (assessors) on index cards. These cards were kept in the Iowa State Archives in Des Moines and were microfilmed in 1986 by the Genealogical Society of Salt Lake City. The census cards were sorted by county, although large cities (those having more than 25,000 residents) were grouped separately. Within each county or large city, records were alphabetized by last name and within last name by first name. This data set includes individual-level records for three of the largest Iowa cities (Des Moines, Dubuque, and Davenport; the Sioux City films were unreadable) and for ten counties that did not contain a large city. (Additional details on sample selection are available in the documentation). Variables include name, age, place of residence, earnings, education, birthplace, religion, marital status, race, occupation, military service, among others. Data on familial ties between records are also included.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Peru Census Population: Urban: Junín data was reported at 884,928.000 Person in 2017. This records an increase from the previous number of 752,337.000 Person for 2007. Peru Census Population: Urban: Junín data is updated yearly, averaging 510,662.000 Person from Jun 1940 (Median) to 2017, with 7 observations. The data reached an all-time high of 884,928.000 Person in 2017 and a record low of 137,776.000 Person in 1940. Peru Census Population: Urban: Junín data remains active status in CEIC and is reported by National Institute of Statistics and Informatics. The data is categorized under Global Database’s Peru – Table PE.G003: Population: Census by Department.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population Census: Central West: Mato Grosso do Sul data was reported at 2,757,013.000 Person in 2022. This records an increase from the previous number of 2,449,024.000 Person for 2010. Population Census: Central West: Mato Grosso do Sul data is updated yearly, averaging 1,778,741.000 Person from Jul 1940 (Median) to 2022, with 11 observations. The data reached an all-time high of 2,757,013.000 Person in 2022 and a record low of 238,640.000 Person in 1940. Population Census: Central West: Mato Grosso do Sul data remains active status in CEIC and is reported by Brazilian Institute of Geography and Statistics. The data is categorized under Global Database’s Brazil – Table BR.GAC008: Population Census: by State.
These data comprise Census records relating to the Alaskan people's population demographics for the State of Alaskan Salmon and People (SASAP) Project. Decennial census data were originally extracted from IPUMS National Historic Geographic Information Systems website: https://data2.nhgis.org/main (Citation: Steven Manson, Jonathan Schroeder, David Van Riper, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 12.0 [Database]. Minneapolis: University of Minnesota. 2017. http://doi.org/10.18128/D050.V12.0). A number of relevant tables of basic demographics on age and race, household income and poverty levels, and labor force participation were extracted. These particular variables were selected as part of an effort to understand and potentially quantify various dimensions of well-being in Alaskan communities. The file "censusdata_master.csv" is a consolidation of all 21 other data files in the package. For detailed information on how the datasets vary over different years, view the file "readme.docx" available in this data package. The included .Rmd file is a script which combines the 21 files by year into a single file (censusdata_master.csv). It also cleans up place names (including typographical errors) and uses the USGS place names dataset and the SASAP regions dataset to assign latitude and longitude values and region values to each place in the dataset. Note that some places were not assigned a region or location because they do not fit well into the regional framework. Considerable heterogeneity exists between census surveys each year. While we have attempted to combine these datasets in a way that makes sense, there may be some discrepancies or unexpected values. The RMarkdown document SASAPWebsiteGraphicsCensus.Rmd is used to generate a variety of figures using these data, including the additional file Chignik_population.png. An additional set of 25 figures showing regional trends in population and income metrics are also included.
1940 Dwellings Census Data for Baltimore, Maryland. Refer to the 1940 codebook (codebook_1940.pdf) for more information. This is part of a collection of 221 Baltimore Ecosystem Study metadata records that point to a geodatabase. The geodatabase is available online and is considerably large. Upon request, and under certain arrangements, it can be shipped on media, such as a usb hard drive. The geodatabase is roughly 51.4 Gb in size, consisting of 4,914 files in 160 folders. Although this metadata record and the others like it are not rich with attributes, it is nonetheless made available because the data that it represents could be indeed useful.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises physician-level entries from the 1906 American Medical Directory, the first in a series of semi-annual directories of all practicing physicians published by the American Medical Association [1]. Physicians are consistently listed by city, county, and state. Most records also include details about the place and date of medical training. From 1906-1940, Directories also identified the race of black physicians [2].This dataset comprises physician entries for a subset of US states and the District of Columbia, including all of the South and several adjacent states (Alabama, Arkansas, Delaware, Florida, Georgia, Kansas, Kentucky, Louisiana, Maryland, Mississippi, Missouri, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, West Virginia). Records were extracted via manual double-entry by professional data management company [3], and place names were matched to latitude/longitude coordinates. The main source for geolocating physician entries was the US Census. Historical Census records were sourced from IPUMS National Historical Geographic Information System [4]. Additionally, a public database of historical US Post Office locations was used to match locations that could not be found using Census records [5]. Fuzzy matching algorithms were also used to match misspelled place or county names [6].The source of geocoding match is described in the “match.source” field (Type of spatial match (census_YEAR = match to NHGIS census place-county-state for given year; census_fuzzy_YEAR = matched to NHGIS place-county-state with fuzzy matching algorithm; dc = matched to centroid for Washington, DC; post_places = place-county-state matched to Blevins & Helbock's post office dataset; post_fuzzy = matched to post office dataset with fuzzy matching algorithm; post_simp = place/state matched to post office dataset; post_confimed_missing = post office dataset confirms place and county, but could not find coordinates; osm = matched using Open Street Map geocoder; hand-match = matched by research assistants reviewing web archival sources; unmatched/hand_match_missing = place coordinates could not be found). For records where place names could not be matched, but county names could, coordinates for county centroids were used. Overall, 40,964 records were matched to places (match.type=place_point) and 931 to county centroids ( match.type=county_centroid); 76 records could not be matched (match.type=NA).Most records include information about the physician’s medical training, including the year of graduation and a code linking to a school. A key to these codes is given on Directory pages 26-27, and at the beginning of each state’s section [1]. The OSM geocoder was used to assign coordinates to each school by its listed location. Straight-line distances between physicians’ place of training and practice were calculated using the sf package in R [7], and are given in the “school.dist.km” field. Additionally, the Directory identified a handful of schools that were “fraudulent” (school.fraudulent=1), and institutions set up to train black physicians (school.black=1).AMA identified black physicians in the directory with the signifier “(col.)” following the physician’s name (race.black=1). Additionally, a number of physicians attended schools identified by AMA as serving black students, but were not otherwise identified as black; thus an expanded racial identifier was generated to identify black physicians (race.black.prob=1), including physicians who attended these schools and those directly identified (race.black=1).Approximately 10% of dataset entries were audited by trained research assistants, in addition to 100% of black physician entries. These audits demonstrated a high degree of accuracy between the original Directory and extracted records. Still, given the complexity of matching across multiple archival sources, it is possible that some errors remain; any identified errors will be periodically rectified in the dataset, with a log kept of these updates.For further information about this dataset, or to report errors, please contact Dr Ben Chrisinger (Benjamin.Chrisinger@tufts.edu). Future updates to this dataset, including additional states and Directory years, will be posted here: https://dataverse.harvard.edu/dataverse/amd.References:1. American Medical Association, 1906. American Medical Directory. American Medical Association, Chicago. Retrieved from: https://catalog.hathitrust.org/Record/000543547.2. Baker, Robert B., Harriet A. Washington, Ololade Olakanmi, Todd L. Savitt, Elizabeth A. Jacobs, Eddie Hoover, and Matthew K. Wynia. "African American physicians and organized medicine, 1846-1968: origins of a racial divide." JAMA 300, no. 3 (2008): 306-313. doi:10.1001/jama.300.3.306.3. GABS Research Consult Limited Company, https://www.gabsrcl.com.4. Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 17.0 [GNIS, TIGER/Line & Census Maps for US Places and Counties: 1900, 1910, 1920, 1930, 1940, 1950; 1910_cPHA: ds37]. Minneapolis, MN: IPUMS. 2022. http://doi.org/10.18128/D050.V17.05. Blevins, Cameron; Helbock, Richard W., 2021, "US Post Offices", https://doi.org/10.7910/DVN/NUKCNA, Harvard Dataverse, V1, UNF:6:8ROmiI5/4qA8jHrt62PpyA== [fileUNF]6. fedmatch: Fast, Flexible, and User-Friendly Record Linkage Methods. https://cran.r-project.org/web/packages/fedmatch/index.html7. sf: Simple Features for R. https://cran.r-project.org/web/packages/sf/index.html
https://borealisdata.ca/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.5683/SP3/INV5ZHhttps://borealisdata.ca/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.5683/SP3/INV5ZH
This study matches Canadian and US manufacturing industries at the 2-digit SIC code level for census years 1900 to 1940. Canadian figures start at 1870. Only general figures were recorded, such as number of employees, number of establishments, salary and wages, gross production, cost of input materials, gross value added. The project does have some drawbacks, such as the lack of US figures gross production, cost of materials, and lack of figures for the iron and steel industry. But for an aggregate comparison of the two countries, the numbers can be considered reliable.
Website alows the public full access to the 1940 Census images, census maps and descriptions.