Website allows the public full access to the 1940 Census images, census maps and descriptions.
https://www.icpsr.umich.edu/web/ICPSR/studies/8236/terms
The 1940 Census Public Use Microdata Sample Project was assembled through a collaborative effort between the United States Bureau of the Census and the Center for Demography and Ecology at the University of Wisconsin. The collection contains a stratified 1-percent sample of households, with separate records for each household, for each "sample line" respondent, and for each person in the household. These records were encoded from microfilm copies of original handwritten enumeration schedules from the 1940 Census of Population. Geographic identification of the location of the sampled households includes Census regions and divisions, states (except Alaska and Hawaii), standard metropolitan areas (SMAs), and state economic areas (SEAs). Accompanying the data collection is a codebook that includes an abstract, descriptions of sample design, processing procedures and file structure, a data dictionary (record layout), category code lists, and a glossary. Also included is a procedural history of the 1940 Census. Each of the 20 subsamples contains three record types: household, sample line, and person. Household variables describe the location and condition of the household. The sample line records contain variables describing demographic characteristics such as nativity, marital status, number of children, veteran status, wage deductions for Social Security, and occupation. Person records also contain variables describing demographic characteristics including nativity, marital status, family membership, education, employment status, income, and occupation.
This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1940 datasets.
The 1940 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1940, although some persons were missed. The 1940 census population schedules were digitized by the National Archives and Records Administration (NARA) and released publicly on April 2, 2012. The 1940 Census enumeration district maps contain maps of counties, cities, and other minor civil divisions that show enumeration districts, census tracts, and related boundaries and numbers used for each census. The coverage is nationwide and includes territorial areas. The 1940 Census enumeration district descriptions contain written descriptions of census districts, subdivisions, and enumeration districts.
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The IPUMS microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration, and family variables. Within households, it is possible to create relational data because all relations between household members are known. For example, having data on a mother and her children in a household enables researchers to calculate the mother’s age at each birth. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
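The mother's-age-at-birth calculation mentioned above can be sketched in a few lines of Python. This is only an illustration: the field names (`serial`, `pernum`, `momloc`, `age`) and the sample rows are assumptions made for the example, not taken from the dataset documentation, and because census ages are whole years the result is accurate only to within about a year.

```python
# Hypothetical person records for one household. Field names are assumptions:
# "serial" = household id, "pernum" = person number within the household,
# "momloc" = person number of the mother (0 if no mother is present).
persons = [
    {"serial": 1, "pernum": 1, "age": 34, "momloc": 0},  # mother
    {"serial": 1, "pernum": 2, "age": 36, "momloc": 0},  # father
    {"serial": 1, "pernum": 3, "age": 10, "momloc": 1},  # child of person 1
    {"serial": 1, "pernum": 4, "age": 4, "momloc": 1},   # child of person 1
]


def mothers_age_at_birth(persons):
    """Return {(serial, child_pernum): mother's age when the child was born}."""
    by_key = {(p["serial"], p["pernum"]): p for p in persons}
    result = {}
    for child in persons:
        if child["momloc"] == 0:
            continue  # no co-resident mother recorded
        mother = by_key.get((child["serial"], child["momloc"]))
        if mother is None:
            continue  # pointer does not resolve to a person in this extract
        result[(child["serial"], child["pernum"])] = mother["age"] - child["age"]
    return result


ages = mothers_age_at_birth(persons)
print(ages)
```

The lookup is keyed on household and person number together, so children are only ever matched to a mother inside their own household.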
The historic US 1940 census data were collected in April 1940. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
Notes
This dataset includes all individuals from the 1940 US census.
This dataset includes all households from the 1940 US census.
These data comprise Census records relating to the Alaskan people's population demographics for the State of Alaskan Salmon and People (SASAP) Project. Decennial census data were originally extracted from IPUMS National Historic Geographic Information Systems website: https://data2.nhgis.org/main (Citation: Steven Manson, Jonathan Schroeder, David Van Riper, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 12.0 [Database]. Minneapolis: University of Minnesota. 2017. http://doi.org/10.18128/D050.V12.0). A number of relevant tables of basic demographics on age and race, household income and poverty levels, and labor force participation were extracted. These particular variables were selected as part of an effort to understand and potentially quantify various dimensions of well-being in Alaskan communities. The file "censusdata_master.csv" is a consolidation of all 21 other data files in the package. For detailed information on how the datasets vary over different years, view the file "readme.docx" available in this data package. The included .Rmd file is a script which combines the 21 files by year into a single file (censusdata_master.csv). It also cleans up place names (including typographical errors) and uses the USGS place names dataset and the SASAP regions dataset to assign latitude and longitude values and region values to each place in the dataset. Note that some places were not assigned a region or location because they do not fit well into the regional framework. Considerable heterogeneity exists between census surveys each year. While we have attempted to combine these datasets in a way that makes sense, there may be some discrepancies or unexpected values. The RMarkdown document SASAPWebsiteGraphicsCensus.Rmd is used to generate a variety of figures using these data, including the additional file Chignik_population.png. 
An additional set of 25 figures showing regional trends in population and income metrics are also included.
https://www.icpsr.umich.edu/web/ICPSR/studies/8353/terms
This is an extract of the decennial Public Use Microdata Sample (PUMS) released by the Bureau of the Census. Because the complete PUMS files contain several hundred thousand records, ICPSR has constructed this subset to allow for easier and less costly analysis. The collection of data at ten-year increments allows the user to follow various age cohorts through the life cycle. Data include information on the household and its occupants such as size and value of dwelling, utility costs, number of people in the household, and their relationship to the respondent. More detailed information was collected on the respondent, the head of household, and the spouse, if present. Variables include education, marital status, occupation, and income.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/ZFVVNA
The CenSoc WWII Army Enlistment Dataset is a cleaned and harmonized version of the National Archives and Records Administration’s Electronic Army Serial Number Merged File, ca. 1938 - 1946 (2002). It contains enlistment records for over 9 million men and women who served in the United States Army, including the Army Air Corps, Women's Army Auxiliary Corps, and Enlisted Reserve Corps. We publish links between men in the CenSoc WWII Army Enlistment Dataset, Social Security Administration mortality data, and the 1940 Census. The CenSoc Enlistment-Census-1940 file links these enlistment records to the complete 1940 Census, and may be merged with IPUMS-USA census data using the HISTID identifier variable. The CenSoc Enlistment-Numident file links enlistment records to the Berkeley Unified Numident Mortality Database (BUNMD), and the CenSoc Enlistment-DMF file links enlistment records to the Social Security Death Master File. For enlistment records in the Enlistment-Numident and Enlistment-DMF datasets that have been independently and additionally linked to the 1940 Census, we include the HISTID identifier variable that can be used to merge the data with IPUMS census data.
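A merge on HISTID, as described above, might look like the following sketch. Only the HISTID name comes from the description; the other field names and all of the rows are invented for illustration, and real extracts would be read from files rather than typed inline.

```python
# Hypothetical rows: only HISTID is named in the dataset description.
enlistment_links = [
    {"HISTID": "A-001", "enlist_year": 1942},
    {"HISTID": "A-003", "enlist_year": 1943},
]
census_1940 = [
    {"HISTID": "A-001", "AGE": 21, "EDUC": 8},
    {"HISTID": "A-002", "AGE": 45, "EDUC": 6},
    {"HISTID": "A-003", "AGE": 19, "EDUC": 10},
]


def merge_on_histid(links, census):
    """Inner-join link records to census records that share a HISTID."""
    census_by_id = {row["HISTID"]: row for row in census}
    merged = []
    for link in links:
        match = census_by_id.get(link["HISTID"])
        if match is not None:
            # Census fields first, then link fields; HISTID is identical in both.
            merged.append({**match, **link})
    return merged


merged = merge_on_histid(enlistment_links, census_1940)
for row in merged:
    print(row)
```

This keeps only linked persons (an inner join); census records without an enlistment link, such as the second census row here, are dropped.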
This crosswalk consists of individuals matched between the 1850 and 1940 complete-count US Censuses. Within the crosswalk, users have the option to select the linking method with which these matches were created. This version of the crosswalk contains links made by the ABE-exact (conservative and standard) method, the ABE-NYSIIS (conservative and standard) method and the ABE-NYSIIS (conservative and standard) method where race is used as a matching variable. Users can then merge into this crosswalk a wide set of individual- and household-level variables provided publicly by IPUMS, thereby creating a historical longitudinal dataset for analysis.
This study matches Canadian and US manufacturing industries at the 2-digit SIC code level for census years 1900 to 1940. Canadian figures start at 1870. Only general figures were recorded, such as number of employees, number of establishments, salaries and wages, gross production, cost of input materials, and gross value added. The project does have some drawbacks, such as the lack of US figures for gross production and cost of materials, and the lack of figures for the iron and steel industry. But for an aggregate comparison of the two countries, the numbers can be considered reliable.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises physician-level entries from the 1906 American Medical Directory, the first in a series of semi-annual directories of all practicing physicians published by the American Medical Association [1]. Physicians are consistently listed by city, county, and state. Most records also include details about the place and date of medical training. From 1906-1940, Directories also identified the race of black physicians [2].
This dataset comprises physician entries for a subset of US states and the District of Columbia, including all of the South and several adjacent states (Alabama, Arkansas, Delaware, Florida, Georgia, Kansas, Kentucky, Louisiana, Maryland, Mississippi, Missouri, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, West Virginia). Records were extracted via manual double-entry by a professional data management company [3], and place names were matched to latitude/longitude coordinates. The main source for geolocating physician entries was the US Census. Historical Census records were sourced from IPUMS National Historical Geographic Information System [4]. Additionally, a public database of historical US Post Office locations was used to match locations that could not be found using Census records [5]. Fuzzy matching algorithms were also used to match misspelled place or county names [6].
The source of the geocoding match is described in the “match.source” field, which records the type of spatial match: census_YEAR = match to NHGIS census place-county-state for given year; census_fuzzy_YEAR = matched to NHGIS place-county-state with fuzzy matching algorithm; dc = matched to centroid for Washington, DC; post_places = place-county-state matched to Blevins & Helbock's post office dataset; post_fuzzy = matched to post office dataset with fuzzy matching algorithm; post_simp = place/state matched to post office dataset; post_confimed_missing = post office dataset confirms place and county, but could not find coordinates; osm = matched using Open Street Map geocoder; hand-match = matched by research assistants reviewing web archival sources; unmatched/hand_match_missing = place coordinates could not be found. For records where place names could not be matched, but county names could, coordinates for county centroids were used. Overall, 40,964 records were matched to places (match.type=place_point) and 931 to county centroids (match.type=county_centroid); 76 records could not be matched (match.type=NA).
Most records include information about the physician’s medical training, including the year of graduation and a code linking to a school. A key to these codes is given on Directory pages 26-27, and at the beginning of each state’s section [1]. The OSM geocoder was used to assign coordinates to each school by its listed location. Straight-line distances between physicians’ place of training and practice were calculated using the sf package in R [7], and are given in the “school.dist.km” field. Additionally, the Directory identified a handful of schools that were “fraudulent” (school.fraudulent=1), and institutions set up to train black physicians (school.black=1).
AMA identified black physicians in the directory with the signifier “(col.)” following the physician’s name (race.black=1). Additionally, a number of physicians attended schools identified by AMA as serving black students, but were not otherwise identified as black; thus an expanded racial identifier was generated to identify black physicians (race.black.prob=1), including physicians who attended these schools and those directly identified (race.black=1).
Approximately 10% of dataset entries were audited by trained research assistants, in addition to 100% of black physician entries. These audits demonstrated a high degree of accuracy between the original Directory and extracted records. Still, given the complexity of matching across multiple archival sources, it is possible that some errors remain; any identified errors will be periodically rectified in the dataset, with a log kept of these updates.
For further information about this dataset, or to report errors, please contact Dr Ben Chrisinger (Benjamin.Chrisinger@tufts.edu). Future updates to this dataset, including additional states and Directory years, will be posted here: https://dataverse.harvard.edu/dataverse/amd.
References:
1. American Medical Association, 1906. American Medical Directory. American Medical Association, Chicago. Retrieved from: https://catalog.hathitrust.org/Record/000543547.
2. Baker, Robert B., Harriet A. Washington, Ololade Olakanmi, Todd L. Savitt, Elizabeth A. Jacobs, Eddie Hoover, and Matthew K. Wynia. "African American physicians and organized medicine, 1846-1968: origins of a racial divide." JAMA 300, no. 3 (2008): 306-313. doi:10.1001/jama.300.3.306.
3. GABS Research Consult Limited Company, https://www.gabsrcl.com.
4. Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 17.0 [GNIS, TIGER/Line & Census Maps for US Places and Counties: 1900, 1910, 1920, 1930, 1940, 1950; 1910_cPHA: ds37]. Minneapolis, MN: IPUMS. 2022. http://doi.org/10.18128/D050.V17.0
5. Blevins, Cameron; Helbock, Richard W., 2021, "US Post Offices", https://doi.org/10.7910/DVN/NUKCNA, Harvard Dataverse, V1, UNF:6:8ROmiI5/4qA8jHrt62PpyA== [fileUNF]
6. fedmatch: Fast, Flexible, and User-Friendly Record Linkage Methods. https://cran.r-project.org/web/packages/fedmatch/index.html
7. sf: Simple Features for R. https://cran.r-project.org/web/packages/sf/index.html
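The straight-line distance reported in the “school.dist.km” field was computed by the dataset authors with the sf package in R. As a rough stand-in, the haversine great-circle formula gives a comparable distance from two latitude/longitude pairs. The sketch below is not the authors' code: the function name, the Earth-radius constant, and the example coordinates are assumptions chosen for illustration, and results may differ slightly from an ellipsoidal calculation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates only (roughly Nashville and Atlanta), standing in
# for a school location and a practice location.
school = (36.1627, -86.7816)
practice = (33.7490, -84.3880)
dist_km = haversine_km(*school, *practice)
print(round(dist_km, 1))
```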
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Subcounty housing unit counts are important for studying geo-historical patterns of (sub)urbanization, land-use change, and residential loss and gain. The most commonly used subcounty geographical unit for social research in the United States is the census tract. However, changing tract geometries and historically incomplete coverage present significant obstacles for longitudinal analysis that existing datasets do not adequately address. Overcoming these barriers, we provide housing unit estimates in consistent 2010 tract boundaries for every census year from 1940 to 2010 plus 2019 for the entire continental US. Moreover, we develop an “urbanization year” indicator that denotes if and when tracts became “urbanized” during this timeframe. We produce these data by blending existing interpolation techniques with a novel procedure we call “maximum reabsorption”. Conducting out-of-sample validation, we find that our hybrid approach generally produces more reliable estimates than existing alternatives. The final dataset, Historical Housing Unit and Urbanization Database 2010 (HHUUD10), has myriad potential uses for research involving housing, population, and land-use change, as well as (sub)urbanization.
The world's population first reached one billion people in 1805, reached eight billion in 2022, and is projected to peak at almost 10.2 billion by the end of the century. Although it took thousands of years to reach one billion people, it did so at the beginning of a phenomenon known as the demographic transition; from this point onwards, population growth has skyrocketed, and since the 1960s the population has increased by one billion people every 12 to 15 years. The demographic transition sees a sharp drop in mortality due to factors such as vaccination, sanitation, and improved food supply; the population boom that follows is due to increased survival rates among children and higher life expectancy among the general population; and fertility then drops in response to this population growth.
Regional differences
The demographic transition is a global phenomenon, but it has taken place at different times across the world. The industrialized countries of Europe and North America were the first to go through this process, followed by some states in the Western Pacific. Latin America's population then began growing at the turn of the 20th century, but the most significant period of global population growth occurred as Asia progressed in the late 1900s. As of the early 21st century, almost two-thirds of the world's population lives in Asia, although this is set to change significantly in the coming decades.
Future growth
The growth of Africa's population, particularly in Sub-Saharan Africa, will have the largest impact on global demographics in this century. From 2000 to 2100, it is expected that Africa's population will have increased by a factor of almost five. It overtook Europe in size in the late 1990s, and overtook the Americas a few years later. In contrast to Africa, Europe's population is now in decline, as birth rates are consistently below death rates in many countries, especially in the south and east, resulting in natural population decline. Similarly, the populations of the Americas and Asia are expected to go into decline in the second half of this century, and only Oceania's population will still be growing alongside Africa's. By 2100, the world will have over three billion more people than today, with the vast majority of this growth concentrated in Africa. Demographers predict that climate change is exacerbating many of the challenges that currently hinder progress in Africa, such as political and food instability; if Africa's transition is prolonged, then it may result in further population growth that would place a strain on the region's resources. Curbing this growth earlier, however, would alleviate some of the pressure created by climate change.
As of July 2024, Nigeria's population was estimated at around 229.5 million. Between 1965 and 2024, the number of people living in Nigeria increased at an average annual rate of over two percent. In 2024, the population grew by 2.42 percent compared to the previous year. Nigeria is the most populous country in Africa. By extension, the African continent records the highest growth rate in the world.
Africa's most populous country
Nigeria was the most populous country in Africa as of 2023. As of 2022, Lagos was Nigeria's biggest urban center and the largest city in all of sub-Saharan Africa, with more than 17.5 million residents. Lagos is also the nation's primary financial, cultural, and educational hub, and one of the largest urban agglomerations in the world.
Nigeria's youthful population
In Nigeria, 50 percent of the populace is under the age of 19. The largest age bracket is made up of those up to four years old, comprising 8.3 percent of men and eight percent of women as of 2021. Nigeria has one of the world's most youthful populations. Both within Africa and internationally, Niger has the lowest median age, while Nigeria ranks 20th globally. Life expectancy in Nigeria averages 62 years, although it differs between men and women. The main causes of death have been neonatal disorders, malaria, and diarrheal diseases.
These data comprise Census records relating to the Alaskan people's population demographics for the State of Alaskan Salmon and People (SASAP) Project. Decennial census data were originally extracted from IPUMS National Historic Geographic Information Systems website: https://data2.nhgis.org/main (Citation: Steven Manson, Jonathan Schroeder, David Van Riper, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 12.0 [Database]. Minneapolis: University of Minnesota. 2017. http://doi.org/10.18128/D050.V12.0). A number of relevant tables of basic demographics on age and race, household income and poverty levels, and labor force participation were extracted.
These particular variables were selected as part of an effort to understand and potentially quantify various dimensions of well-being in Alaskan communities.
The file "censusdata_master.csv" is a consolidation of all 21 other data files in the package. For detailed information on how the datasets vary over different years, view the file "readme.docx" available in this data package.
The included .Rmd file is a script which combines the 21 files by year into a single file (censusdata_master.csv). It also cleans up place names (including typographical errors) and uses the USGS place names dataset and the SASAP regions dataset to assign latitude and longitude values and region values to each place in the dataset. Note that some places were not assigned a region or location because they do not fit well into the regional framework.
Considerable heterogeneity exists between census surveys each year. While we have attempted to combine these datasets in a way that makes sense, there may be some discrepancies or unexpected values.
Please send a description of any unusual values to the dataset contact.
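The package's own consolidation script is written in R Markdown. Purely as an illustration of the general idea (stacking yearly files whose columns differ and tagging each row with its year), a Python sketch might look like this. The inline CSV text, column names, and values are hypothetical placeholders, not contents of the actual data files.

```python
import csv
import io

# Hypothetical stand-ins for two yearly files with different columns.
yearly_files = {
    1940: "place,population\nPlace A,100\n",
    1950: "place,population,median_income\nPlace A,120,2500\n",
}


def combine(yearly_files):
    """Stack yearly CSV texts into one CSV, adding a 'year' column.

    Columns missing from a given year are left blank in the output.
    """
    rows, columns = [], ["year"]
    for year, text in sorted(yearly_files.items()):
        for row in csv.DictReader(io.StringIO(text)):
            row = {"year": year, **row}
            for name in row:
                if name not in columns:
                    columns.append(name)  # keep first-seen column order
            rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


combined = combine(yearly_files)
print(combined)
```

Leaving absent columns blank, rather than dropping them, keeps the year-to-year heterogeneity visible in the combined file.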
The CenSoc-Numident dataset links the 1940 census to the National Archives’ public release of the Social Security Numident file (“NARA Numident”). Our linking strategy relies on first name, last name, year of birth, and place of birth. To link unmarried women, we use father’s last name as a proxy for women’s maiden name. We use the ABE fully automated linking approach developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). To work with this dataset, researchers must download and link the 1940 full-count Census sample from IPUMS-USA on the HISTID variable. Please adhere to the citation and usage guidelines of both CenSoc and IPUMS-USA when using this dataset.
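The core idea of linking on first name, last name, year of birth, and place of birth can be illustrated with a deliberately simplified exact-match sketch. This is not the CenSoc or ABE implementation: the published method includes further steps that this sketch omits, and the record layout, field names, and sample rows here are all hypothetical. The sketch only shows the rule of accepting a link when the key is unique in both sources.

```python
from collections import defaultdict


def unique_by_key(records):
    """Index records by linking key, keeping only keys that occur exactly once."""
    groups = defaultdict(list)
    for r in records:
        key = (r["first"].lower(), r["last"].lower(), r["birth_year"], r["birthplace"])
        groups[key].append(r)
    return {k: v[0] for k, v in groups.items() if len(v) == 1}


def link(census, numident):
    """Return (census_id, numident_id) pairs whose key is unique in both sources."""
    a = unique_by_key(census)
    b = unique_by_key(numident)
    return [(a[k]["id"], b[k]["id"]) for k in a if k in b]


# Hypothetical records for illustration only.
census = [
    {"id": "c1", "first": "John", "last": "Smith", "birth_year": 1915, "birthplace": "Ohio"},
    {"id": "c2", "first": "John", "last": "Smith", "birth_year": 1915, "birthplace": "Ohio"},
    {"id": "c3", "first": "Mary", "last": "Jones", "birth_year": 1920, "birthplace": "Texas"},
]
numident = [
    {"id": "n1", "first": "MARY", "last": "JONES", "birth_year": 1920, "birthplace": "Texas"},
    {"id": "n2", "first": "John", "last": "Smith", "birth_year": 1915, "birthplace": "Ohio"},
]

links = link(census, numident)
print(links)
```

The two identical John Smith census records are ambiguous and therefore left unlinked, which trades coverage for fewer false matches.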
In 1800, the population of the area of modern-day Bangladesh was estimated to be just over 19 million, a figure which would rise steadily throughout the 19th century, reaching over 26 million by 1900. At the time, Bangladesh was the eastern part of the Bengal region in the British Raj, and had the most-concentrated Muslim population in the subcontinent's east. At the turn of the 20th century, the British colonial administration believed that east Bengal was economically lagging behind the west, and Bengal was partitioned in 1905 as a means of improving the region's development. East Bengal then became the only Muslim-majority state in the eastern Raj, which led to socioeconomic tensions between the Hindu upper classes and the general population.
Bengal Famine
During the Second World War, over 2.5 million men from across the British Raj enlisted in the British Army and their involvement was fundamental to the war effort. The war, however, had devastating consequences for the Bengal region, as the famine of 1943-1944 resulted in the deaths of up to three million people (with over two thirds thought to have been in the east) due to starvation and malnutrition-related disease. As the population boomed in the 1930s, East Bengal's mismanaged and underdeveloped agricultural sector could not sustain this growth; by 1942, food shortages spread across the region, millions began migrating in search of food and work, and colonial mismanagement exacerbated this further. On the brink of famine in early 1943, authorities in India called for aid and for permission to redirect their own resources from the war effort to combat the famine; however, these requests were mostly rejected by authorities in London. While the exact contribution of each of these factors to the famine remains a topic of debate, the general consensus is that the British War Cabinet's refusal to send food or aid was the most decisive. Food shortages did not dissipate until late 1943, and famine deaths persisted for another year.
Partition to independence
Following the war, the movement for Indian independence reached its final stages as the process of British decolonization began. Unrest between the Raj's Muslim and Hindu populations led to the creation of two separate states in 1947; the Muslim-majority regions became East Pakistan (now Bangladesh) and West Pakistan (now Pakistan), separated by the Hindu-majority India. Although East Pakistan's population was larger, power lay with the military in the west, and authorities grew increasingly suppressive and neglectful of the eastern province in the following years. This reached a tipping point when authorities failed to respond adequately to the Bhola cyclone in 1970, which claimed over half a million lives in the Bengal region, and again when they failed to respect the results of the 1970 election, in which the Bengal party Awami League won the majority of seats. Bangladeshi independence was claimed the following March, leading to a brutal war between East and West Pakistan that claimed between 1.5 and three million lives in just nine months. The war also saw over half of the country displaced, widespread atrocities, and the systematic rape of hundreds of thousands of women. As the war spilled over into India, Indian forces joined on the side of Bangladesh, and Pakistan was defeated two weeks later. An additional famine in 1974 claimed the lives of several hundred thousand people, meaning that the early 1970s was one of the most devastating periods in the country's history.
Independent Bangladesh
In the first decades of independence, Bangladesh's political hierarchy was particularly unstable and two of its presidents were assassinated in military coups. Since transitioning to parliamentary democracy in the 1990s, things have become comparatively stable, although political turmoil, violence, and corruption are persistent challenges. As Bangladesh continues to modernize and industrialize, living standards have increased and individual wealth has risen. Service industries have emerged to facilitate the demands of Bangladesh's developing economy, while manufacturing industries, particularly textiles, remain strong. Declining fertility rates have seen natural population growth fall in recent years, although the influx of Myanmar's Rohingya population due to the displacement crisis has seen upwards of one million refugees arrive in the country since 2017. In 2020, Bangladesh was estimated to have a population of approximately 165 million people.
In 1800, the population of the region of present-day India was approximately 169 million. The population would grow gradually throughout the 19th century, rising to over 240 million by 1900. Population growth would begin to increase in the 1920s, as a result of falling mortality rates, due to improvements in health, sanitation and infrastructure. However, the population of India would see its largest rate of growth in the years following the country’s independence from the British Empire in 1947, when the population would rise from 358 million to over one billion by the turn of the century, making India the second country to pass the billion-person milestone. While the rate of growth has slowed somewhat as India begins a demographic shift, the country’s population has continued to grow dramatically throughout the 21st century, and in 2020, India is estimated to have a population of just under 1.4 billion, well over a billion more people than one century previously. Today, approximately 18% of the Earth’s population lives in India, and it is estimated that India will overtake China to become the most populous country in the world within the next five years.