The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration and family variables. Within households, it is possible to create relational data because all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
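As a minimal sketch of the mother's-age example above, the snippet below links each child to the mother in the same household and subtracts the two ages. SERIALP and AGE appear in this documentation; PERNUM (person number within the household) and MOMLOC (the mother's person number, 0 when no mother is present) are column names assumed here for illustration, and the records are invented.

```python
import pandas as pd

# Toy person records; PERNUM and MOMLOC are assumed column names.
persons = pd.DataFrame({
    "SERIALP": [1, 1, 1, 2, 2],
    "PERNUM":  [1, 2, 3, 1, 2],
    "MOMLOC":  [0, 1, 1, 0, 1],
    "AGE":     [34, 10, 4, 25, 1],
})

# Children are the records that point at a mother in the household.
children = persons[persons["MOMLOC"] > 0]

# Attach the mother's record by matching MOMLOC to the mother's PERNUM.
mothers = persons[["SERIALP", "PERNUM", "AGE"]].rename(
    columns={"PERNUM": "MOMLOC", "AGE": "MOTHER_AGE"}
)
linked = children.merge(mothers, on=["SERIALP", "MOMLOC"], how="left")

# Age at birth is the mother's current age minus the child's current age.
linked["MOTHER_AGE_AT_BIRTH"] = linked["MOTHER_AGE"] - linked["AGE"]
print(linked[["SERIALP", "PERNUM", "AGE", "MOTHER_AGE", "MOTHER_AGE_AT_BIRTH"]])
```

Because census ages are recorded in whole years, the result is only accurate to within about a year.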
The 1920 US census data were collected in January 1920. Enumerators collected the data by traveling to households and counting the residents who regularly slept there. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with their household with the help of other household members or were scheduled for the last census subdivision.
Notes
We provide household and person data separately so that it is convenient to explore the descriptive statistics at each level. To obtain a full dataset, merge the household and person data on the variables SERIAL and SERIALP. To create a longitudinal dataset, merge datasets on the variable HISTID.
Households with more than 60 people in the original data were broken up for processing purposes. Every person in such a large household is considered to be in their own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.
Coded variables derived from string variables are still in progress. These variables include: occupation and industry.
Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, CLASSWKR, OCC1950, IND1950, MARST, RACE, SEX, RELATE, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.
Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved with the next release.
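The household and person merge described in these notes could look roughly like the following pandas sketch. The toy rows are invented; SERIAL, SERIALP, HISTID, AGE and FARM are variable names taken from this description, and the assumption is that SERIAL identifies a household and SERIALP carries the same identifier on each person record.

```python
import pandas as pd

# Toy extracts; real files hold many more variables and rows.
households = pd.DataFrame({"SERIAL": [1, 2], "FARM": [1, 2]})
persons = pd.DataFrame({
    "SERIALP": [1, 1, 2],
    "HISTID": ["A", "B", "C"],
    "AGE": [40, 12, 67],
})

# One row per person, with the household variables repeated on each row.
# validate="many_to_one" raises an error if a SERIAL value is duplicated
# in the household file.
full = persons.merge(
    households,
    left_on="SERIALP",
    right_on="SERIAL",
    how="left",
    validate="many_to_one",
)
print(full)
```

A left merge keeps every person record even if its household row is missing, which makes unmatched records easy to spot afterwards.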
This dataset was created on 2020-01-10 18:46:34.647
by merging multiple datasets together. The source datasets for this version were:
IPUMS 1920 households: This dataset includes all households from the 1920 US census.
IPUMS 1920 persons: This dataset includes all individuals from the 1920 US census.
IPUMS 1920 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.
Historical population as enumerated and corrected from 1790 through 2020. North Carolina was one of the 13 original States and by the time of the 1790 census had essentially its current boundaries. The Census is mandated by the United States Constitution and was first completed for 1790. The population has been counted every ten years since, with some limitations. In 1790 census coverage included most of the State, except for areas in the west, parts of which were not enumerated until 1840. The population for 1810 includes Walton County, enumerated as part of Georgia although actually within North Carolina. Historical populations shown here reflect the population of the respective named county and not necessarily the population of the area of the county as it was defined for a particular census. County boundaries shown in maps reflect boundaries as defined in 2020. Historic boundaries for some counties may include additional geographic areas or may be smaller than the current geographic boundaries. Notes below list the county or counties with which the population of a currently defined county was enumerated historically (Current County: Population counted in). The current 100 counties have been in place since the 1920 Census, although some modifications to the county boundaries have occurred since that time. For historical county boundaries see: Atlas of Historical County Boundaries Project (newberry.org)
County Notes:
Note 1: Total for 1810 includes population (1,026) of Walton County, reported as a Georgia county but later determined to be situated in western North Carolina. Total for 1890 includes 2 Indians in prison, not reported by county.
Note 2: Alexander: *Iredell, Burke, Wilkes.
Note 3: Avery: *Caldwell, Mitchell, Watauga.
Note 4: Buncombe: *Burke, Rutherford; see also note 22.
Note 5: Caldwell: *Burke, Wilkes, Yancey.
Note 6: Cleveland: *Rutherford, Lincoln.
Note 7: Columbus: *Bladen, Brunswick.
Note 8: Dare: *Tyrrell, Currituck, Hyde.
Note 9: Hoke: *Cumberland, Robeson.
Note 10: Jackson: *Macon, Haywood.
Note 11: Lee: *Moore, Chatham.
Note 12: Lenoir: *Dobbs (Greene); Craven.
Note 13: McDowell: *Burke, Rutherford.
Note 14: Madison: *Buncombe, Yancey.
Note 15: Mitchell: *Yancey, Watauga.
Note 16: Pamlico: *Craven, Beaufort.
Note 17: Polk: *Rutherford, Henderson.
Note 18: Swain: *Jackson, Macon.
Note 19: Transylvania: *Henderson, Jackson.
Note 20: Union: *Mecklenburg, Anson.
Note 21: Vance: *Granville, Warren, Franklin.
Note 22: Walton: Created in 1803 as a Georgia county and reported in 1810 as part of Georgia; abolished after a review of the State boundary determined that its area was located in North Carolina. By 1820 it was part of Buncombe County.
Note 23: Watauga: *Ashe, Yancey, Wilkes; Burke.
Note 24: Wilson: *Edgecombe, Nash, Wayne, Johnston.
Note 25: Yancey: *Burke, Buncombe.
Note 26: Alleghany: *Ashe.
Note 27: Haywood: *Buncombe.
Note 28: Henderson: *Buncombe.
Note 29: Person: Caswell.
Note 30: Clay: Cherokee.
Note 31: Graham: Cherokee.
Note 32: Harnett: Cumberland.
Note 33: Macon: Haywood.
Note 34: Catawba: Lincoln.
Note 35: Gaston: Lincoln.
Note 36: Cabarrus: Mecklenburg.
Note 37: Stanly: Montgomery.
Note 38: Pender: New Hanover.
Note 39: Alamance: Orange.
Note 40: Durham: Orange, Wake.
Note 41: Scotland: Richmond.
Note 42: Davidson: Rowan.
Note 43: Davie: Rowan.
Note 44: Forsyth: Stokes.
Note 45: Yadkin: Surry.
Note 46: Washington: Tyrrell.
Note 47: Ashe: Wilkes.
Part III. Population of Counties, Earliest Census to 1990
The 1840 population of Person County, NC should be 9,790. The 1840 population of Perquimans County, NC should be 7,346.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/Q2QJ2V
This crosswalk consists of individuals matched between the 1910 and 1920 complete-count US Censuses. Within the crosswalk, users can select the linking method with which these matches were created. This version of the crosswalk contains links made by the ABE-exact (conservative and standard) method, the ABE-NYSIIS (conservative and standard) method, the ABE-EI exact (conservative and standard) method, and the ABE-EI NYSIIS (conservative and standard) method, with variants in which race is used as a matching variable. This crosswalk also includes Census Tree Links created by Joseph Price, Kasey Buckles and Mark Clement at the Brigham Young University (BYU) Record Linking Lab. For any chosen method, users can merge into this crosswalk a wide set of individual- and household-level variables provided publicly by IPUMS, thereby creating a historical longitudinal dataset for analysis.
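A rough sketch of how links from one method might be selected and combined with IPUMS variables is shown below. The crosswalk layout is hypothetical: the column names `histid_1910`, `histid_1920` and `abe_exact_conservative` are invented for illustration and may not match the distributed files, and all values are made up.

```python
import pandas as pd

# Hypothetical crosswalk layout: one row per candidate link, with an
# indicator column per linking method (1 = linked by that method).
crosswalk = pd.DataFrame({
    "histid_1910": ["a1", "a2", "a3"],
    "histid_1920": ["b1", "b2", "b3"],
    "abe_exact_conservative": [1, 0, 1],
})
census_1910 = pd.DataFrame({"HISTID": ["a1", "a2", "a3"], "AGE": [20, 35, 8]})
census_1920 = pd.DataFrame({"HISTID": ["b1", "b2", "b3"], "AGE": [30, 45, 18]})

# Keep only the links made by the chosen method.
links = crosswalk[crosswalk["abe_exact_conservative"] == 1]

# Attach the 1910 and 1920 person records, suffixing variables by year.
panel = (
    links
    .merge(census_1910.add_suffix("_1910"),
           left_on="histid_1910", right_on="HISTID_1910")
    .merge(census_1920.add_suffix("_1920"),
           left_on="histid_1920", right_on="HISTID_1920")
)
print(panel[["histid_1910", "histid_1920", "AGE_1910", "AGE_1920"]])
```

Suffixing the census variables by year keeps the two observations of each person side by side in a single wide row.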
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de441981
Abstract (en): These data on 19th- and early 20th-century police department and arrest behavior were collected between 1975 and 1978 for a study of police and crime in the United States. Raw and aggregated time-series data are presented in Parts 1 and 3 on 23 American cities for most years during the period 1860-1920. The data were drawn from annual reports of police departments found in the Library of Congress or in newspapers and legislative reports located elsewhere. Variables in Part 1, for which the city is the unit of analysis, include arrests for drunkenness, conditional offenses and homicides, persons dismissed or held, police personnel, and population. Part 3 aggregates the data by year and reports some of these variables on a per capita basis, using a linear interpolation from the last decennial census to estimate population. Part 2 contains data for 267 United States cities for the period 1880-1890 and was generated from the 1880 federal census volume, REPORT ON THE DEFECTIVE, DEPENDENT, AND DELINQUENT CLASSES, published in 1888, and from the 1890 federal census volume, SOCIAL STATISTICS OF CITIES. Information includes police personnel and expenditures, arrests, persons held overnight, trains entering town, and population. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes.
2006-01-12 All files were removed from dataset 4 and flagged as study-level files, so that they will accompany all downloads.
2005-11-04 On 2005-03-14 new files were added to one or more datasets. These files included additional setup files as well as one or more of the following: SAS program, SAS transport, SPSS portable, and Stata system files. The metadata record was revised 2005-11-04 to reflect these additions.
Funding institution(s): United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics.
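The per capita figures mentioned in the abstract above rely on population estimates for years between censuses. The sketch below shows one way such an estimate could be computed, assuming the interpolation runs in a straight line between two neighbouring decennial counts; the abstract does not spell out the exact procedure, and the city, counts and arrest total are hypothetical.

```python
def interpolate_population(year, census_year, census_pop, next_year, next_pop):
    """Linearly interpolate population between two census counts."""
    share = (year - census_year) / (next_year - census_year)
    return census_pop + share * (next_pop - census_pop)


# Hypothetical city: census counts for 1880 and 1890, 1,200 arrests in 1885.
population_1885 = interpolate_population(1885, 1880, 100_000, 1890, 150_000)
arrests_per_1000 = 1_200 / population_1885 * 1_000
print(population_1885, arrests_per_1000)
```

With these invented inputs the 1885 estimate falls halfway between the two census counts, and the arrest rate is expressed per 1,000 residents.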
The 1999 Census of Population and Housing (CPH) of the Republic of the Marshall Islands (RMI) is the tenth census conducted since 1920 and the second since RMI gained independence. The first population census in the Marshall Islands was conducted in 1920, after which censuses were conducted every five years up to 1935, when World War II disrupted this pattern. The first census after World War II was in 1958, followed by censuses in 1967, 1973, 1980 and 1988.
The objectives of this census were to provide government planners, policy makers, the private sector and the international donor community with social and economic data and to fulfill the data requirements of the upcoming negotiation of the Compact of Free Association. Data on the size, composition and distribution of the population as well as the structural characteristics and available facilities of housing units were obtained.
National coverage.
Household and Individual.
All de jure household members were covered.
Census/enumeration data [cen]
Not applicable as it is a census.
Face-to-face [f2f]
Two types of questionnaires were drafted: (1) CPH Form 2, which gathers information on the demographic, social and economic characteristics of the population as well as the characteristics of the building and housing units, and (2) CPH Form 3, which gathers information on people residing in institutional living quarters. These questionnaires were reviewed several times by NCSC. OPS and CTC pre-tested the questionnaires at the end of March. Revisions were made on the basis of the pre-test and the revised questionnaires were reviewed again by the NCSC. After the English version of the questionnaires was approved by NCSC, they were translated into Marshallese to facilitate the training of enumerators and supervisors. The English version of the questionnaires, however, was used in the actual enumeration, with questions asked in Marshallese. The enumerators and supervisors kept a copy of the questionnaires in Marshallese for reference. Control forms, such as the listing sheets used to generate preliminary counts, were also prepared by CTC. These forms were designed to record the major steps of the census operations.
The questionnaires were separated by type of form and folioed by EA. Each folio was checked for completeness. The questionnaires underwent two stages of processing: manual processing and machine processing. Manual processing involved the verification of geographic identification, review of the entries for completeness, consistency and acceptability of responses, and coding of selected items. Data editing, verification of questionnaires and/or callbacks were performed iteratively until all the data editing rules had been fulfilled or no more rejects were listed for the particular questionnaire. Some data records had to be edited four times, meaning that four iterations of the steps mentioned above had to be done before the records or questionnaires could be declared free of error. Twenty-four people were involved in data processing: the ADB Data Processing Consultant, a national data processing specialist from OPS, 9 manual processors, 5 keyers for data entry, 1 keyer for field editing, 6 data processors and 1 keyer for updating the data files.
Not applicable as this is a census.
The preliminary population counts by atoll and by sex and atoll were generated based on the listing sheet in the first week of August 1999. These were compared to the 1988 and 1980 censuses. The comparison indicated that the average annual population growth rate between 1988 and 1999 was lower than expected. The possible undercount in the 1999 census was investigated. The CTC proposed a plan to revisit the major atolls of Majuro and Kwajalein that the NCSC discussed and approved.
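For reference, an average annual growth rate between two census counts is often computed with the exponential formula r = ln(P_end / P_start) / t. The sketch below uses that formula as an assumption, since the census report may have used a different convention, and the counts are hypothetical rather than the published totals.

```python
import math


def average_annual_growth_rate(pop_start, pop_end, years):
    """Average annual exponential growth rate, in percent per year."""
    return math.log(pop_end / pop_start) / years * 100


# Hypothetical counts, not the published census totals.
rate = average_annual_growth_rate(pop_start=40_000, pop_end=50_000, years=11)
print(round(rate, 2))
```

A rate well below the one observed in the previous intercensal period is the kind of signal that can prompt a check for undercounting.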
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de451385
Abstract (en): This collection includes county-level data from the United States Censuses of Agriculture for the years 1840 to 2012. The files provide data about the number, types, output, and prices of various agricultural products, as well as information on the amount, expenses, sales, values, and production of machinery. Most of the basic crop output data apply to the previous harvest year. Data collected also included the population and value of livestock, the number of animals slaughtered, and the size, type, and value of farms. Part 46 of this collection contains data from 1980 through 2010. Variables in Part 46 include information such as the average value of farmland, number and value of buildings per acre, food services, resident population, composition of households, and unemployment rates. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes. Response Rates: Not applicable.
Datasets:
DS0: Study-Level Files
DS1: Farm Land Value Data Set (County and State) 1850-1959
DS2: 1840 County and State
DS3: 1850 County and State
DS4: 1860 County and State
DS5: 1870 County and State
DS6: 1880 County and State
DS7: 1890 County and State
DS8: 1900 County and State
DS9: 1910 County and State
DS10: 1920 County and State, Dataset 1
DS11: 1920 County and State, Dataset 2
DS12: 1925 County and State
DS13: 1930 County and State, Dataset 1
DS14: 1930 County and State, Dataset 2
DS15: 1935 County and State
DS16: 1940 County and State, Dataset 1
DS17: 1940 County and State, Dataset 2
DS18: 1940 County and State, Dataset 3
DS19: 1940 County and State, Dataset 4 (Water)
DS20: 1945 County and State
DS21: 1950 County and State, Dataset 1
DS22: 1950 Crops, County and State, Dataset 2
DS23: 1950 County, Dataset 3
DS24: 1950 County and State, Dataset 4
DS25: 1954 County and State, Dataset 1
DS26: 1954 Crops, County and State, Dataset 2
DS27: 1959 County and State, Dataset 1
DS28: 1959 Crops, County and State, Dataset 2
DS29: 1959 County, Dataset 3
DS30: 1964 Dataset 1
DS31: 1964 Crops, County and State, Dataset 2
DS32: 1964 County, Dataset 3
DS33: 1969 All Farms, County and State, Dataset 1
DS34: 1969 Farms 2500, County and State, Dataset 2
DS35: 1969 Crops, County and State, Dataset 3
DS36: 1974 All Farms, County and State, Dataset 1
DS37: 1974 Farms 2500, County and State, Dataset 2
DS38: 1974 Crops, County and State, Dataset 3
DS39: 1978 County and State
DS40: 1982 County and State
DS41: 1987 County and State
DS42: 1992 County and State
DS43: 1997 County and State
DS44: 2002 County and State
DS45: 2007 County and State
DS46: State and County Data, United States, 1980-2010
DS47: 2012 County and State
Farms within United States counties and states. Smallest Geographic Unit: FIPS code. The sample was the universe of agricultural operating units. For 1969-2007, data were taken from computer files from the Census Bureau and the United States Department of Agriculture.
2018-08-20 The P.I. resupplied data and documentation for 1935 County and State (dataset 15) and 1997 County and State (dataset 43). Additionally, documentation updates and variable label revisions have been incorporated in datasets 22, 26, 28, 31, 35, and 38 at the request of the P.I.
2016-06-29 The data and documentation for 2012 County and State (dataset 47) have been added to this collection. The collection and documentation titles have been updated to reflect the new year.
2015-08-05 The data, setup files, and documentation for 1964 Dataset 1 have been updated to reflect changes from the producer.
Funding institution(s): National Science Foundation (NSF-SES-0921732; 0648045). United States Department of Health and Human Services. National Institutes of Health (R01 HD057929).
In the past four centuries, the population of the Thirteen Colonies and United States of America has grown from a recorded 350 people around the Jamestown colony in Virginia in 1610, to an estimated 346 million in 2025. While the fertility rate has now dropped well below replacement level, and the population is on track to go into a natural decline in the 2040s, projected high net immigration rates mean the population will continue growing well into the next century, crossing the 400 million mark in the 2070s.
Indigenous population
Early population figures for the Thirteen Colonies and United States come with certain caveats. Official records excluded the indigenous population, and they generally remained excluded until the late 1800s. In 1500, in the first decade of European colonization of the Americas, the native population living within the modern U.S. borders was believed to be around 1.9 million people. The spread of Old World diseases, such as smallpox, measles, and influenza, to biologically defenseless populations in the New World then wreaked havoc across the continent, often wiping out large portions of the population in areas that had not yet made contact with Europeans. By the time of Jamestown's founding in 1607, it is believed the native population within current U.S. borders had dropped by almost 60 percent. As the U.S. expanded, indigenous populations were largely still excluded from population figures as they were driven westward; however, taxpaying Natives were included in the census from 1870 to 1890, and all were included thereafter. Estimates for indigenous populations in the Americas vary significantly by source and time period.
Migration and expansion fuel population growth
The arrival of European settlers and African slaves was the key driver of population growth in North America in the 17th century. Settlers from Britain were the dominant group in the Thirteen Colonies, before settlers from elsewhere in Europe, particularly Germany and Ireland, made a large impact in the mid-19th century. By the end of the 19th century, improvements in transport technology and increasing economic opportunities saw migration to the United States increase further, particularly from Southern and Eastern Europe, and in the first decade of the 1900s the number of migrants to the U.S. exceeded one million people in some years. It is also estimated that almost 400,000 African slaves were transported directly across the Atlantic to mainland North America between 1500 and 1866 (although the importation of slaves was abolished in 1808). Blacks made up a much larger share of the population before slavery's abolition.
Twentieth and twenty-first century
The U.S. population has grown steadily since 1900, reaching one hundred million in the 1910s, two hundred million in the 1960s, and three hundred million in 2007. Since WWII, the U.S. has established itself as the world's foremost superpower, with the world's largest economy and most powerful military. This growth in prosperity has been accompanied by increases in living standards, particularly through medical advances, infrastructure improvements, and access to clean water. These have all contributed to higher infant and child survival rates, as well as an increase in life expectancy (doubling from roughly 40 to 80 years in the past 150 years), which have also played a large part in population growth. As fertility rates decline and increases in life expectancy slow, migration remains the largest factor in population growth. Since the 1960s, Latin America has become the most common origin for migrants in the U.S., while immigration rates from Asia have also increased significantly. It remains to be seen how immigration restrictions of the current administration affect long-term population projections for the United States.
"The following document is Volume 3 of the report of investigations conducted at Bush Hill plantation (site 38AK660). Bush Hill plantation is located near Upper Three Runs Creek in Aiken County, South Carolina on the Savannah River Site, a nuclear research facility operated by the U.S. Department of Energy. Data recovery excavations were conducted at the site between 1996 and 1999 in response to the development of the Three Rivers Regional Landfill and Technology Center. Occupied between circa 1807 and 1920, the site, containing the archaeological remains of a planter’s dwelling and houselot, was owned by three generations of the George Bush family. The residents of Bush Hill plantation raised livestock and produced numerous subsistence crops as well as cotton. To date, Bush Hill is the only antebellum plantation on the Savannah River Site, or in the surrounding middle Savannah River valley, that has been the subject of data recovery excavations. Consequently, the archaeology conducted at the site is significant because it provides important information regarding the material conditions experienced by a 19th-century planter household in the region. This volume contains the artifact inventory as well as the data gathered from the population census, agricultural census and probate documents."
In 1844, Romania had a population of just 3.6 million people. In the early years covered by these data, Romania's territory was very different from, and much smaller than, today's, and control of the area often switched hands between the Austrian, Ottoman and Russian empires. The populations during this time are estimates based on incomplete census data, and they show that the population doubled from 3.6 million in 1844 to 7.2 million in 1912. Part of this growth was due to a high natural birth rate during this period, and part to the changing of Romania's borders and the annexation of new lands. During this time Romania gained its independence from the Ottoman Empire as a result of the Russo-Turkish War in 1878, and experienced a period of increased stability and progress.
Between 1912 and 1930 the population of Romania grew by over 10 million people. The main reason for this is the huge territories gained by Romania in the aftermath of the First World War. During the war Romania remained neutral for the first two years, after which it joined the Allies; however, it was very quickly defeated and overrun by the Central Powers, and in total it lost over 600 thousand people as a direct result of the war. With the collapse of the Austro-Hungarian and Russian empires after the war, Romania almost doubled its territory, which caused the population to soar to 18.1 million in 1930. The population then decreased by 1941 and again by 1948, as Romania ceded territory to neighboring countries and lost approximately half a million people during the Second World War. From 1948 onwards the population began to grow again, reaching its peak of 23.5 million people in 1990.
As in many other Eastern European countries, freedom of movement out of Romania was very limited during the Cold War, and communist rule was difficult for the Romanian people. The Romanian Revolution in 1989 ended communist rule in the country; Romania then transitioned to a free-market society and movement out of the country was allowed. Since then the population has fallen each year as more and more Romanians move abroad in search of work and opportunities. The population is expected to fall to 19.2 million in 2020, which is over 4 million fewer people than it had in 1990.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provide the population of Japan as collected by the Japanese government from 1920 to 2015, broken down by year, prefecture, age range, and gender.
Can the data be used to answer questions such as the following?
The following script written by the dataset owner was used:
import re

import numpy as np
import pandas as pd

# Load the raw census table (Shift-JIS encoded)
japan_census = pd.read_csv('~/Downloads/c03.csv', encoding='SJIS')

# Eliminate a note
japan_census = japan_census.iloc[:-1]

# Eliminate the rows that hold totals across all age groups
japan_census = japan_census[japan_census['年齢5歳階級'] != '総数']


def prefecture(japanese):
    """Return the English name for a Japanese prefecture name (None if unknown)."""
    return {
        '北海道': 'Hokkaido',
        '青森県': 'Aomori Prefecture',
        '岩手県': 'Iwate Prefecture',
        '宮城県': 'Miyagi Prefecture',
        '秋田県': 'Akita Prefecture',
        '山形県': 'Yamagata Prefecture',
        '福島県': 'Fukushima Prefecture',
        '茨城県': 'Ibaraki Prefecture',
        '栃木県': 'Tochigi Prefecture',
        '群馬県': 'Gunma Prefecture',
        '埼玉県': 'Saitama Prefecture',
        '千葉県': 'Chiba Prefecture',
        '東京都': 'Tokyo Metropolis',
        '神奈川県': 'Kanagawa Prefecture',
        '新潟県': 'Niigata Prefecture',
        '富山県': 'Toyama Prefecture',
        '石川県': 'Ishikawa Prefecture',
        '福井県': 'Fukui Prefecture',
        '山梨県': 'Yamanashi Prefecture',
        '長野県': 'Nagano Prefecture',
        '岐阜県': 'Gifu Prefecture',
        '静岡県': 'Shizuoka Prefecture',
        '愛知県': 'Aichi Prefecture',
        '三重県': 'Mie Prefecture',
        '滋賀県': 'Shiga Prefecture',
        '京都府': 'Kyoto Prefecture',
        '大阪府': 'Osaka Prefecture',
        '兵庫県': 'Hyogo Prefecture',
        '奈良県': 'Nara Prefecture',
        '和歌山県': 'Wakayama Prefecture',
        '鳥取県': 'Tottori Prefecture',
        '島根県': 'Shimane Prefecture',
        '岡山県': 'Okayama Prefecture',
        '広島県': 'Hiroshima Prefecture',
        '山口県': 'Yamaguchi Prefecture',
        '徳島県': 'Tokushima Prefecture',
        '香川県': 'Kagawa Prefecture',
        '愛媛県': 'Ehime Prefecture',
        '高知県': 'Kochi Prefecture',
        '福岡県': 'Fukuoka Prefecture',
        '佐賀県': 'Saga Prefecture',
        '長崎県': 'Nagasaki Prefecture',
        '熊本県': 'Kumamoto Prefecture',
        '大分県': 'Oita Prefecture',
        '宮崎県': 'Miyazaki Prefecture',
        '鹿児島県': 'Kagoshima Prefecture',
        '沖縄県': 'Okinawa Prefecture',
    }.get(japanese)


japan_census_translated = pd.DataFrame()
japan_census_translated['Year'] = japan_census['西暦（年）'].astype('int')
japan_census_translated['Prefecture'] = japan_census['都道府県名'].map(prefecture)

# Split each age-class label into its lower and (optional) upper bound
japan_census_translated[['Age Lower Bound', 'Age Upper Bound']] = [
    [m.group(1), m.group(2)]
    for m in japan_census['年齢5歳階級'].map(lambda x: re.search(r'(\d+)\D+(\d+)?', x))
]

# Duplicate every row so each source row yields one Male and one Female record
japan_census_translated = pd.DataFrame(
    np.repeat(japan_census_translated.values, 2, axis=0),
    columns=japan_census_translated.columns
)
japan_census_translated[['Gender', 'Population']] = [
    x for _, row in japan_census.iterrows() for x in [
        ['Male', int(row.loc['人口（男）'])],
        ['Female', int(row.loc['人口（女）'])],
    ]
]

print(japan_census_translated)
japan_census_translated.to_csv('japanese_census.csv')
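As a brief sketch of how the translated table might then be used, the snippet below sums population over age groups. It builds a small stand-in DataFrame with the same column names the script writes (Year, Prefecture, Age Lower Bound, Age Upper Bound, Gender, Population); the figures are made up and are not taken from the census.

```python
import pandas as pd

# Toy rows in the layout the script writes; the numbers are invented.
census = pd.DataFrame({
    "Year": [1920, 1920, 1920, 1920],
    "Prefecture": ["Hokkaido", "Hokkaido", "Hokkaido", "Hokkaido"],
    "Age Lower Bound": ["0", "0", "5", "5"],
    "Age Upper Bound": ["4", "4", "9", "9"],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Population": [100, 95, 90, 88],
})

# Total population per year, prefecture and gender, summed over age groups.
totals = census.groupby(
    ["Year", "Prefecture", "Gender"], as_index=False
)["Population"].sum()
print(totals)
```

With the real file, `pd.read_csv('japanese_census.csv')` would take the place of the stand-in DataFrame.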
The world's Jewish population has had a complex and tumultuous history over the past millennia, regularly dealing with persecution, pogroms, and even genocide. The legacy of expulsion and persecution of Jews, including bans on land ownership, meant that Jewish communities disproportionately lived in urban areas, working as artisans or traders, and often lived in their own settlements separate from the rest of the urban population. This separation contributed to the impression that events such as pandemics, famines, or economic shocks did not affect Jews as much as other populations, and such factors came to form the basis of the mistrust and stereotypes of wealth (characterized as greed) that have made up anti-Semitic rhetoric for centuries.
Development since the Middle Ages
The concentration of Jewish populations across the world has shifted across different centuries. In the Middle Ages, the largest Jewish populations were found in Palestine and the wider Levant region, with other sizeable populations in present-day France, Italy, and Spain. Later, however, the Jewish diaspora became increasingly concentrated in Eastern Europe after waves of pogroms in the west saw Jewish communities move eastward. Poland in particular was often considered a refuge for Jews from the late Middle Ages until the 18th century, when it was partitioned between Austria, Prussia, and Russia, and persecution increased. Push factors such as major pogroms in the Russian Empire in the 19th century and growing oppression in the west during the interwar period then saw many Jews migrate to the United States in search of opportunity.
This table contains 13 series, with data for years 1926 - 1960 (not all combinations necessarily have data for all years), and was last released on 2000-02-18. This table contains data described by the following dimensions (Not all combinations are available): Geography (13 items: Canada; Newfoundland and Labrador; Prince Edward Island; Nova Scotia ...).
In 1800, the population of the area of modern-day Bangladesh was estimated to be just over 19 million, a figure which would rise steadily throughout the 19th century, reaching over 26 million by 1900. At the time, Bangladesh was the eastern part of the Bengal region in the British Raj, and had the most-concentrated Muslim population in the subcontinent's east. At the turn of the 20th century, the British colonial administration believed that east Bengal was economically lagging behind the west, and Bengal was partitioned in 1905 as a means of improving the region's development. East Bengal then became the only Muslim-majority state in the eastern Raj, which led to socioeconomic tensions between the Hindu upper classes and the general population.
Bengal Famine
During the Second World War, over 2.5 million men from across the British Raj enlisted in the British Army and their involvement was fundamental to the war effort. The war, however, had devastating consequences for the Bengal region, as the famine of 1943-1944 resulted in the deaths of up to three million people (with over two thirds thought to have been in the east) due to starvation and malnutrition-related disease. As the population boomed in the 1930s, East Bengal's mismanaged and underdeveloped agricultural sector could not sustain this growth; by 1942, food shortages had spread across the region, millions had begun migrating in search of food and work, and colonial mismanagement exacerbated the situation further. On the brink of famine in early 1943, authorities in India called for aid and for permission to redirect their own resources from the war effort to combat the famine; however, these requests were mostly rejected by authorities in London. While the exact contribution of each of these factors to the famine remains a topic of debate, the general consensus is that the British War Cabinet's refusal to send food or aid was the most decisive.
Food shortages did not dissipate until late 1943, and famine deaths persisted for another year.
Partition to independence
Following the war, the movement for Indian independence reached its final stages as the process of British decolonization began. Unrest between the Raj's Muslim and Hindu populations led to the creation of two separate states in 1947; the Muslim-majority regions became East Pakistan (now Bangladesh) and West Pakistan (now Pakistan), separated by the Hindu-majority India. Although East Pakistan's population was larger, power lay with the military in the west, and authorities grew increasingly suppressive and neglectful of the eastern province in the following years. This reached a tipping point when authorities failed to respond adequately to the Bhola cyclone in 1970, which claimed over half a million lives in the Bengal region, and again when they failed to respect the results of the 1970 election, in which the Bengali Awami League won the majority of seats. Bangladeshi independence was claimed the following March, leading to a brutal war between East and West Pakistan that claimed between 1.5 and three million lives in just nine months. The war also saw over half of the country displaced, widespread atrocities, and the systematic rape of hundreds of thousands of women. As the war spilled over into India, Indian forces joined on the side of Bangladesh, and Pakistan was defeated two weeks later. An additional famine in 1974 claimed the lives of several hundred thousand people, meaning that the early 1970s was one of the most devastating periods in the country's history.
Independent Bangladesh
In the first decades of independence, Bangladesh's political hierarchy was particularly unstable and two of its presidents were assassinated in military coups. Since transitioning to parliamentary democracy in the 1990s, the country has become comparatively stable, although political turmoil, violence, and corruption are persistent challenges.
As Bangladesh continues to modernize and industrialize, living standards have increased and individual wealth has risen. Service industries have emerged to facilitate the demands of Bangladesh's developing economy, while manufacturing industries, particularly textiles, remain strong. Declining fertility rates have seen natural population growth fall in recent years, although the influx of Myanmar's Rohingya population due to the displacement crisis has seen upwards of one million refugees arrive in the country since 2017. In 2020, it is estimated that Bangladesh has a population of approximately 165 million people.
In 1800, the population of the region of present-day India was approximately 169 million. The population would grow gradually throughout the 19th century, rising to over 240 million by 1900. Population growth would begin to increase in the 1920s as a result of falling mortality rates, due to improvements in health, sanitation and infrastructure. However, the population of India would see its largest rate of growth in the years following the country’s independence from the British Empire in 1947, when the population would rise from 358 million to over one billion by the turn of the century, making India the second country to pass the billion-person milestone. While the rate of growth has slowed somewhat as India begins a demographic shift, the country’s population has continued to grow dramatically throughout the 21st century, and in 2020, India is estimated to have a population of just under 1.4 billion, well over a billion more people than one century previously. Today, approximately 18% of the Earth’s population lives in India, and it is estimated that India will overtake China to become the most populous country in the world within the next five years.
PERIOD: At the end of every 5 years from 1898 to 1918. As of October 1st every 5 years from 1920 to 1925. NOTE: Individual towns and villages that were combined under the Municipal Government Act are counted separately. However, in the surveys in 1898 and 1903, the combined villages of Oshima County in Kagoshima Prefecture were not surveyed individually and are recorded as one combined village. The population figures for October 1st, 1920 and 1925 are from the Population Censuses conducted in those years and are de facto population figures. SOURCE: Survey by the Statistics Bureau, Imperial Cabinet. / Official statistics: aggregate data, statistical tables / Aggregation / Keywords: Population censuses, Statistics, Economics, Population / Resource: Fulltext
PERIOD: At the end of every 5 years from 1898 to 1918. As of October 1st every 5 years from 1920 to 1935. NOTE: Individual towns and villages that were combined under the Municipal Government Act are counted separately. However, in the surveys in 1898 and 1903, the combined villages of Oshima County in Kagoshima Prefecture were not surveyed individually and are recorded as one combined village. The population figures in the columns for 1898 year-end to 1913 year-end are the "type A" de facto population (see page 17 of the 1916 Statistical Yearbook of Imperial Japan), while the figures for 1920, 1925, 1930, and 1935 are from the Population Censuses conducted in those years. SOURCE: Survey by the Statistics Bureau, Imperial Cabinet. / Official statistics: aggregate data, statistical tables / Aggregation / Keywords: Population censuses, Statistics, Economics, Population / Resource: Fulltext
PERIOD: At the end of every 5 years from 1898 to 1918. As of October 1st every 5 years from 1920 to 1930. NOTE: Individual towns and villages that were combined under the Municipal Government Act are counted separately. However, in the surveys in 1898 and 1903, the combined villages of Oshima County in Kagoshima Prefecture were not surveyed individually and are recorded as one combined village. The population figures in the columns for 1898 year-end to 1913 year-end are the "type A" de facto population (see page 17 of the 1916 Statistical Yearbook of Imperial Japan), while the figures for 1920, 1925, and 1930 are from the Population Censuses conducted in those years. SOURCE: Survey by the Statistics Bureau, Imperial Cabinet. / Official statistics: aggregate data, statistical tables / Aggregation / Keywords: Population censuses, Statistics, Economics, Population / Resource: Fulltext
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual-level and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration, and family variables. Within households, it is possible to create relational data because all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at the birth of each child. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.
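As a rough illustration of the mother's-age calculation, the sketch below pairs each child with the mother in the same household and subtracts ages. It assumes each person record carries a within-household person number (PERNUM) and a pointer to the mother's person number (MOMLOC, with 0 meaning no mother present); these two column names are assumptions and are not confirmed by the description above. All values are invented.

```python
import pandas as pd

# Toy person records: one mother (PERNUM 1) and two children.
# PERNUM and MOMLOC are assumed column names; the values are invented.
persons = pd.DataFrame({
    "SERIALP": [1, 1, 1],
    "PERNUM":  [1, 2, 3],
    "MOMLOC":  [0, 1, 1],
    "AGE":     [30, 8, 3],
})

# Children are the records that point to a mother in the household.
children = persons[persons["MOMLOC"] > 0]

# Re-key the person table so it can be looked up by the mother pointer.
mothers = persons[["SERIALP", "PERNUM", "AGE"]].rename(
    columns={"PERNUM": "MOMLOC", "AGE": "MOM_AGE"}
)

pairs = children.merge(mothers, on=["SERIALP", "MOMLOC"], how="left")

# Ages are in completed years, so the result is approximate (within a year).
pairs["MOM_AGE_AT_BIRTH"] = pairs["MOM_AGE"] - pairs["AGE"]
```

Because census ages are recorded in whole years, the difference should be read as an approximation rather than an exact age at birth.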
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
The historic US 1920 census data were collected in January 1920. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
Notes
We provide household and person data separately so that it is convenient to explore the descriptive statistics at each level. To obtain a full dataset, merge the household and person datasets on the variables SERIAL and SERIALP. To create a longitudinal dataset, merge datasets on the variable HISTID.
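The two merges described in this note might look like the following in pandas. This is a minimal sketch on invented toy data, assuming SERIAL is the household identifier in the household file and SERIALP holds the same identifier in the person file; the second census year (`persons_1930`) and all values are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the household and person files (values are invented).
households = pd.DataFrame({"SERIAL": [1, 2], "FARM": [1, 2]})
persons = pd.DataFrame({
    "SERIALP": [1, 1, 2],
    "HISTID": ["A", "B", "C"],
    "AGE": [34, 6, 51],
})

# Assumption: SERIALP in the person file matches SERIAL in the household file.
# validate= raises an error if a household identifier is not unique.
full = persons.merge(
    households,
    left_on="SERIALP",
    right_on="SERIAL",
    how="left",
    validate="many_to_one",
)

# Longitudinal link on HISTID; persons_1930 is a hypothetical second year.
persons_1930 = pd.DataFrame({"HISTID": ["A", "C"], "AGE": [44, 61]})
linked = persons.merge(
    persons_1930, on="HISTID", how="inner", suffixes=("_1920", "_1930")
)
```

A left merge keeps every person record even if a household record is missing, which makes unmatched rows easy to spot afterwards.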
Households with more than 60 people in the original data were broken up for processing purposes. Every person in such a large household is considered to be in their own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original person count is found in the variable SPLITNUM.
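One way to regroup split households is sketched below. It assumes that SPLIT equals 1 for records belonging to a split household and that all fragments of the same original household share a SPLITHID value; both coding details are assumptions to check against the lookup file. The toy household has only three fragments for brevity, whereas a real split household would have more than 60.

```python
import pandas as pd

# Toy household records (values invented for illustration).
households = pd.DataFrame({
    "SERIAL":   [10, 11, 12, 13],
    "SPLIT":    [0, 1, 1, 1],
    "SPLITHID": [0, 500, 500, 500],
    "SPLITNUM": [0, 3, 3, 3],
})

# Assumption: SPLIT == 1 marks a fragment of an original large household.
fragments = households[households["SPLIT"] == 1]

# Assumption: fragments of one original household share the same SPLITHID.
rebuilt = (
    fragments.groupby("SPLITHID")
    .agg(
        n_fragments=("SERIAL", "size"),        # fragments found in the data
        original_count=("SPLITNUM", "first"),  # recorded original size
    )
    .reset_index()
)
```

Comparing `n_fragments` with `original_count` gives a quick consistency check on the reconstruction.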
Coded variables derived from string variables are still in progress. These variables include occupation and industry.
Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, CLASSWKR, OCC1950, IND1950, MARST, RACE, SEX, RELATE, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.
Most inconsistent information was not edited for this release; thus, there are observations outside of the universe for some variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved with the next release.
This dataset was created on 2020-01-10 18:46:34.647
by merging multiple datasets together. The source datasets for this version were:
IPUMS 1920 households: This dataset includes all households from the 1920 US census.
IPUMS 1920 persons: This dataset includes all individuals from the 1920 US census.
IPUMS 1920 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.