11 datasets found

Historic US Census - 1920
redivis.com
Updated Jan 10, 2020
Cite
Stanford Center for Population Health Sciences (2020). Historic US Census - 1920 [Dataset]. http://doi.org/10.57761/v43s-pk48
Available download formats: sas, csv, spss, stata, application/jsonl, arrow, avro, parquet
Unique identifier
https://doi.org/10.57761/v43s-pk48
Dataset updated
Jan 10, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Time period covered
Jan 1, 1920 - Dec 31, 1920
Area covered
United States
Description
Abstract

The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of a collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.

Before Manuscript Submission

All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.

We will check your cell sizes and citations.

For more information about how to cite PHS and PHS datasets, please visit:

https://phsdocs.developerhub.io/need-help/citing-phs-data-core

Documentation

Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual-level and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration, and family variables. Within households, it is possible to create relational data, as all relations between household members are known. For example, having data on a mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.
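The mother's-age-at-birth example can be sketched in a few lines of Python. This is only an illustration on toy rows: the column names PERNUM (a person's position within the household) and MOMLOC (the position of that person's mother, 0 if she is not in the household) are assumptions and are not confirmed by this dataset description, so check the Lookup table for the variables actually available. Ages are in whole years, so the result is approximate.

```python
# Toy person rows for one household. PERNUM and MOMLOC are ASSUMED column
# names used for illustration; they are not named in the dataset description.
persons = [
    {"SERIALP": 1, "PERNUM": 1, "MOMLOC": 0, "AGE": 40, "SEX": 1},
    {"SERIALP": 1, "PERNUM": 2, "MOMLOC": 0, "AGE": 36, "SEX": 2},
    {"SERIALP": 1, "PERNUM": 3, "MOMLOC": 2, "AGE": 10, "SEX": 1},
    {"SERIALP": 1, "PERNUM": 4, "MOMLOC": 2, "AGE": 4, "SEX": 2},
]


def mothers_age_at_birth(persons):
    """Return one row per child whose mother is found in the same household."""
    # Index every person by (household id, position within household).
    by_position = {(p["SERIALP"], p["PERNUM"]): p for p in persons}
    results = []
    for child in persons:
        if child["MOMLOC"] == 0:
            continue  # no mother recorded in this household
        mother = by_position.get((child["SERIALP"], child["MOMLOC"]))
        if mother is None:
            continue  # pointer does not resolve to a person row
        results.append({
            "SERIALP": child["SERIALP"],
            "child_pernum": child["PERNUM"],
            # Difference of whole-year ages, so accurate to about one year.
            "mother_age_at_birth": mother["AGE"] - child["AGE"],
        })
    return results


ages = mothers_age_at_birth(persons)
```

With the toy rows above, `ages` holds one entry for each of the two children in the household.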

In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

The historic US 1920 census data were collected in January 1920. Enumerators collected the data by traveling to households and counting the residents who regularly slept there. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.

Notes

We provide household and person data separately so that it is convenient to explore the descriptive statistics at each level. To obtain a full dataset, merge the household and person data on the variables SERIAL and SERIALP. To create a longitudinal dataset, merge datasets on the variable HISTID.
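As a rough sketch of the household-person merge, the following Python joins toy rows with the standard library only. It assumes that SERIAL is the household identifier on household rows and SERIALP is the matching identifier on person rows; the note names both variables but does not say which table holds which, so treat that mapping as an assumption. The extra columns are taken from the variable list in these notes and the values are made up.

```python
# Toy rows; the real tables have many more columns and rows.
# ASSUMPTION: SERIAL lives on household rows, SERIALP on person rows.
households = [
    {"SERIAL": 1, "OWNERSHP": 10, "FARM": 1},
    {"SERIAL": 2, "OWNERSHP": 20, "FARM": 2},
]
persons = [
    {"SERIALP": 1, "HISTID": "A-001", "AGE": 34, "SEX": 2},
    {"SERIALP": 1, "HISTID": "A-002", "AGE": 6, "SEX": 1},
    {"SERIALP": 2, "HISTID": "B-001", "AGE": 51, "SEX": 1},
]


def merge_household_person(households, persons):
    """Inner join: attach household columns to each person row."""
    by_serial = {h["SERIAL"]: h for h in households}
    merged = []
    for p in persons:
        h = by_serial.get(p["SERIALP"])
        if h is None:
            continue  # person rows without a household row are dropped
        merged.append({**h, **p})  # one output row per person
    return merged


full = merge_household_person(households, persons)
```

The result has one row per person, each carrying the columns of its household. For the full-count tables, the same join would more practically be expressed in SQL or a dataframe library; the dictionary lookup here only illustrates the key relationship.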

Households with more than 60 people in the original data were broken up for processing purposes. Every person in these large households is considered to be in their own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.
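A minimal sketch of how the split households might be regrouped is shown below. It assumes that SPLIT, SPLITHID, and SPLITNUM sit on the household rows and that a SPLIT value of 1 marks a household created by splitting; the actual coding is not stated here and should be checked against the Lookup table. All values are made up.

```python
from collections import defaultdict

# Toy household rows. ASSUMPTION: SPLIT == 1 marks a household that was
# created by splitting a large original household (check the Lookup table).
households = [
    {"SERIAL": 101, "SPLIT": 1, "SPLITHID": 9001, "SPLITNUM": 3},
    {"SERIAL": 102, "SPLIT": 1, "SPLITHID": 9001, "SPLITNUM": 3},
    {"SERIAL": 103, "SPLIT": 1, "SPLITHID": 9001, "SPLITNUM": 3},
    {"SERIAL": 7, "SPLIT": 0, "SPLITHID": 0, "SPLITNUM": 0},
]

SPLIT_FLAG = 1  # assumed code for "this household was split"


def rebuild_split_households(households, split_flag=SPLIT_FLAG):
    """Group split households back together by their original id."""
    groups = defaultdict(list)
    for h in households:
        if h["SPLIT"] == split_flag:
            groups[h["SPLITHID"]].append(h)
    return dict(groups)


def size_mismatches(groups):
    """Original ids whose regrouped size differs from the recorded SPLITNUM."""
    return [
        hid for hid, members in groups.items()
        if len(members) != members[0]["SPLITNUM"]
    ]


groups = rebuild_split_households(households)
problems = size_mismatches(groups)
```

Comparing each regrouped household's size with SPLITNUM is a cheap consistency check before using the reconstructed households in an analysis.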

Coded variables derived from string variables are still in progress. These variables include: occupation and industry.

Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, CLASSWKR, OCC1950, IND1950, MARST, RACE, SEX, RELATE, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.

Most inconsistent information was not edited for this release, so some variables have observations outside of their universe. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved in the next release.


Section 2

This dataset was created on 2020-01-10 18:46:34.647 by merging multiple datasets together. The source datasets for this version were:

IPUMS 1920 households: This dataset includes all households from the 1920 US census.

IPUMS 1920 persons: This dataset includes all individuals from the 1920 US census.

IPUMS 1920 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.
Persons
redivis.com
Updated Jan 10, 2020
Cite
Stanford Center for Population Health Sciences (2020). Persons [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
Dataset updated
Jan 10, 2020
Dataset authored and provided by
Stanford Center for Population Health Sciences
Time period covered
1920
Description
This dataset includes all individuals from the 1920 US census.
Census Tree Links
openicpsr.org
Updated Jul 12, 2021
Cite
Kasey Buckles; Joseph Price (2021). Census Tree Links [Dataset]. http://doi.org/10.3886/E144904V1
Unique identifier
https://doi.org/10.3886/E144904V1
Dataset updated
Jul 12, 2021
Dataset provided by
University of Notre Dame
Brigham Young University
Authors
Kasey Buckles; Joseph Price
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1900 - 1920
Area covered
United States
Description
The data sets in this repository allow users to link people among the U.S. decennial censuses, using the "histid" identifier. The census data sets users will need are indexed by Ancestry.com and are hosted by IPUMS at https://usa.ipums.org/usa-action/samples. Users will need to download the full-count census for each year and be sure to select the "histid" variable that is available under the Person/Historical Technical drop-down menu.

As of 7/12/21, links are available between the 1900-1910, 1910-1920, and 1900-1920 censuses.

A detailed account of how these links are created and a description of the data and its characteristics are available in the following article:

Price, J., Buckles, K., Van Leeuwen, J., & Riley, I. (2021). Combining family history and machine learning to link historical records: The Census Tree data set. Explorations in Economic History, 80, 101391. https://www.sciencedirect.com/science/article/pii/S0014498321000024
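Linking two census years through a file of "histid" pairs might look like the sketch below. The link-file column names `histid1910` and `histid1920`, the lowercase `histid` and `age` columns in the census extracts, and all values are hypothetical; consult the repository's own documentation for the real layout.

```python
# Toy data. Column names histid1910 / histid1920 are HYPOTHETICAL, as are
# the extract columns and every value shown here.
links_1910_1920 = [
    {"histid1910": "P10-A", "histid1920": "P20-A"},
    {"histid1910": "P10-B", "histid1920": "P20-B"},
]
census_1910 = [
    {"histid": "P10-A", "age": 24},
    {"histid": "P10-B", "age": 40},
    {"histid": "P10-C", "age": 5},  # no link available for this person
]
census_1920 = [
    {"histid": "P20-A", "age": 34},
    {"histid": "P20-B", "age": 50},
]


def link_censuses(links, early, late, early_key, late_key):
    """Build one row per linked person, carrying age from both censuses."""
    early_by_id = {r["histid"]: r for r in early}
    late_by_id = {r["histid"]: r for r in late}
    panel = []
    for link in links:
        e = early_by_id.get(link[early_key])
        l = late_by_id.get(link[late_key])
        if e is None or l is None:
            continue  # keep only links found in both extracts
        panel.append({
            early_key: link[early_key],
            late_key: link[late_key],
            "age_early": e["age"],
            "age_late": l["age"],
        })
    return panel


panel = link_censuses(
    links_1910_1920, census_1910, census_1920, "histid1910", "histid1920"
)
```

People without an entry in the link file, such as the third 1910 row above, simply do not appear in the linked panel.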
Description of sources of names and ancestry data.
figshare.com
xls
Updated Jun 4, 2023
Cite
Leonardo Monasterio (2023). Description of sources of names and ancestry data. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0176890.t001
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Leonardo Monasterio
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description of sources of names and ancestry data.
Households
redivis.com
Updated Jan 10, 2020
Cite
Stanford Center for Population Health Sciences (2020). Households [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
Dataset updated
Jan 10, 2020
Dataset authored and provided by
Stanford Center for Population Health Sciences
Time period covered
1920
Description
This dataset includes all households from the 1920 US census.
Distribution of surname-ancestry in the reference data.
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Leonardo Monasterio (2023). Distribution of surname-ancestry in the reference data. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0176890.t002
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Leonardo Monasterio
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Distribution of surname-ancestry in the reference data.
Surname ancestry estimated according to the last or unique surname.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Leonardo Monasterio (2023). Surname ancestry estimated according to the last or unique surname. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0176890.t005
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Leonardo Monasterio
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Surname ancestry estimated according to the last or unique surname.
Lookup
redivis.com
Updated Jan 10, 2020
Cite
Stanford Center for Population Health Sciences (2020). Lookup [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
Dataset updated
Jan 10, 2020
Dataset authored and provided by
Stanford Center for Population Health Sciences
Description
This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.
Confusion matrix—observed and predicted values obtained using the Cavnar and...
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Leonardo Monasterio (2023). Confusion matrix—observed and predicted values obtained using the Cavnar and Trenkle procedure for the classification of surnames in the test set. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0176890.t003
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Leonardo Monasterio
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Confusion matrix—observed and predicted values obtained using the Cavnar and Trenkle procedure for the classification of surnames in the test set.
State College Sanborn with Census & Student Directory, 1930
mapsgislib-pennstate.hub.arcgis.com
Updated Aug 23, 2017
Cite
hdr10psu (2017). State College Sanborn with Census & Student Directory, 1930 [Dataset]. https://mapsgislib-pennstate.hub.arcgis.com/items/cef1eabdd3a543a2bb0ac6d57979a604
Dataset updated
Aug 23, 2017
Dataset authored and provided by
hdr10psu
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
State College
Description
Data Sources
- Scanned copies of the U.S. Census for various years (including 1920 and 1930), available from the Ancestry Library Edition database.
- Sanborn shapefiles were created by Bednar student interns at Penn State's Pattee Library. They are based on the collection of PA Sanborns housed at the same library.
Hourly wages in Brazil (2013)- Analysis of variances.
plos.figshare.com
xls
Updated Jun 16, 2023
Cite
Leonardo Monasterio (2023). Hourly wages in Brazil (2013)- Analysis of variances. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0176890.t006
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS ONE
Authors
Leonardo Monasterio
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Brazil
Description
Hourly wages in Brazil (2013)- Analysis of variances.