13 datasets found
  1. Historic US Census - 1920

    • redivis.com
    application/jsonl +7
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Historic US Census - 1920 [Dataset]. http://doi.org/10.57761/v43s-pk48
    Available download formats: sas, csv, spss, stata, application/jsonl, arrow, avro, parquet
    Dataset updated
    Jan 10, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 1, 1920 - Dec 31, 1920
    Area covered
    United States
    Description

    Abstract

    The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.

    Before Manuscript Submission

    All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.

    We will check your cell sizes and citations.

    For more information about how to cite PHS and PHS datasets, please visit:

    https://phsdocs.developerhub.io/need-help/citing-phs-data-core

    Documentation

    Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration, and family variables. Within households, it is possible to create relational data because all relations between household members are known. For example, having data on a mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.

    In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

    The historic US 1920 census data were collected in January 1920. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
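The relational use described in this documentation (a mother's age at birth derived from the ages of mother and child) can be sketched as follows. This is a minimal illustration, not the data provider's procedure. It assumes the person records carry the IPUMS pointer variables PERNUM (person number within the household) and MOMLOC (PERNUM of the mother, 0 if no mother is present), which are not mentioned in this description and may need to be added to an extract. The rows are invented toy data, and because AGE is recorded in completed years the result is approximate.

```python
# Toy person rows for a single household (invented values, not real records).
# Assumed columns: SERIALP (household key), PERNUM, MOMLOC, AGE.
persons = [
    {"SERIALP": 1, "PERNUM": 1, "MOMLOC": 0, "AGE": 40},
    {"SERIALP": 1, "PERNUM": 2, "MOMLOC": 0, "AGE": 36},
    {"SERIALP": 1, "PERNUM": 3, "MOMLOC": 2, "AGE": 12},
    {"SERIALP": 1, "PERNUM": 4, "MOMLOC": 2, "AGE": 5},
]


def mothers_age_at_birth(rows):
    """Map (household, child PERNUM) to the mother's approximate age at that birth."""
    # Index every person by household and position so pointers can be resolved.
    by_key = {(r["SERIALP"], r["PERNUM"]): r for r in rows}
    result = {}
    for child in rows:
        if child["MOMLOC"] == 0:
            continue  # no mother recorded in this household
        mother = by_key.get((child["SERIALP"], child["MOMLOC"]))
        if mother is None:
            continue  # pointer does not resolve; skip rather than guess
        result[(child["SERIALP"], child["PERNUM"])] = mother["AGE"] - child["AGE"]
    return result


ages = mothers_age_at_birth(persons)
print(ages)
```

For real extracts the same lookup would be done per household after loading the person table; the dictionary index keeps the pointer resolution explicit.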

    Notes

    • We provide household and person data separately so that it is convenient to explore the descriptive statistics on each level. To obtain a full dataset, merge the household and person data on the variables SERIAL and SERIALP. To create a longitudinal dataset, merge datasets on the variable HISTID.

    • Households with more than 60 people in the original data were broken up for processing purposes. Every person in a large household is considered to be in their own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.

    • Coded variables derived from string variables are still in progress. These variables include occupation and industry.

    • Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, CLASSWKR, OCC1950, IND1950, MARST, RACE, SEX, RELATE, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.

    • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved with the next release.
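The merge and the regrouping of split households mentioned in these notes can be sketched as below. Several things here are assumptions rather than documented facts: that SERIAL is the key in the household table and SERIALP the matching key in the person table, that a nonzero SPLIT marks a household that was broken up, and that SPLITHID is shared by all pieces of one original household. The rows are invented toy data with far fewer people than a real split household would have.

```python
from collections import defaultdict

# Toy tables (invented values). Real extracts carry many more columns.
households = [
    {"SERIAL": 101, "SPLIT": 0, "SPLITHID": 0},
    {"SERIAL": 201, "SPLIT": 1, "SPLITHID": 900},
    {"SERIAL": 202, "SPLIT": 1, "SPLITHID": 900},
]
persons = [
    {"SERIALP": 101, "HISTID": "A1", "AGE": 34},
    {"SERIALP": 101, "HISTID": "A2", "AGE": 8},
    {"SERIALP": 201, "HISTID": "B1", "AGE": 51},
    {"SERIALP": 202, "HISTID": "B2", "AGE": 47},
]


def merge_person_household(person_rows, household_rows):
    """Attach household columns to each person (inner join on SERIALP == SERIAL)."""
    by_serial = {h["SERIAL"]: h for h in household_rows}
    return [
        {**by_serial[p["SERIALP"]], **p}
        for p in person_rows
        if p["SERIALP"] in by_serial
    ]


def regroup_split_households(merged_rows):
    """Collect the HISTIDs of people whose households share a SPLITHID."""
    groups = defaultdict(list)
    for row in merged_rows:
        if row["SPLIT"]:  # assumption: nonzero means the household was split
            groups[row["SPLITHID"]].append(row["HISTID"])
    return dict(groups)


full = merge_person_household(persons, households)
original = regroup_split_households(full)
print(len(full), original)
```

Before relying on this, it is worth checking the actual SPLIT coding in the Lookup table, since the flag values are assumed here.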

    Section 2

    This dataset was created on 2020-01-10 18:46:34.647 by merging multiple datasets together. The source datasets for this version were:

    IPUMS 1920 households: This dataset includes all households from the 1920 US census.

    IPUMS 1920 persons: This dataset includes all individuals from the 1920 US census.

    IPUMS 1920 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.

  2. Census Tree Links

    • openicpsr.org
    Updated Jul 12, 2021
    Cite
    Kasey Buckles; Joseph Price (2021). Census Tree Links [Dataset]. http://doi.org/10.3886/E144904V1
    Dataset updated
    Jul 12, 2021
    Dataset provided by
    Brigham Young University
    University of Notre Dame
    Authors
    Kasey Buckles; Joseph Price
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1900 - 1920
    Area covered
    United States
    Description

    The data sets in this repository allow users to link people among the U.S. decennial censuses, using the "histid" identifier. The census data sets users will need are indexed by Ancestry.com and are hosted by IPUMS at https://usa.ipums.org/usa-action/samples. Users will need to download the full-count census for each year and be sure to select the "histid" variable that is available under the Person/Historical Technical drop-down menu.

    As of 7/12/21, links are available between the 1900-1910, 1910-1920, and 1900-1920 censuses.

    A detailed account of how these links are created and a description of the data and its characteristics are available in the following article:

    Price, J., Buckles, K., Van Leeuwen, J., & Riley, I. (2021). Combining family history and machine learning to link historical records: The Census Tree data set. Explorations in Economic History, 80, 101391. https://www.sciencedirect.com/science/article/pii/S0014498321000024
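A rough sketch of how such a link file might be applied to two census extracts is shown below. The layout is assumed, not taken from the repository: the crosswalk is imagined as one row per linked person with hypothetical columns histid_1910 and histid_1920, and each census extract is imagined as a mapping from histid to that person's record. All values are invented.

```python
# Hypothetical crosswalk rows (column names are assumptions).
links_1910_1920 = [
    {"histid_1910": "X1", "histid_1920": "Y7"},
    {"histid_1910": "X2", "histid_1920": "Y9"},
    {"histid_1910": "X3", "histid_1920": "Y0"},  # Y0 is missing from the 1920 toy extract
]

# Toy census extracts keyed by histid (invented values).
census_1910 = {"X1": {"age": 22}, "X2": {"age": 5}, "X3": {"age": 40}}
census_1920 = {"Y7": {"age": 32}, "Y9": {"age": 15}}


def link_censuses(links, early, late):
    """Build one row per person found in both extracts through the crosswalk."""
    panel = []
    for link in links:
        first = early.get(link["histid_1910"])
        second = late.get(link["histid_1920"])
        if first is None or second is None:
            continue  # keep only links that resolve on both sides
        panel.append(
            {
                "histid_1910": link["histid_1910"],
                "histid_1920": link["histid_1920"],
                "age_1910": first["age"],
                "age_1920": second["age"],
            }
        )
    return panel


panel = link_censuses(links_1910_1920, census_1910, census_1920)
print(panel)
```

Dropping unresolved links is one choice among several; keeping them with missing values would make the attrition between census years visible instead.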

  3. Persons

    • redivis.com
    Updated Jan 10, 2020
    Cite
    Persons [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Stanford Center for Population Health Sciences
    Time period covered
    1920
    Description

    This dataset includes all individuals from the 1920 US census.

  4. Surnames and ancestry in Brazil

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Surnames and ancestry in Brazil [Dataset]. https://plos.figshare.com/articles/dataset/Surnames_and_ancestry_in_Brazil/4984046
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Leonardo Monasterio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    This paper presents a method for classifying the ancestry of Brazilian surnames based on historical sources. The information obtained forms the basis for applying fuzzy matching and machine learning classification algorithms to more than 46 million workers in 5 categories: Iberian, Italian, Japanese, German and East European. The vast majority (96.7%) of the single surnames were identified using fuzzy matching, and the rest using a method proposed by Cavnar and Trenkle (1994). A comparison of the results of the procedures with data on foreigners in the 1920 Census and with the geographic distribution of non-Iberian surnames underscores the accuracy of the procedure. The study shows that surname ancestry is associated with significant differences in wages and schooling.
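To give a feel for what fuzzy matching of surnames against a reference list involves, here is a small sketch. It is not the paper's implementation: it swaps in the similarity ratio from Python's standard difflib module, the 0.8 cutoff is an arbitrary assumption, and the four-entry reference list is invented for illustration.

```python
import difflib

# Invented toy reference list: lowercase surname -> ancestry category.
reference = {
    "schmidt": "German",
    "rossi": "Italian",
    "tanaka": "Japanese",
    "silva": "Iberian",
}


def classify_surname(surname, reference_list, cutoff=0.8):
    """Return the category of the closest reference surname, or None if nothing is close."""
    candidates = difflib.get_close_matches(
        surname.lower(), list(reference_list), n=1, cutoff=cutoff
    )
    return reference_list[candidates[0]] if candidates else None


print(classify_surname("Schmitt", reference))
print(classify_surname("Kowalczyk", reference))
```

Surnames that fall below the cutoff return None here; in the study those remaining cases are handled by the Cavnar and Trenkle method, which this sketch does not attempt.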

  5. Households

    • redivis.com
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Households [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Stanford Center for Population Health Sciences
    Time period covered
    1920
    Description

    This dataset includes all households from the 1920 US census.

  6. Lookup

    • redivis.com
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Lookup [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Stanford Center for Population Health Sciences
    Description

    This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.

  7. Distribution of surname-ancestry in the reference data.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Leonardo Monasterio (2023). Distribution of surname-ancestry in the reference data. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t002
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Leonardo Monasterio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Distribution of surname-ancestry in the reference data.

  8. Surname ancestry estimated according to the last or unique surname.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Leonardo Monasterio (2023). Surname ancestry estimated according to the last or unique surname. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t005
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Leonardo Monasterio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Surname ancestry estimated according to the last or unique surname.

  9. Confusion matrix—observed and predicted values obtained using the Naïve...

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Leonardo Monasterio (2023). Confusion matrix—observed and predicted values obtained using the Naïve Bayes procedure for the classification of surnames in the test set. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t004
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Leonardo Monasterio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Confusion matrix—observed and predicted values obtained using the Naïve Bayes procedure for the classification of surnames in the test set.

  10. Household Heads, Family Members, Occupational Servants, and Domestic...

    • d-repo.ier.hit-u.ac.jp
    application/x-yaml +3
    Updated Nov 18, 2021
    Cite
    内閣統計局 (2021). Household Heads, Family Members, Occupational Servants, and Domestic Servants (Population census on Oct. 1, 1920 ) : Statistical Yearbook of Imperial Japan 52 (1933) Table 11 [Dataset]. https://d-repo.ier.hit-u.ac.jp/records/2004270
    Available download formats: txt, text/x-shellscript, application/x-yaml, pdf
    Dataset updated
    Nov 18, 2021
    Authors
    内閣統計局
    Time period covered
    Oct 1, 1920
    Area covered
    Japan, 日本
    Description

    PERIOD: Population census on Oct. 1, 1920. SOURCE: [Survey by the Statistics Bureau, Imperial Cabinet].

  11. State College Sanborn + Census, 1930

    • mapsgislib-pennstate.hub.arcgis.com
    Updated Mar 21, 2017
    Cite
    hdr10psu (2017). State College Sanborn + Census, 1930 [Dataset]. https://mapsgislib-pennstate.hub.arcgis.com/items/28d90117dc924942951389e2ca53db50
    Dataset updated
    Mar 21, 2017
    Dataset authored and provided by
    hdr10psu
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    State College
    Description

    This application displays the buildings in State College borough in 1930 as polygon features. The buildings are linked to a table with the contents of the 1930 Census of State College. Click on a building to bring up information about its physical features, such as building material or number of floors, as well as its address and associated land use. If the building contained residents listed on the Census, scroll down within the info box and click on the link below "Related Tables" to bring up a list of the residents. Clicking on a resident in the list will open that resident's entry in the Census table, which includes socioeconomic information such as their name, age, nationality, marital status, and occupation. Residents can also be searched for by name in the Query box that appears on the left side of the screen.

    Data Sources

    • Scanned copies of the U.S. Census for various years (including 1920 and 1930) available from Ancestry Library Edition database.

    • Sanborn shapefiles were created by Bednar student interns at Penn State's Pattee/Paterno Library. They are based on the collection of PA Sanborns housed in the Maps Collection at the library.

  12. State College Sanborn with Census & Student Directory, 1930

    • mapsgislib-pennstate.hub.arcgis.com
    Updated Aug 23, 2017
    Cite
    hdr10psu (2017). State College Sanborn with Census & Student Directory, 1930 [Dataset]. https://mapsgislib-pennstate.hub.arcgis.com/items/cef1eabdd3a543a2bb0ac6d57979a604
    Dataset updated
    Aug 23, 2017
    Dataset authored and provided by
    hdr10psu
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data Sources

    • Scanned copies of the U.S. Census for various years (including 1920 and 1930) available from Ancestry Library Edition database.

    • Sanborn shapefiles were created by Bednar student interns at Penn State's Pattee Library. They are based on the collection of PA Sanborns housed at the same library.

  13. Hourly wages in Brazil (2013)- Analysis of variances.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Leonardo Monasterio (2023). Hourly wages in Brazil (2013)- Analysis of variances. [Dataset]. http://doi.org/10.1371/journal.pone.0176890.t006
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Leonardo Monasterio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Hourly wages in Brazil (2013)- Analysis of variances.

