24 datasets found
  1. Historic US Census - 1920

    • redivis.com
    application/jsonl +7
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Historic US Census - 1920 [Dataset]. http://doi.org/10.57761/v43s-pk48
    Available download formats: sas, csv, spss, stata, application/jsonl, arrow, avro, parquet
    Dataset updated
    Jan 10, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 1, 1920 - Dec 31, 1920
    Area covered
    United States
    Description

    Abstract

    The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.

    Before Manuscript Submission

    All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.

    We will check your cell sizes and citations.

    For more information about how to cite PHS and PHS datasets, please visit:

    https://phsdocs.developerhub.io/need-help/citing-phs-data-core

    Documentation

    Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration, and family variables. Within households, it is possible to create relational data, as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.

    In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

    The historic US 1920 census data were collected in January 1920. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.

    Notes

    • We provide household and person data separately so that it is convenient to explore the descriptive statistics on each level. In order to obtain a full dataset, merge the household and person on the variables SERIAL and SERIALP. In order to create a longitudinal dataset, merge datasets on the variable HISTID.

    • Households with more than 60 people in the original data were broken up for processing purposes. Every person in these large households is considered to be in their own household. The original large households can be identified using the variable SPLIT and reconstructed using the variable SPLITHID; the original count is found in the variable SPLITNUM.

    • Coded variables derived from string variables are still in progress. These variables include: occupation and industry.

    • Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, CLASSWKR, OCC1950, IND1950, MARST, RACE, SEX, RELATE, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.

    • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved with the next release.
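
    The household-person merge described in the first note can be sketched with pandas. This is a minimal illustration, not the provider's own workflow: the key names SERIAL and SERIALP come from the note above, while the toy rows and the column selection (FARM, AGE) are assumptions made only so the example runs.

```python
import pandas as pd

# Toy stand-ins for the household and person tables (invented rows;
# real extracts are far larger and carry many more variables).
households = pd.DataFrame({"SERIAL": [1, 2], "FARM": [1, 2]})
persons = pd.DataFrame({
    "SERIALP": [1, 1, 2],
    "HISTID": ["A", "B", "C"],
    "AGE": [34, 6, 51],
})

# Attach household attributes to every person record.
# SERIALP on the person side matches SERIAL on the household side;
# validate="many_to_one" raises if a household key is duplicated.
full = persons.merge(
    households,
    left_on="SERIALP",
    right_on="SERIAL",
    how="left",
    validate="many_to_one",
)
print(full)
```

    A left join keeps every person record even if its household row is missing, which makes unmatched keys easy to spot as missing values in the household columns.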

    Section 2

    This dataset was created on 2020-01-10 18:46:34.647 by merging multiple datasets together. The source datasets for this version were:

    IPUMS 1920 households: This dataset includes all households from the 1920 US census.

    IPUMS 1920 persons: This dataset includes all individuals from the 1920 US census.

    IPUMS 1920 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.

  2. 1920 United States Federal Census

    • ebroy.org
    Updated 1920
    Cite
    Fourteenth Census of the United States, 1920. (NARA microfilm publication T625, 2076 rolls). Records of the Bureau of the Census, Record Group 29. National Archives, Washington, D.C. Year: 1920; Census Place: Philadelphia Ward 42, Philadelphia, Pennsylvania; Roll: T625_1643; Page: 13A; Enumeration District: 1564 (1920). 1920 United States Federal Census [Dataset]. https://ebroy.org/profile/?person=P14
    Dataset updated
    1920
    Dataset authored and provided by
    Fourteenth Census of the United States, 1920. (NARA microfilm publication T625, 2076 rolls). Records of the Bureau of the Census, Record Group 29. National Archives, Washington, D.C. Year: 1920; Census Place: Philadelphia Ward 42, Philadelphia, Pennsylvania; Roll: T625_1643; Page: 13A; Enumeration District: 1564
    Area covered
    United States
    Description

    1920 United States Federal Census contains records from Philadelphia, Pennsylvania, USA by Fourteenth Census of the United States, 1920. (NARA microfilm publication T625, 2076 rolls). Records of the Bureau of the Census, Record Group 29. National Archives, Washington, D.C. Year: 1920; Census Place: Philadelphia Ward 42, Philadelphia, Pennsylvania; Roll: T625_1643; Page: 13A; Enumeration District: 1564.

  3. Persons

    • redivis.com
    Updated Jan 10, 2020
    Cite
    Persons [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Stanford Center for Population Health Sciences
    Time period covered
    1920
    Description

    This dataset includes all individuals from the 1920 US census.

  4. Census Record 1920

    • cloud.csiss.gmu.edu
    • catalog.data.gov
    json, xls
    Updated Oct 6, 2020
    Cite
    United States (2020). Census Record 1920 [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/census-record-1920
    Available download formats: xls, json
    Dataset updated
    Oct 6, 2020
    Dataset provided by
    United States
    Description

    Historical record of Arlington population as captured by the 1920 census record.

  5. Households

    • redivis.com
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Households [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Stanford Center for Population Health Sciences
    Time period covered
    1920
    Description

    This dataset includes all households from the 1920 US census.

  6. The Census Tree, 1880-1920

    • openicpsr.org
    Updated Aug 8, 2023
    Cite
    Joseph Price; Kasey Buckles; Adrian Haws; Haley Wilbert (2023). The Census Tree, 1880-1920 [Dataset]. http://doi.org/10.3886/E193249V1
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Cornell University
    University of Notre Dame
    Brigham Young University
    Authors
    Joseph Price; Kasey Buckles; Adrian Haws; Haley Wilbert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1880 - 1920
    Area covered
    United States
    Description

    The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. These links allow researchers to construct a longitudinal dataset that is highly representative of the population, and that includes women, Black Americans, and other under-represented populations at unprecedented rates. Each .csv file consists of a crosswalk between the two years indicated in the filename, using the IPUMS histids. For more information, consult the included Read Me file, and visit https://censustree.org.
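
    Using such a crosswalk to build a longitudinal table might look like the sketch below. The crosswalk column names (histid1900, histid1920) and all rows are hypothetical; the actual layout should be checked against the Read Me file mentioned above.

```python
import pandas as pd

# Hypothetical crosswalk: one row per linked person, one histid column per census year.
crosswalk = pd.DataFrame({"histid1900": ["a1", "a2"], "histid1920": ["b1", "b2"]})

# Toy census extracts keyed by HISTID (invented values).
census_1900 = pd.DataFrame({"HISTID": ["a1", "a2", "a3"], "AGE": [10, 40, 25]})
census_1920 = pd.DataFrame({"HISTID": ["b1", "b2", "b9"], "AGE": [30, 60, 5]})

# Suffix each year's columns so they stay distinguishable after the joins,
# then join both years onto the crosswalk. Inner joins keep only linked people.
linked = (
    crosswalk
    .merge(census_1900.add_suffix("_1900"), left_on="histid1900", right_on="HISTID_1900")
    .merge(census_1920.add_suffix("_1920"), left_on="histid1920", right_on="HISTID_1920")
)
print(linked[["histid1900", "histid1920", "AGE_1900", "AGE_1920"]])
```

    Comparing AGE_1920 with AGE_1900 for each linked pair is one quick plausibility check on the links.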

  7. 1920 United States Federal Census

    • ebroy.org
    Updated 1920
    Cite
    Year: 1920; Census Place: Barre Ward 2, Washington, Vermont; Roll: T625_1875; Page: 7B; Enumeration District: 73 (1920). 1920 United States Federal Census [Dataset]. https://ebroy.org/profile/?person=P18
    Dataset updated
    1920
    Dataset authored and provided by
    Year: 1920; Census Place: Barre Ward 2, Washington, Vermont; Roll: T625_1875; Page: 7B; Enumeration District: 73
    Area covered
    United States
    Description

    1920 United States Federal Census contains records from Barre, Washington, Vermont, USA by Year: 1920; Census Place: Barre Ward 2, Washington, Vermont; Roll: T625_1875; Page: 7B; Enumeration District: 73.

  8. 1920 Unite States Federal Census

    • ebroy.org
    Updated 1920
    Cite
    Year: 1920; Census Place: Caribou, Aroostook, Maine; Roll: T625_638; Page: 27B; Enumeration District: 8 (1920). 1920 Unite States Federal Census [Dataset]. https://ebroy.org/profile/?person=P13
    Dataset updated
    1920
    Dataset authored and provided by
    Year: 1920; Census Place: Caribou, Aroostook, Maine; Roll: T625_638; Page: 27B; Enumeration District: 8
    Description

    1920 Unite States Federal Census contains records from Caribou, Maine, USA by Year: 1920; Census Place: Caribou, Aroostook, Maine; Roll: T625_638; Page: 27B; Enumeration District: 8.

  9. 1920 United States Federal Census

    • ebroy.org
    Updated 1920
    Cite
    Year: 1920; Census Place: Bloomfield, Essex, Vermont; Roll: T625_1870; Page: 5B; Enumeration District: 26 (1920). 1920 United States Federal Census [Dataset]. https://ebroy.org/profile/?person=P56
    Dataset updated
    1920
    Dataset authored and provided by
    Year: 1920; Census Place: Bloomfield, Essex, Vermont; Roll: T625_1870; Page: 5B; Enumeration District: 26
    Area covered
    United States
    Description

    1920 United States Federal Census contains records from Bloomfield, Vermont, USA by Year: 1920; Census Place: Bloomfield, Essex, Vermont; Roll: T625_1870; Page: 5B; Enumeration District: 26.

  10. 1920 United States Federal Census

    • ebroy.org
    Updated 1920
    Cite
    Year: 1920; Census Place: Philadelphia Ward 42, Philadelphia, Pennsylvania; Roll: T625_1643; Page: 1A; Enumeration District: 1586 (1920). 1920 United States Federal Census [Dataset]. https://ebroy.org/profile/?person=P40
    Dataset updated
    1920
    Dataset authored and provided by
    Year: 1920; Census Place: Philadelphia Ward 42, Philadelphia, Pennsylvania; Roll: T625_1643; Page: 1A; Enumeration District: 1586
    Area covered
    United States
    Description

    1920 United States Federal Census contains records from Philadelphia, Pennsylvania, USA by Year: 1920; Census Place: Philadelphia Ward 42, Philadelphia, Pennsylvania; Roll: T625_1643; Page: 1A; Enumeration District: 1586.

  11. Lookup

    • redivis.com
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Lookup [Dataset]. https://redivis.com/datasets/gsmz-24068kvny
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Stanford Center for Population Health Sciences
    Description

    This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.

  12. Census Tree Links

    • openicpsr.org
    Updated Jul 12, 2021
    Cite
    Kasey Buckles; Joseph Price (2021). Census Tree Links [Dataset]. http://doi.org/10.3886/E144904V1
    Dataset updated
    Jul 12, 2021
    Dataset provided by
    University of Notre Dame
    Brigham Young University
    Authors
    Kasey Buckles; Joseph Price
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1900 - 1920
    Area covered
    United States
    Description

    The data sets in this repository allow users to link people among the U.S. decennial censuses, using the "histid" identifier. The census data sets users will need are indexed by Ancestry.com and are hosted by IPUMS at https://usa.ipums.org/usa-action/samples. Users will need to download the full-count census for each year and be sure to select the "histid" variable that is available under the Person/Historical Technical drop-down menu.

    As of 7/12/21, links are available between the 1900-1910, 1910-1920, and 1900-1920 censuses.

    A detailed account of how these links are created and a description of the data and its characteristics are available in the following article:

    Price, J., Buckles, K., Van Leeuwen, J., & Riley, I. (2021). Combining family history and machine learning to link historical records: The Census Tree data set. Explorations in Economic History, 80, 101391. https://www.sciencedirect.com/science/article/pii/S0014498321000024

  13. The Census Tree, 1900-1920

    • openicpsr.org
    Updated Aug 8, 2023
    Cite
    Joseph Price; Kasey Buckles; Adrian Haws; Haley Wilbert (2023). The Census Tree, 1900-1920 [Dataset]. http://doi.org/10.3886/E193264V1
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Cornell University
    University of Notre Dame
    Brigham Young University
    Authors
    Joseph Price; Kasey Buckles; Adrian Haws; Haley Wilbert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1900 - 1920
    Area covered
    United States
    Description

    The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. These links allow researchers to construct a longitudinal dataset that is highly representative of the population, and that includes women, Black Americans, and other under-represented populations at unprecedented rates. Each .csv file consists of a crosswalk between the two years indicated in the filename, using the IPUMS histids. For more information, consult the included Read Me file, and visit https://censustree.org.

  14. Census Linking Project: 1900-1920 Crosswalk

    • search.dataone.org
    Updated Nov 9, 2023
    Cite
    Abramitzky, Ran; Boustan, Leah; Eriksson, Katherine; Rashid, Myera; Pérez, Santiago (2023). Census Linking Project: 1900-1920 Crosswalk [Dataset]. http://doi.org/10.7910/DVN/MKRVLK
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Abramitzky, Ran; Boustan, Leah; Eriksson, Katherine; Rashid, Myera; Pérez, Santiago
    Description

    This crosswalk consists of individuals matched between the 1900 and 1920 complete-count US Censuses. Within the crosswalk, users have the option to select the linking method with which these matches were created. This version of the crosswalk contains links made by the ABE-exact (conservative and standard) method, the ABE-NYSIIS (conservative and standard) method and the ABE-NYSIIS (conservative and standard) method where race is used as a matching variable. This crosswalk also includes Census Tree Links created by Joseph Price, Kasey Buckles and Mark Clement at the Brigham Young University (BYU) Record Linking Lab. More detail on these links can be found in the census_tree_links_BYU_readme. For any chosen method, users can merge into this crosswalk a wide set of individual- and household-level variables provided publicly by IPUMS, thereby creating a historical longitudinal dataset for analysis.

  15. Table_1_Operationalizing racialized exposures in historical research on...

    • frontiersin.figshare.com
    docx
    Updated Jul 6, 2023
    Cite
    Marie Kaniecki; Nicole Louise Novak; Sarah Gao; Sioban Harlow; Alexandra Minna Stern (2023). Table_1_Operationalizing racialized exposures in historical research on anti-Asian racism and health: a comparison of two methods.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2023.983434.s001
    Available download formats: docx
    Dataset updated
    Jul 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Marie Kaniecki; Nicole Louise Novak; Sarah Gao; Sioban Harlow; Alexandra Minna Stern
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Addressing contemporary anti-Asian racism and its impacts on health requires understanding its historical roots, including discriminatory restrictions on immigration, citizenship, and land ownership. Archival secondary data such as historical census records provide opportunities to quantitatively analyze structural dynamics that affect the health of Asian immigrants and Asian Americans. Census data overcome weaknesses of other data sources, such as small sample size and aggregation of Asian subgroups. This article explores the strengths and limitations of early twentieth-century census data for understanding Asian Americans and structural racism.

    Methods

    We used California census data from three decennial censuses spanning 1920–1940 to compare two criteria for identifying Asian Americans: census racial categories and Asian surname lists (Chinese, Indian, Japanese, Korean, and Filipino) that have been validated in contemporary population data. This paper examines the sensitivity and specificity of surname classification compared to census-designated “color or race” at the population level.

    Results

    Surname criteria were found to be highly specific, with each of the five surname lists having a specificity of over 99% for all three census years. The Chinese surname list had the highest sensitivity (ranging from 0.60–0.67 across census years), followed by the Indian (0.54–0.61) and Japanese (0.51–0.62) surname lists. Sensitivity was much lower for Korean (0.40–0.45) and Filipino (0.10–0.21) surnames. With the exception of Indian surnames, the sensitivity values of surname criteria were lower for the 1920–1940 census data than those reported for the 1990 census. The extent of the difference in sensitivity and trends across census years vary by subgroup.

    Discussion

    Surname criteria may have lower sensitivity in detecting Asian subgroups in historical data as opposed to contemporary data, as enumeration procedures for Asians have changed across time. We examine how the conflation of race, ethnicity, and nationality in the census could contribute to low sensitivity of surname classification compared to census-designated “color or race.” These results can guide decisions when operationalizing race in the context of specific research questions, thus promoting historical quantitative study of Asian American experiences. Furthermore, these results stress the need to situate measures of race and racism in their specific historical context.
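
    For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes sensitivity and specificity for a binary classifier against a reference label. It is a generic illustration, not the authors' code; the function name and the toy data are invented, with the surname match playing the role of the prediction and the census-designated category the role of the reference.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two equal-length boolean sequences.

    sensitivity = TP / (TP + FN): share of reference positives that were flagged.
    specificity = TN / (TN + FP): share of reference negatives that were not flagged.
    Assumes at least one reference positive and one reference negative.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, not drawn from the study.
surname_match = [True, True, False, False, False, True]
census_race = [True, True, True, False, False, False]

sens, spec = sensitivity_specificity(surname_match, census_race)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    High specificity with low sensitivity, the pattern reported above, means a surname list rarely flags people outside the group but misses many people inside it.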

  16. Historic Census

    • demography.osbm.nc.gov
    • nc-state-demographer-ncosbm.opendatasoft.com
    csv, excel, geojson +1
    Updated Feb 8, 2022
    Cite
    (2022). Historic Census [Dataset]. https://demography.osbm.nc.gov/explore/dataset/historic-census/
    Available download formats: json, geojson, excel, csv
    Dataset updated
    Feb 8, 2022
    Description

    Historical population as enumerated and corrected from 1790 through 2020. North Carolina was one of the 13 original States and by the time of the 1790 census had essentially its current boundaries. The Census is mandated by the United States Constitution and was first completed for 1790. The population has been counted every ten years since, with some limitations. In 1790 census coverage included most of the State, except for areas in the west, parts of which were not enumerated until 1840. The population for 1810 includes Walton County, enumerated as part of Georgia although actually within North Carolina.

    Historical populations shown here reflect the population of the respective named county and not necessarily the population of the area of the county as it was defined for a particular census. County boundaries shown in maps reflect boundaries as defined in 2020. Historic boundaries for some counties may include additional geographic areas or may be smaller than the current geographic boundaries. Notes below list the county or counties with which the population of a currently defined county was enumerated historically (Current County: Population counted in). The current 100 counties have been in place since the 1920 Census, although some modifications to the county boundaries have occurred since that time. For historical county boundaries see: Atlas of Historical County Boundaries Project (newberry.org)

    County Notes:

    Note 1: Total for 1810 includes population (1,026) of Walton County, reported as a Georgia county but later determined to be situated in western North Carolina. Total for 1890 includes 2 Indians in prison, not reported by county.
    Note 2: Alexander: *Iredell, Burke, Wilkes.
    Note 3: Avery: *Caldwell, Mitchell, Watauga.
    Note 4: Buncombe: *Burke, Rutherford; see also note 22.
    Note 5: Caldwell: *Burke, Wilkes, Yancey.
    Note 6: Cleveland: *Rutherford, Lincoln.
    Note 7: Columbus: *Bladen, Brunswick.
    Note 8: Dare: *Tyrrell, Currituck, Hyde.
    Note 9: Hoke: *Cumberland, Robeson.
    Note 10: Jackson: *Macon, Haywood.
    Note 11: Lee: *Moore, Chatham.
    Note 12: Lenoir: *Dobbs (Greene); Craven.
    Note 13: McDowell: *Burke, Rutherford.
    Note 14: Madison: *Buncombe, Yancey.
    Note 15: Mitchell: *Yancey, Watauga.
    Note 16: Pamlico: *Craven, Beaufort.
    Note 17: Polk: *Rutherford, Henderson.
    Note 18: Swain: *Jackson, Macon.
    Note 19: Transylvania: *Henderson, Jackson.
    Note 20: Union: *Mecklenburg, Anson.
    Note 21: Vance: *Granville, Warren, Franklin.
    Note 22: Walton: Created in 1803 as a Georgia county and reported in 1810 as part of Georgia; abolished after a review of the State boundary determined that its area was located in North Carolina. By 1820 it was part of Buncombe County.
    Note 23: Watauga: *Ashe, Yancey, Wilkes; Burke.
    Note 24: Wilson: *Edgecombe, Nash, Wayne, Johnston.
    Note 25: Yancey: *Burke, Buncombe.
    Note 26: Alleghany: *Ashe.
    Note 27: Haywood: *Buncombe.
    Note 28: Henderson: *Buncombe.
    Note 29: Person: Caswell.
    Note 30: Clay: Cherokee.
    Note 31: Graham: Cherokee.
    Note 32: Harnett: Cumberland.
    Note 33: Macon: Haywood.
    Note 34: Catawba: Lincoln.
    Note 35: Gaston: Lincoln.
    Note 36: Cabarrus: Mecklenburg.
    Note 37: Stanly: Montgomery.
    Note 38: Pender: New Hanover.
    Note 39: Alamance: Orange.
    Note 40: Durham: Orange, Wake.
    Note 41: Scotland: Richmond.
    Note 42: Davidson: Rowan.
    Note 43: Davie: Rowan.
    Note 44: Forsyth: Stokes.
    Note 45: Yadkin: Surry.
    Note 46: Washington: Tyrrell.
    Note 47: Ashe: Wilkes.

    Part III. Population of Counties, Earliest Census to 1990

    The 1840 population of Person County, NC should be 9,790. The 1840 population of Perquimans County, NC should be 7,346.

  17. ice-id

    • huggingface.co
    Updated May 11, 2025
    Cite
    Goncalo Carvalho (2025). ice-id [Dataset]. https://huggingface.co/datasets/goldpotatoes/ice-id
    Dataset updated
    May 11, 2025
    Authors
    Goncalo Carvalho
    Description

    ICE-ID Dataset

    Overview

    ICE-ID is a benchmark dataset of Icelandic census records (1703–1920) for longitudinal identity resolution. It includes cleaned tabular features and a temporal graph of person records across census waves.

    Files

    raw_data/
    ├─ people.csv
    ├─ manntol_einstaklingar_new.csv
    ├─ parishes.csv, districts.csv, counties.csv

    artifacts/
    ├─ row_labels.csv # row_id, person mapping
    ├─ rows_with_person.csv # linked subset of rows
    ├─…

    See the full description on the dataset page: https://huggingface.co/datasets/goldpotatoes/ice-id.

  18. Deep Roots of Racial Inequalities in US Healthcare: The 1906 American...

    • portal.sds.ox.ac.uk
    txt
    Updated Dec 5, 2023
    Cite
    Benjamin Chrisinger (2023). Deep Roots of Racial Inequalities in US Healthcare: The 1906 American Medical Directory [Dataset]. http://doi.org/10.25446/oxford.24065709.v2
    Available download formats: txt
    Dataset updated
    Dec 5, 2023
    Dataset provided by
    University of Oxford
    Authors
    Benjamin Chrisinger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This dataset comprises physician-level entries from the 1906 American Medical Directory, the first in a series of semi-annual directories of all practicing physicians published by the American Medical Association [1]. Physicians are consistently listed by city, county, and state. Most records also include details about the place and date of medical training. From 1906-1940, Directories also identified the race of black physicians [2].This dataset comprises physician entries for a subset of US states and the District of Columbia, including all of the South and several adjacent states (Alabama, Arkansas, Delaware, Florida, Georgia, Kansas, Kentucky, Louisiana, Maryland, Mississippi, Missouri, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, West Virginia). Records were extracted via manual double-entry by professional data management company [3], and place names were matched to latitude/longitude coordinates. The main source for geolocating physician entries was the US Census. Historical Census records were sourced from IPUMS National Historical Geographic Information System [4]. Additionally, a public database of historical US Post Office locations was used to match locations that could not be found using Census records [5]. 
Fuzzy matching algorithms were also used to match misspelled place or county names [6]. The source of each geocoding match is described in the “match.source” field, which records the type of spatial match:

• census_YEAR = matched to NHGIS census place-county-state for the given year
• census_fuzzy_YEAR = matched to NHGIS place-county-state with a fuzzy matching algorithm
• dc = matched to the centroid for Washington, DC
• post_places = place-county-state matched to Blevins & Helbock's post office dataset
• post_fuzzy = matched to the post office dataset with a fuzzy matching algorithm
• post_simp = place/state matched to the post office dataset
• post_confimed_missing = post office dataset confirms place and county, but coordinates could not be found
• osm = matched using the Open Street Map geocoder
• hand-match = matched by research assistants reviewing web archival sources
• unmatched/hand_match_missing = place coordinates could not be found

For records where place names could not be matched, but county names could, coordinates for county centroids were used. Overall, 40,964 records were matched to places (match.type=place_point) and 931 to county centroids (match.type=county_centroid); 76 records could not be matched (match.type=NA).

Most records include information about the physician’s medical training, including the year of graduation and a code linking to a school. A key to these codes is given on Directory pages 26-27, and at the beginning of each state’s section [1]. The OSM geocoder was used to assign coordinates to each school by its listed location. Straight-line distances between physicians’ place of training and practice were calculated using the sf package in R [7], and are given in the “school.dist.km” field. Additionally, the Directory identified a handful of schools that were “fraudulent” (school.fraudulent=1), and institutions set up to train black physicians (school.black=1).

AMA identified black physicians in the directory with the signifier “(col.)” following the physician’s name (race.black=1). Additionally, a number of physicians attended schools identified by AMA as serving black students, but were not otherwise identified as black; thus an expanded racial identifier was generated to identify black physicians (race.black.prob=1), including physicians who attended these schools and those directly identified (race.black=1).

Approximately 10% of dataset entries were audited by trained research assistants, in addition to 100% of black physician entries. These audits demonstrated a high degree of accuracy between the original Directory and extracted records. Still, given the complexity of matching across multiple archival sources, it is possible that some errors remain; any identified errors will be periodically rectified in the dataset, with a log kept of these updates.

For further information about this dataset, or to report errors, please contact Dr Ben Chrisinger (Benjamin.Chrisinger@tufts.edu). Future updates to this dataset, including additional states and Directory years, will be posted here: https://dataverse.harvard.edu/dataverse/amd.

References:

1. American Medical Association, 1906. American Medical Directory. American Medical Association, Chicago. Retrieved from: https://catalog.hathitrust.org/Record/000543547
2. Baker, Robert B., Harriet A. Washington, Ololade Olakanmi, Todd L. Savitt, Elizabeth A. Jacobs, Eddie Hoover, and Matthew K. Wynia. "African American physicians and organized medicine, 1846-1968: origins of a racial divide." JAMA 300, no. 3 (2008): 306-313. doi:10.1001/jama.300.3.306
3. GABS Research Consult Limited Company, https://www.gabsrcl.com
4. Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 17.0 [GNIS, TIGER/Line & Census Maps for US Places and Counties: 1900, 1910, 1920, 1930, 1940, 1950; 1910_cPHA: ds37]. Minneapolis, MN: IPUMS. 2022. http://doi.org/10.18128/D050.V17.0
5. Blevins, Cameron; Helbock, Richard W., 2021, "US Post Offices", https://doi.org/10.7910/DVN/NUKCNA, Harvard Dataverse, V1, UNF:6:8ROmiI5/4qA8jHrt62PpyA== [fileUNF]
6. fedmatch: Fast, Flexible, and User-Friendly Record Linkage Methods. https://cran.r-project.org/web/packages/fedmatch/index.html
7. sf: Simple Features for R. https://cran.r-project.org/web/packages/sf/index.html
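
The description above says that straight-line distances between place of training and place of practice (the “school.dist.km” field) were computed with the sf package in R. As a rough illustration only, the sketch below computes a great-circle distance in Python with the haversine formula. This is a spherical approximation substituted here for clarity, not the dataset authors' method; the function name, the Earth radius constant, and the example coordinates are all assumptions of this sketch.

```python
import math

# Mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical coordinates for a school and a place of practice.
school = (41.88, -87.63)
practice = (39.10, -94.58)
school_dist_km = haversine_km(*school, *practice)
```

Results from a spherical formula like this can differ slightly from distances computed on an ellipsoid, so values would not be expected to match the published field exactly.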

  19. Census of Agriculture, 2007 - United States Virgin Islands

    • microdata.fao.org
    Updated Nov 16, 2020
    United States Department of Agriculture, National Agriculture Statistical Service (USDA/NASS) (2020). Census of Agriculture, 2007 - United States Virgin Islands [Dataset]. https://microdata.fao.org/index.php/catalog/1608
    Dataset updated
    Nov 16, 2020
    Dataset provided by
    United States Department of Agriculture (http://usda.gov/)
    National Agricultural Statistics Service (http://www.nass.usda.gov/)
    Authors
    United States Department of Agriculture, National Agriculture Statistical Service (USDA/NASS)
    Time period covered
    2007
    Area covered
    U.S. Virgin Islands
    Description

    Abstract

    For more than 150 years, the U.S. Department of Commerce, Bureau of the Census, conducted the census of agriculture. However, the 1997 Appropriations Act transferred the responsibility from the Bureau of the Census to the U.S. Department of Agriculture (USDA), National Agricultural Statistics Service (NASS). The 2007 Census of Agriculture for the U.S. Virgin Islands is the second census in the U.S. Virgin Islands conducted by NASS. The census of agriculture is taken to obtain agricultural statistics for each county, State (including territories and protectorates), and the Nation. The first U.S. agricultural census data were collected in 1840 as a part of the sixth decennial census. From 1840 to 1920, an agricultural census was taken as a part of each decennial census. Since 1920, a separate national agricultural census has been taken every 5 years.

    The 2007 census is the 14th census of agriculture of the U.S. Virgin Islands. The first, taken in 1920, was a special census authorized by the Secretary of Commerce. The next agriculture census was taken in 1930 in conjunction with the decennial census, a practice that continued every 10 years through 1960. The 1964 Census of Agriculture was the first quinquennial (5-year) census to be taken in the U.S. Virgin Islands. In 1976, Congress authorized the census of agriculture to be taken for 1978 and 1982 to adjust the data-reference year to coincide with the 1982 Economic Censuses covering manufacturing, mining, construction, retail trade, wholesale trade, service industries, and selected transportation activities. After 1982, the agriculture census reverted to a 5-year cycle.

    Data in this publication are for the calendar year 2007, and inventory data reflect what was on hand on December 31, 2007. This is the same reference period used in the 2002 census. Prior to the 2002 census, data were collected in the summer for the previous 12 months, with inventory items counted as what was on hand as of July 1 of the year in which the data collection was done.

    Objectives: The census of agriculture is the leading source of statistics about the U.S. Virgin Islands' agricultural production and the only source of consistent, comparable data at the island level. Census statistics are used to measure agricultural production and to identify trends in an ever-changing agricultural sector. Many local programs use census data as a benchmark for designing and evaluating surveys. Private industry uses census statistics to provide a more effective production and distribution system for the agricultural community.

    Geographic coverage

    National coverage

    Analysis unit

    Households

    Universe

    The statistical unit was a farm, defined as "any place from which USD 500 or more of agricultural products were produced and sold, or normally would have been sold, during the calendar year 2007". According to the census definition, a farm is essentially an operating unit, not an ownership tract. All land operated or managed by one person or partnership represents one farm. In the case of tenants, the land assigned to each tenant is considered a separate farm, even though the landlord may consider the entire landholding to be one unit rather than several separate units.

    Kind of data

    Census/enumeration data [cen]

    Sampling procedure

    (a) Method of Enumeration: As in the previous censuses of the U.S. Virgin Islands, a direct enumeration procedure was used in the 2007 Census of Agriculture. Enumeration was based on a list of farm operators compiled by the U.S. Virgin Islands Department of Agriculture. This list was compiled with the help of the USDA Farm Services Agency located in St. Croix. The statistics in this report were collected from farm operators beginning in January of 2003. Each enumerator was assigned a list of individuals or farm operations from a master enumeration list. The enumerators contacted persons or operations on their list and completed a census report form for all farm operations. If the person on the list was not operating a farm, the enumerator recorded whether the land had been sold or rented to someone else and was still being used for agriculture. If land was sold or rented out, the enumerator got the name of the new operator and contacted that person to ensure that he or she was included in the census.

    (b) Frame: The census frame consisted of a list of farm operators compiled by the U.S. Virgin Islands DA. This list was compiled with the help of the USDA Farm Services Agency, located in St. Croix.

    (c) Complete and/or sample enumeration methods: The census was a complete enumeration of all farm operators registered in the list compiled by the United States of America in the CA 2007.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire (report form) for the CA 2007 was prepared by NASS, in cooperation with the DA of the U.S. Virgin Islands. Only one questionnaire was used for data collection covering topics on:

    • Land owned
    • Land use
    • Irrigation
    • Conservation programs and crop insurance
    • Field crops
    • Bananas, coffee, pineapples and plantain crops
    • Hay and forage crops
    • Nursery, Greenhouse, Floriculture, Sod and tree seedlings
    • Vegetables and melons
    • Hydroponic crops
    • Fruit
    • Root crops
    • Cattle and calves
    • Poultry
    • Hogs and pigs
    • Aquaculture
    • Other animals and livestock products
    • Value of sales
    • Organic agriculture
    • Federal and commonwealth agricultural program payments
    • Income from farm-related sources
    • Production expenses
    • Farm labour
    • Fertilizer and chemicals applied
    • Market value of land and buildings
    • Machinery, equipment and buildings
    • Practices
    • Type of organization
    • Operator characteristics

    The questionnaire of the 2007 CA covered 12 of the 16 core items recommended for the WCA 2010 round.

    Cleaning operations

    Data processing: The processing of the 2007 Census of Agriculture for the U.S. Virgin Islands was done in St. Croix. Each report form was reviewed and coded prior to data keying. Report forms not meeting the census farm definition were voided. The remaining report forms were examined for clarity and completeness. Reporting errors in units of measures, illegible entries, and misplaced entries were corrected. After all the report forms had been reviewed and coded, the data were keyed and subjected to a thorough computer edit. The edit performed comprehensive checks for consistency and reasonableness, corrected erroneous or inconsistent data, supplied missing data based on similar farms, and assigned farm classification codes necessary for tabulating the data. All substantial changes to the data generated by the computer edits were reviewed and verified by analysts. Inconsistencies identified, but not corrected by the computer, were reviewed, corrected, and keyed to a correction file. The corrected data were then tabulated by the computer and reviewed by analysts. Prior to publication, tabulated totals were reviewed by analysts to identify inconsistencies and potential coverage problems. Comparisons were made with previous census data, as well as other available data. The computer system provided the capability to review up-to-date tallies of all selected data items for various sets of criteria which included, but were not limited to, geographic levels, farm types, and sales levels. Data were examined for each set of criteria and any inconsistencies or potential problems were then researched by examining individual data records contributing to the tabulated total. When necessary, data inconsistencies were resolved by making corrections to individual data records.

    Sampling error estimates

    The accuracy of these tabulated data is determined by the joint effects of the various nonsampling errors. No direct measures of these effects have been obtained; however, precautionary steps were taken in all phases of data collection, processing, and tabulation of the data in an effort to minimize the effects of nonsampling errors.

  20. Census of Agriculture, 2007 - Guam

    • microdata.fao.org
    Updated Jan 22, 2021
    National Agricultural Statistics Service (2021). Census of Agriculture, 2007 - Guam [Dataset]. https://microdata.fao.org/index.php/catalog/study/GUM_2007_CA_v01_EN_M_v01_A_OCS
    Dataset updated
    Jan 22, 2021
    Dataset authored and provided by
    National Agricultural Statistics Service (http://www.nass.usda.gov/)
    Time period covered
    2008
    Area covered
    Guam
    Description

    Abstract

    For more than 150 years, the U.S. Department of Commerce, Bureau of the Census, conducted the census of agriculture. However, the 1997 Appropriations Act transferred the responsibility from the Bureau of the Census to the U.S. Department of Agriculture (USDA), National Agricultural Statistics Service (NASS). The 2007 Census of Agriculture for Guam is the second census to be conducted by the National Agricultural Statistics Service. The census of agriculture is taken to obtain agricultural statistics for each county, State (including territories and protectorates), and the Nation. The first U.S. agricultural census data were collected in 1840 as a part of the sixth decennial census. From 1840 to 1920, an agricultural census was taken as a part of each decennial census. Since 1920, a separate national agricultural census has been taken every 5 years.

    The 2007 census is the 14th census of agriculture of Guam. The first, taken in 1920, was a special census authorized by the Secretary of Commerce. The next agriculture census was taken in 1930 in conjunction with the decennial census, a practice that continued every 10 years through 1960. The 1964 Census of Agriculture was the first quinquennial (5-year) census to be taken in Guam. In 1976, Congress authorized the census of agriculture to be taken for 1978 and 1982 to adjust the data-reference year to coincide with other economic censuses. After 1982, the agriculture census reverted to a 5-year cycle for the years ending in 2 and 7.

    Geographic coverage

    National coverage

    Analysis unit

    Households

    Universe

    The statistical unit was the farm, defined as any place that raised or produced any agricultural products for sale or home consumption.

    Kind of data

    Census/enumeration data [cen]

    Sampling procedure

    The census was a complete enumeration of all farm operators registered in the list compiled by the Guam Department of Agriculture, which served as the census frame. It was conducted through face-to-face interviews using paper questionnaires.

    Mode of data collection

    Face-to-face paper [f2f]

    Research instrument

    One questionnaire was used which collected information on:

    • Land owned
    • Field crops
    • Fruit
    • Root crops
    • Cattle and calves
    • Poultry
    • Aquaculture
    • Expenditure
    • Production expenses
    • Machinery, equipment and buildings
    • Household characteristics

    Cleaning operations

    Processing: The processing of the 2007 Census of Agriculture for Guam was done by NASS. Each report form was reviewed and coded prior to data keying. Report forms not meeting the census farm definition were voided. The remaining report forms were examined for accuracy, consistency, and completeness. Reporting errors in computations, units of measures, data inconsistencies, and misplaced entries were corrected. Missing information was derived using reported data for similar type and size farms in nearby areas. After all the report forms had been reviewed and coded, the data were keyed and subjected to a thorough computer edit. The edit performed comprehensive checks for consistency and reasonableness, corrected erroneous or inconsistent data, supplied missing data based on similar farms, and assigned farm classification codes necessary for tabulating the data. All substantial changes to the data generated by the computer edits were reviewed and verified by analysts. Inconsistencies were reviewed, corrected, and keyed to a correction file. The corrected data were then tabulated by the computer and reviewed by analysts. Prior to publication, tabulated totals were reviewed by analysts to identify inconsistencies and potential coverage problems. Comparisons were made with previous census data, as well as other available data. The computer system provided the capability to review up-to-date tallies of all selected data items for various sets of criteria which included, but were not limited to, geographic levels, farm types, and sales levels. Data were examined for each set of criteria and a write-up (criticism) was produced for data that were inconsistent. Each criticism was then researched by examining individual data records contributing to the tabulated total. When necessary, data inconsistencies were resolved by carrying corrections to data records.

    Data appraisal

    No Post Enumeration Survey (PES) was performed. Quality checks included strict field supervision, clerical screening for farm activity, follow-up of nonrespondents, keying and transmittal of completed report forms, computerized editing of inconsistent and missing data, review and correction of individual records referred from the computer edit, review and correction of tabulated data, and electronic data processing.

Stanford Center for Population Health Sciences (2020). Historic US Census - 1920 [Dataset]. http://doi.org/10.57761/v43s-pk48

Historic US Census - 1920

Explore at:
sas, csv, spss, stata, application/jsonl, arrow, avro, parquet (available download formats)
Dataset updated
Jan 10, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Time period covered
Jan 1, 1920 - Dec 31, 1920
Area covered
United States
Description

Abstract

The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations (Ancestry.com and FamilySearch) and provide the largest and richest source of individual-level and household data.

Before Manuscript Submission

All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.

We will check your cell sizes and citations.

For more information about how to cite PHS and PHS datasets, please visit:

https://phsdocs.developerhub.io/need-help/citing-phs-data-core

Documentation

Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at the birth of each child. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.

In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

The historic US 1920 census data were collected in January 1920. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.

Notes

  • We provide household and person data separately so that it is convenient to explore the descriptive statistics on each level. To obtain a full dataset, merge the household and person data on the variables SERIAL and SERIALP. To create a longitudinal dataset, merge datasets on the variable HISTID.

  • Households with more than 60 people in the original data were broken up for processing purposes. Every person in a large household is considered to be in his or her own household. The original large households can be identified using the variable SPLIT, reconstructed using the variable SPLITHID, and the original count is found in the variable SPLITNUM.

  • Coded variables derived from string variables are still in progress. These variables include: occupation and industry.

  • Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, MORTGAGE, FARM, CLASSWKR, OCC1950, IND1950, MARST, RACE, SEX, RELATE, MTONGUE. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.

  • Most inconsistent information was not edited for this release, so there are observations outside of the universe for some variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved with the next release.
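
The merge and household-reconstruction steps described in the notes above can be sketched in plain Python. This is a minimal illustration, not the provider's workflow: the function names and the sample rows are hypothetical, and the sketch assumes that a nonzero SPLIT value marks a fragment of a split household and that SPLITHID carries the identifier shared by all fragments of the same original household.

```python
from collections import defaultdict


def attach_households(households, persons):
    """Join each person record to its household record.

    Households are keyed by SERIAL; persons carry the matching value in SERIALP.
    On a column-name collision the person value wins.
    """
    by_serial = {h["SERIAL"]: h for h in households}
    merged = []
    for p in persons:
        h = by_serial.get(p["SERIALP"])
        if h is None:
            continue  # person with no household row; skipped in this sketch
        merged.append({**h, **p})
    return merged


def regroup_split_households(households):
    """Group fragments of split households by SPLITHID.

    Assumption: a nonzero SPLIT marks a fragment of a large household.
    """
    groups = defaultdict(list)
    for h in households:
        if h.get("SPLIT"):
            groups[h["SPLITHID"]].append(h["SERIAL"])
    return dict(groups)


# Hypothetical sample rows, for illustration only.
households = [
    {"SERIAL": 1, "SPLIT": 0, "SPLITHID": 0},
    {"SERIAL": 2, "SPLIT": 1, "SPLITHID": 900},
    {"SERIAL": 3, "SPLIT": 1, "SPLITHID": 900},
]
persons = [
    {"SERIALP": 1, "HISTID": "A", "AGE": 34},
    {"SERIALP": 1, "HISTID": "B", "AGE": 5},
    {"SERIALP": 2, "HISTID": "C", "AGE": 20},
]

merged = attach_households(households, persons)
original_households = regroup_split_households(households)
```

The same keyed-join pattern would apply to a longitudinal merge, with HISTID as the key on both sides instead of SERIAL and SERIALP.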


Section 2

This dataset was created on 2020-01-10 18:46:34.647 by merging multiple datasets together. The source datasets for this version were:

IPUMS 1920 households: This dataset includes all households from the 1920 US census.

IPUMS 1920 persons: This dataset includes all individuals from the 1920 US census.

IPUMS 1920 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1920 datasets.
