The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The IPUMS microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual-level and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration, and family variables. Within households, it is possible to create relational data because all relations between household members are known. For example, having data on a mother and her children in a household enables researchers to calculate the mother’s age at each birth. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
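The mother's-age-at-birth example can be sketched in a few lines. This is a minimal illustration, not the IPUMS procedure: the record layout and the field names (`household_id`, `person_no`, `mother_no`) are hypothetical stand-ins for whatever household and within-household pointer variables an extract provides, and the sample household is made up. Because census ages are recorded in whole years, the result is an approximation.

```python
from dataclasses import dataclass


@dataclass
class Person:
    household_id: int  # hypothetical household identifier
    person_no: int     # hypothetical position of the person within the household
    age: int           # age in whole years at enumeration
    mother_no: int     # person_no of the mother in the same household, 0 if unknown


def mothers_age_at_birth(people):
    """Return {(household_id, child person_no): mother's approximate age at that birth}."""
    by_key = {(p.household_id, p.person_no): p for p in people}
    result = {}
    for child in people:
        if child.mother_no == 0:
            continue  # no mother recorded in this household
        mother = by_key.get((child.household_id, child.mother_no))
        if mother is None:
            continue  # pointer does not resolve to a household member
        result[(child.household_id, child.person_no)] = mother.age - child.age
    return result


# Made-up household: two adults and two children whose mother is person 2.
household = [
    Person(1, 1, 40, 0),
    Person(1, 2, 35, 0),
    Person(1, 3, 10, 2),
    Person(1, 4, 4, 2),
]
ages = mothers_age_at_birth(household)
```

Keying the lookup on the household and the within-household number keeps the link local to one household, which is what makes the relational reading of the data possible.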
The 1940 US census data were collected in April 1940. Enumerators collected the data by traveling to households and counting the residents who regularly slept there. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
Notes
These data comprise Census records relating to the Alaskan people's population demographics for the State of Alaskan Salmon and People (SASAP) Project. Decennial census data were originally extracted from IPUMS National Historic Geographic Information Systems website: https://data2.nhgis.org/main (Citation: Steven Manson, Jonathan Schroeder, David Van Riper, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 12.0 [Database]. Minneapolis: University of Minnesota. 2017. http://doi.org/10.18128/D050.V12.0). A number of relevant tables of basic demographics on age and race, household income and poverty levels, and labor force participation were extracted. These particular variables were selected as part of an effort to understand and potentially quantify various dimensions of well-being in Alaskan communities. The file "censusdata_master.csv" is a consolidation of all 21 other data files in the package. For detailed information on how the datasets vary over different years, view the file "readme.docx" available in this data package. The included .Rmd file is a script which combines the 21 files by year into a single file (censusdata_master.csv). It also cleans up place names (including typographical errors) and uses the USGS place names dataset and the SASAP regions dataset to assign latitude and longitude values and region values to each place in the dataset. Note that some places were not assigned a region or location because they do not fit well into the regional framework. Considerable heterogeneity exists between census surveys each year. While we have attempted to combine these datasets in a way that makes sense, there may be some discrepancies or unexpected values. The RMarkdown document SASAPWebsiteGraphicsCensus.Rmd is used to generate a variety of figures using these data, including the additional file Chignik_population.png. 
An additional set of 25 figures showing regional trends in population and income metrics is also included.
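The consolidation step described above, stacking yearly files whose columns differ into one table, might look roughly like the following. This is a sketch in Python, not the package's .Rmd script: the column names, place name, and values are invented for illustration, and the real script also cleans place names and assigns coordinates and regions, which is not attempted here.

```python
import csv
import io


def combine_by_year(sources):
    """Stack CSV tables from several years into one list of rows.

    `sources` maps a census year to a file-like object. Columns that a
    given year lacks are filled with empty strings.
    """
    rows, columns = [], ["year"]
    for year in sorted(sources):
        for record in csv.DictReader(sources[year]):
            record = {"year": str(year), **record}
            for name in record:
                if name not in columns:
                    columns.append(name)  # grow the union of all columns
            rows.append(record)
    return columns, [{c: r.get(c, "") for c in columns} for r in rows]


# Made-up inputs standing in for two of the yearly files.
sources = {
    2000: io.StringIO("place,population,median_income\nPlace A,120,30000\n"),
    1990: io.StringIO("place,population\nPlace A,100\n"),
}
columns, combined = combine_by_year(sources)

# Write the consolidated table; a real run would open a file instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns)
writer.writeheader()
writer.writerows(combined)
```

Leaving missing columns empty rather than dropping them keeps every year's rows in the master table, at the cost of gaps that downstream analysis has to handle.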
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/WFFS4W
The Vernacular Archive of Normal Volunteers (VANV), 1940-2018 (inclusive) is a collection of oral histories, associated archival documents, and project records created and collected by Laura Jeanine Morris Stark (born 1975) to explore the lives of the first “normal control” research subjects at the Clinical Center of the United States National Institutes of Health (NIH) in Bethesda, Maryland, who were recruited through NIH’s Normal Volunteer Patient Program. The dataset consists of materials from two sources. First, it includes audio recordings and transcripts of oral histories Laura Stark conducted from 2010 to 2017 with individuals who were involved with the NIH Normal Volunteer Patient Program between 1954 and 2002, along with related personal documents given to Stark by interviewees, such as photographs, letters, diaries, news clippings, and other memorabilia. Most of the interviewees were former “normal controls”; others were NIH staff members or scientists who did research on the “normal volunteers.” For the interviewees who provided Stark with historical contextual documents, these materials were digitized and the files were combined into one "records" PDF file for each individual interviewee by the Center for the History of Medicine as part of the dataverse deposit process. Original contextual documents remain in the possession of the interviewees. Second, the dataset includes digital duplicates of materials related to the Normal Volunteer Patient Program compiled by Stark from the special collections of organizations that were the sources of “normal volunteers” for the NIH Clinical Center. Physical copies of the materials remain in the historical collections of organizations, such as universities, churches, civic groups, and labor unions, that signed contracts with NIH to provide healthy people for scientists to research through the Normal Volunteer Patient Program.
Records for individual collections are grouped alphabetically by the last name of the interviewee or the name of the organization. Data files include audio files of the oral history interviews, interview transcripts, individual consent and release forms, and related contextual documents supplied by interviewees or organizations. Associated records such as interview questions, the protocol for interview transcription, and template consent, release, and donation forms may be found in the dataset "VANV project records, 2010-2018." Note that the date span (1940-2018) of this dataset reflects the creation dates of original materials that may exist here only as more recently created digital reproductions. For example, items from 1940 are digital scans of letters, photographs, and other documents created in 1940.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-term study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, there is a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.
We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example, by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have included several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Home Owners’ Loan Corporation (HOLC) was a U.S. federal agency that graded mortgage investment risk of neighborhoods across the U.S. between 1935 and 1940. HOLC residential security maps standardized neighborhood risk appraisal methods that included race and ethnicity, pioneering the institutional logic of residential “redlining.” The Mapping Inequality Project digitized the HOLC mortgage security risk maps from the 1930s. We overlaid the HOLC maps with 2010 and 2020 census tracts for 142 cities across the U.S. using ArcGIS and determined the proportion of HOLC residential security grades contained within the boundaries. We assigned a numerical value to each HOLC risk category as follows: 1 for “A” grade, 2 for “B” grade, 3 for “C” grade, and 4 for “D” grade. We calculated a historic redlining score from the summed proportion of HOLC residential security grades multiplied by a weighting factor based on area within each census tract. A higher score means greater redlining of the census tract. The continuous historic redlining score (HRS), which assesses the degree of “redlining,” and its 4 equal-interval divisions can be linked to existing data sources by census tract identifier, allowing one form of structural racism in the housing market to be assessed against a variety of outcomes. The 2010 files are set to census 2010 tract boundaries. The 2020 files use the new census 2020 tract boundaries, reflecting the increase in the number of tracts from 12,888 in 2010 to 13,488 in 2020. Use the 2010 HRS with decennial census 2010 or ACS 2010-2019 data. As of publication (10/15/2020), decennial census 2020 data for the P1 (population) and H1 (housing) files are available from the Census Bureau.
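One plausible reading of the score calculation above is an area-weighted mean of the grade values within each tract. The sketch below assumes that reading; the function name, the input layout, and the choice to ignore ungraded area are assumptions of this example, not details confirmed by the dataset description.

```python
# Numerical values for HOLC grades, as stated in the description.
GRADE_VALUES = {"A": 1, "B": 2, "C": 3, "D": 4}


def redlining_score(graded_areas):
    """Area-weighted mean of HOLC grade values for one census tract.

    `graded_areas` maps a grade letter to the area of the tract covered by
    that grade, in any consistent unit. Ungraded area is ignored here,
    which is an assumption of this sketch.
    """
    total = sum(graded_areas.values())
    if total == 0:
        return None  # tract has no HOLC-graded area
    return sum(GRADE_VALUES[g] * a / total for g, a in graded_areas.items())


# Made-up tract: one quarter of the graded area is "B", three quarters "D".
score = redlining_score({"B": 1.0, "D": 3.0})
```

Under this reading the score ranges from 1 (entirely “A”) to 4 (entirely “D”), which is consistent with a higher score meaning greater redlining.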