6 datasets found

Synthetic population housing and person records for the United States
zenodo.org
openicpsr.org
+2more
zip
Updated Aug 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber; William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber (2023). Synthetic population housing and person records for the United States [Dataset]. http://doi.org/10.5281/zenodo.556121
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.556121
Dataset updated
Aug 22, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber; William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
The synthetic population was generated from the 2010-2014 ACS PUMS housing and person files.

United States Department of Commerce. Bureau of the Census. (2017-03-06).
American Community Survey 2010-2014 ACS 5-Year PUMS File [Data set].
Ann Arbor, MI: Inter-university Consortium of Political and Social
Research [distributor]. http://doi.org/10.3886/E100486V1

Outputs

There are 17 housing files
- repHus0.csv, repHus1.csv, ... repHus16.csv
and 32 person files
- rep_recode_ACSpus0.csv, rep_recode_ACSpus1.csv, ... rep_recode_ACSpus31.csv.

Files are split to be roughly equal in size. The files contain data for the entire country. Files are not split along any demographic characteristic. The person files and housing files must be concatenated to form a complete person file and a complete housing file, respectively.

If desired, person and housing records should be merged on 'id'. Variable description is below.

Data Dictionary
See [2010-2014 ACS PUMS data dictionary](http://doi.org/10.3886/E100486V1). All variables from the ACS PUMS housing files are present in the synthetic housing files and all variables from the ACS PUMS person files are present in the synthetic person files. Variables have not been modified in any way. Theoretically, variables like `person weight` no longer have any use in the synthetic population.

See README.md for more details.
n
NASA Earthdata
earthdata.nasa.gov
datasets.ai
+2more
Updated Dec 31, 1990
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ESDIS (1990). NASA Earthdata [Dataset]. http://doi.org/10.7927/H41V5BWK
Explore at:
Unique identifier
https://doi.org/10.7927/H41V5BWK
Dataset updated
Dec 31, 1990
Dataset authored and provided by
ESDIS
Description
The Public Use Microdata Samples (PUMS) are computer-accessible files containing records for a sample of housing Units, with information on the characteristics of each housing Unit and the people in it for 1940-1990. Within the limits of sample size and geographical detail, these files allow users to prepare virtually any tabulations they require. Each datafile is documented in a codebook containing a data dictionary and supporting appendix information. Electronic versions for the codebooks are only available for the 1980 and 1990 datafiles. Identifying information has been removed to protect the confidentiality of the respondents. PUMS is produced by the United States Census Bureau (USCB) and is distributed by USCB, Inter-university Consortium for Political and Social Research (ICPSR), and Columbia University Center for International Earth Science Information Network (CIESIN).
a
Decoding Home Values: The Power of Education vs. Race, Ethnicity, and Gender...
chi-phi-nmcdc.opendata.arcgis.com
Updated Jul 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
New Mexico Community Data Collaborative (2023). Decoding Home Values: The Power of Education vs. Race, Ethnicity, and Gender [Dataset]. https://chi-phi-nmcdc.opendata.arcgis.com/datasets/decoding-home-values-the-power-of-education-vs-race-ethnicity-and-gender
Explore at:
Dataset updated
Jul 25, 2023
Dataset authored and provided by
New Mexico Community Data Collaborative
Description
A detailed explanation of how this dataset was put together, including data sources and methodologies, follows below.Please see the "Terms of Use" section below for the Data DictionaryDATA ACQUISITION AND CLEANING PROCESSThis dataset was built from 5 separate datasets queried during the months of April and May 2023 from the Census Microdata System (link below):https://data.census.gov/mdat/#/All datasets include information on Property Value (VALP) by: Educational Attainment (SCHL), Gender (SEX), a specified race or ethnicity (RAC or HISP), and are grouped by Public Use Microdata Areas (PUMAS). PUMAS are geographic areas created by the Census bureau; they are weighted by land area and population to facilitate data analysis. Data also Included totals for the state of New Mexico, so 19 total geographies are represented. Datasets were downloaded separately by race and ethnicity because this was the only way to obtain the VALP, SCHL, and SEX variables intersectionally with race or ethnicity data. Datasets were downloaded separately by race and ethnicity because this was the only way to obtain the VALP, SCHL, and SEX variables intersectionally with race or ethnicity data. Cleaning each dataset started with recoding the SCHL and HISP variables - details on recoding can be found below.After recoding, each dataset was transposed so that PUMAS were rows and SCHL, VALP, SEX, and Race or Ethnicity variables were the columns.Median values were calculated in every case that recoding was necessary. As a result, all Property Values in this dataset reflect median values.At times the ACS data downloaded with zeros instead of the 'null' values in initial query results. The VALP variable also included a "-1" variable to reflect N/A values (details in variable notes). Both zeros and "-1" values were removed before calculating median values, both to keep the data true to the original query and to generate accurate median values.Recoding the SCHL variable resulted in 5 rows for each PUMA, reflecting the different levels of educational attainment in each region. Columns grouped variables by race or ethnicity and gender. Cell values were property values.All 5 datasets were joined after recoding and cleaning the data. Original datasets all include 95 rows with 5 separate Educational Attainment variables for each PUMA, including New Mexico State totals.Because 1 row was needed for each PUMA in order to map this data, the data was split by Educational Attainment (SCHL), resulting in 110 columns reflecting median property values for each race or ethnicity by gender and level of educational attainment.A short, unique 2 to 5 letter alias was created for each PUMA area in anticipation of needing a unique identifier to join the data with. GIS AND MAPPING PROCESSA PUMA shapefile was downloaded from the ACS site. The Shapefile can be downloaded here: https://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServerThe DBF from the PUMA shapefile was exported to Excel; this shapefile data included needed geographic information for mapping such as: GEOID, PUMACE. The UIDs created for each PUMA were added to the shapefile data; the PUMA shapfile data and ACS data were then joined on UID in JMP.The data table was joined to the shapefile in ARC GiIS, based on PUMA region (specifically GEOID text).The resulting shapefile was exported as a GDB (geodatabase) in order to keep 'Null' values in the data. GDBs are capable of including a rule allowing null values where shapefiles are not. This GDB was uploaded to NMCDCs Arc Gis platform. SYSTEMS USEDMS Excel was used for data cleaning, recoding, and deriving values. Recoding was done directly in the Microdata system when possible - but because the system is was in beta at the time of use some features were not functional at times.JMP was used to transpose, join, and split data. ARC GIS Desktop was used to create the shapefile uploaded to NMCDC's online platform. VARIABLE AND RECODING NOTESTIMEFRAME: Data was queried for the 5 year period of 2015 to 2019 because ACS changed its definiton for and methods of collecting data on race and ethinicity in 2020. The change resulted in greater aggregation and les granular data on variables from 2020 onward.Note: All Race Data reflects that respondants identified as the specified race alone or in combination with one or more other races.VARIABLE:ACS VARIABLE DEFINITIONACS VARIABLE NOTESDETAILS OR URL FOR RAW DATA DOWNLOADRACBLKBlack or African American ACS Query: RACBLK, SCHL, SEX, VALP 2019 5yrRACAIANAmerican Indian and Alaska Native ACS Query: RACAIAN, SCHL, SEX, VALP 2019 5yrRACASNAsian ACS Query: RACASN, SCHL, SEX, VALP 2019 5yrRACWHTWhite ACS Query: RACWHT, SCHL, SEX, VALP 2019 5yrHISPHispanic Origin ACS Query: HISP ORG, SCHL, SEX, VALP 2019 5yrHISP RECODE: 24 original separate variablesThe Hispanic Origin (HISP) variable originally included 24 subcategories reflecting Mexican, Central American, South American, and Caribbean Latino, and Spanish identities from each Latin American counry. 7 recoded VariablesThese 24 variables were recoded (grouped) into 7 simpler categories for data analysis: Not Spanish/Hispanic/Latino, Mexican, Caribbean Latino, Central American, South American, Spaniard, All other Spanish/Hispanic/Latino Female. Not Spanish/Hispanic/Latino was not really used in the final dataset as the race datasets provided that information.SCHLEducational Attainment25 original separate variablesThe Educational Attainment (SCHL) variable originally included 25 subcategories reflecting the education levels of adults (over 18) surveyed by the ACS. These include: Kindergarten, Grades 1 through 12 separately, 12th grade with no diploma, Highschool Diploma, GED or credential, less than 1 year of college, more than 1 year of college with no degree, Associate's Degree, Bachelor's Degree, Master's Degree, Professional Degree, and Doctorate Degree.SCHL RECODE: 5 recoded variablesThese 25 variables were recoded (grouped) into 5 simpler categories for data analysis: No High School Diploma, High School Diploma or GED, Some College, Bachelor's Degree, and Advanced or Professional DegreeSEXGender2 variables1 - Male, 2 - FemaleVALPProperty Value1 variableValues were rounded and top-coded by ACS for anonymity. The "-1" variable is defined as N/A (GQ/ Vacant lots except 'for sale only' and 'sold, not occupied' / not owned or being bought.) This variable reflects the median value of property owned by individuals of each race, ethnicity, gender, and educational attainment category.PUMAPublic Use Microdata Area18 PUMAsPUMAs in New Mexico can be viewed here:https://nmcdc.maps.arcgis.com/apps/mapviewer/index.html?webmap=d9fed35f558948ea9051efe9aa529eafData includes 19 total regions: 18 Pumas and NM State TotalsNOTES AND RESOURCESThe following resources and documentation were used to navigate the ACS PUMS system and to answer questions about variables:Census Microdata API User Guide:https://www.census.gov/data/developers/guidance/microdata-api-user-guide.Additional_Concepts.html#list-tab-1433961450Accessing PUMS Data:https://www.census.gov/programs-surveys/acs/microdata/access.htmlHow to use PUMS on data.census.govhttps://www.census.gov/programs-surveys/acs/microdata/mdat.html2019 PUMS Documentation:https://www.census.gov/programs-surveys/acs/microdata/documentation.2019.html#list-tab-13709392012014 to 2018 ACS PUMS Data Dictionary:https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMS_Data_Dictionary_2014-2018.pdf2019 PUMS Tiger/Line Shapefileshttps://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2019&layergroup=Public+Use+Microdata+Areas Note 1: NMCDC attemepted to contact analysts with the ACS system to clarify questions about variables, but did not receive a timely response. Documentation was then consulted.Note 2: All relevant documentation was reviewed and seems to imply that all survey questions were answered by adults, age 18 or over. Youth who have inherited property could potentially be reflected in this data.Dataset and feature service created in May 2023 by Renee Haley, Data Specialist, NMCDC.
g
Public Use Microdata Samples (PUMS) | gimi9.com
gimi9.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Public Use Microdata Samples (PUMS) | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_public-use-microdata-samples-pums/
Explore at:
Description
The Public Use Microdata Samples (PUMS) are computer-accessible files containing records for a sample of housing Units, with information on the characteristics of each housing Unit and the people in it for 1940-1990. Within the limits of sample size and geographical detail, these files allow users to prepare virtually any tabulations they require. Each datafile is documented in a codebook containing a data dictionary and supporting appendix information. Electronic versions for the codebooks are only available for the 1980 and 1990 datafiles. Identifying information has been removed to protect the confidentiality of the respondents. PUMS is produced by the United States Census Bureau (USCB) and is distributed by USCB, Inter-university Consortium for Political and Social Research (ICPSR), and Columbia University Center for International Earth Science Information Network (CIESIN).
H
Mode choice to work in 100 largest US and Mexican urban areas
dataverse.harvard.edu
Updated Mar 2, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erick Guerra (2021). Mode choice to work in 100 largest US and Mexican urban areas [Dataset]. http://doi.org/10.7910/DVN/UK8PWW
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/UK8PWW
Dataset updated
Mar 2, 2021
Dataset provided by
Harvard Dataverse
Authors
Erick Guerra
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Mexico, United States
Description
This dataset contains data on commutes to work for individuals living in the 100 largest urban areas in the US and Mexico. The dataset relies primarily on data from the 2015 Mexican Intercensus and the US ACS PUMS data merged. These data have been processed and harmonized for consistency and joined to spatial data on the 200 urban areas. The data dictionary provides information on variable names and sources. Please refer to the published paper for additional details on data transformations and exclusions.
A
Census of Population, 1950 [United States]: Public Use Microdata Sample,...
abacus.library.ubc.ca
bin, pdf
Updated Nov 19, 2009
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abacus Data Network (2009). Census of Population, 1950 [United States]: Public Use Microdata Sample, 1950 [Dataset]. https://abacus.library.ubc.ca/dataset.xhtml;jsessionid=bf3ac79915810bb6ad9cfd6d2978?persistentId=hdl%3A11272.1%2FAB2%2F6SWYBU&version=&q=&fileTypeGroupFacet=&fileTag=%22Data%22&fileSortField=size&fileSortOrder=
Explore at:
bin(18754640), pdf(6136674)Available download formats
Dataset updated
Nov 19, 2009
Dataset provided by
Abacus Data Network
Area covered
United States, United States
Description
This data collection and its 1940 counterpart were assembled through a collaborative effort between the United States Bureau of the Census and the Center for Demography and Ecology of the University of Wisconsin. The 1940 and 1950 Census Public Use Sample Project was supported by The National Science Foundation under Grant SES-7704135. The collections contain a stratified 1-percent sample of households, with separate records for each household, for each \'sample line\' respondent, and for each person in the household. These records were encoded from microfilm copies of original handwritten enumeration schedules from the 1940 and 1950 Censuses of Population. The universe for the sample included all persons and households within the United States. Geographic identification of the location of the sampled households includes Census regions and divisions, States (except Alaska and Hawaii), Standard Metropolitan Areas (SMA\'s), and State Economic Areas (SEA\'s). The SMA\'s and SEA\'s are comparable for both the 1940 and 1950 Public Use Microdata Samples (PUMS). The data collections were constructed from and consist of 20 independently-drawn subsamples stored in 20 discrete physical files. Each of the 20 subsamples contains three record types (household, \'sample line\', and person). Both collections had both a complete-count and a sample component. Individuals selected for the sample component were asked a set of additional questions. Only households with a \'sample line\' person were included in the public use microdata sample. The collections also contain records of group quarters members who were also on the Census \'sample line\'. For the 1940 and 1950 collections, each household record contains variables describing the location and composition of the household. The \'sample line\' records for 1950 contain variables describing demographic characteristics such as nativity, marital status, number of children, veteran status, education, income, and occupation. The person records for 1950 contain such demographic variables as nativity, marital status, family membership, and occupation. Accompanying the data collections are code books which include an abstract, descriptions of sample design, processing procedures and file structure, a data dictionary (record layout), category code lists, and a glossary. The data collections are arranged by subsample with each subsample stored as a separate physical file of information. The 20 subsamples were selected randomly. Within each of the 20 subsamples, records are sequenced by State. Extracting all of the records for one State entails reading through all of the 20 physical files and selecting that State\'s records from each of the 20 subsamples. Record types are ordered within household (household characteristics first, \'sample line\' next, and person records last). The 1950 collection consists of a total of 2,844,458 data records: 461,130 household records, 461,130 \'sample line\' records, and 1,922,198 person records. Each record type has a logical record length of 133.;
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber; William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber (2023). Synthetic population housing and person records for the United States [Dataset]. http://doi.org/10.5281/zenodo.556121

Synthetic population housing and person records for the United States

Explore at:

6 scholarly articles cite this dataset (View in Google Scholar)

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.556121

Dataset updated

Aug 22, 2023

Dataset provided by

Zenodohttp://zenodo.org/

Authors

William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber; William Sexton; John M. Abowd; Ian M. Schmutte; Lars Vilhuber

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

United States

Description

The synthetic population was generated from the 2010-2014 ACS PUMS housing and person files.

United States Department of Commerce. Bureau of the Census. (2017-03-06).
American Community Survey 2010-2014 ACS 5-Year PUMS File [Data set].
Ann Arbor, MI: Inter-university Consortium of Political and Social
Research [distributor]. http://doi.org/10.3886/E100486V1

Outputs

There are 17 housing files
- repHus0.csv, repHus1.csv, ... repHus16.csv
and 32 person files
- rep_recode_ACSpus0.csv, rep_recode_ACSpus1.csv, ... rep_recode_ACSpus31.csv.

Files are split to be roughly equal in size. The files contain data for the entire country. Files are not split along any demographic characteristic. The person files and housing files must be concatenated to form a complete person file and a complete housing file, respectively.

If desired, person and housing records should be merged on 'id'. Variable description is below.

Data Dictionary
See [2010-2014 ACS PUMS data dictionary](http://doi.org/10.3886/E100486V1). All variables from the ACS PUMS housing files are present in the synthetic housing files and all variables from the ACS PUMS person files are present in the synthetic person files. Variables have not been modified in any way. Theoretically, variables like `person weight` no longer have any use in the synthetic population.

See README.md for more details.

Clear search

Close search

Google apps

Main menu

Synthetic population housing and person records for the United States

NASA Earthdata

Decoding Home Values: The Power of Education vs. Race, Ethnicity, and Gender...

Public Use Microdata Samples (PUMS) | gimi9.com

Mode choice to work in 100 largest US and Mexican urban areas

Census of Population, 1950 [United States]: Public Use Microdata Sample,...

Synthetic population housing and person records for the United States