100+ datasets found

Vintage 2018 Population Estimates: Demographic Characteristics Estimates by...
catalog.data.gov
Updated Jul 19, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Vintage 2018 Population Estimates: Demographic Characteristics Estimates by Age Groups [Dataset]. https://catalog.data.gov/dataset/vintage-2018-population-estimates-demographic-characteristics-estimates-by-age-groups
Explore at:
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2018 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/modified-race-summary-file-method/mrsf2010.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2017) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/programs-surveys/popest.html.

Sample data for analysis of demographic potential of the 15-minute city in...

zenodo.org

bin, txt

Updated Aug 29, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Joan Perez; Joan Perez; Giovanni Fusco; Giovanni Fusco (2024). Sample data for analysis of demographic potential of the 15-minute city in northern and southern France [Dataset]. http://doi.org/10.5281/zenodo.13456826

Explore at:

bin, txtAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.13456826

Dataset updated

Aug 29, 2024

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Joan Perez; Joan Perez; Giovanni Fusco; Giovanni Fusco

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Area covered

France, Southern France

Description

This upload contains two Geopackage files of raw data used for urban analysis in the outskirts of Lille and Nice, France. 
The data include building footprints (layer "building"), roads (layer "road"), and administrative boundaries (layer "adm_boundaries")
extracted from version 3.3 of the French dataset BD TOPO®3 (IGN, 2023) for the municipalities of Santes, Hallennes-lez-Haubourdin,
Haubourdin, and Emmerin in northern France (Geopackage "DPC_59.gpkg") and Drap, Cantaron and La Trinité in southern France 
(Geopackage "DPC_06.gpkg").

Metadata for these layers is available here: https://geoservices.ign.fr/sites/default/files/2023-01/DC_BDTOPO_3-3.pdf

Additionally, this upload contains the results of the following algorithms available in GitHub (https://github.com/perezjoan/emc2-WP2?tab=readme-ov-file)

1. The identification of main streets using the QGIS plugin Morpheo (layers "road_morpheo" and "buffer_morpheo") 
https://plugins.qgis.org/plugins/morpheo/

2. The identification of main streets in local contexts – connectivity locally weighted (layer "road_LocRelCon")

3. Basic morphometry of buildings (layer "building_morpho")

4. Evaluation of the number of dwellings within inhabited buildings (layer "building_dwellings")

5. Projecting population potential accessible from main streets (layer "road_pop_results")

Project website: http://emc2-dut.org/

Publications using this sample data: 
Perez, J. and Fusco, G., 2024. Potential of the 15-Minute Peripheral City: Identifying Main Streets and Population Within Walking Distance. In: O. Gervasi, B. Murgante, C. Garau, D. Taniar, A.M.A.C. Rocha and M.N. Faginas Lago, eds. Computational Science and Its Applications – ICCSA 2024 Workshops. ICCSA 2024. Lecture Notes in Computer Science, vol 14817. Cham: Springer, pp.50-60. https://doi.org/10.1007/978-3-031-65238-7_4.

Acknowledgement. This work is part of the emc2 project, which received the grant ANR-23-DUTP-0003-01 from the French National Research Agency (ANR) within the DUT Partnership.

o
Demographic Analysis Workflow using Census API in Jupyter Notebook:...
openicpsr.org
delimited
Updated Jul 23, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Donghwan Gu; Nathanael Rosenheim (2020). Demographic Analysis Workflow using Census API in Jupyter Notebook: 1990-2000 Population Size and Change [Dataset]. http://doi.org/10.3886/E120381V1
Explore at:
delimitedAvailable download formats
Unique identifier
https://doi.org/10.3886/E120381V1
Dataset updated
Jul 23, 2020
Dataset provided by
Texas A&M University
Authors
Donghwan Gu; Nathanael Rosenheim
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Boone County, Kentucky, US Counties
Description
This archive reproduces a table titled "Table 3.1 Boone county population size, 1990 and 2000" from Wang and vom Hofe (2007, p.58). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses Census API to retrieve data, reproduce the table, and ensure reproducibility for anyone accessing this archive.The Python code was developed in Google Colaboratory, or Google Colab for short, which is an Integrated Development Environment (IDE) of JupyterLab and streamlines package installation, code collaboration and management. The Census API is used to obtain population counts from the 1990 and 2000 Decennial Census (Summary File 1, 100% data). All downloaded data are maintained in the notebook's temporary working directory while in use. The data are also stored separately with this archive.The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in a PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code to perform the following functions:install/import necessary Python packagesintroduce a Census API Querydownload Census data via CensusAPI manipulate Census tabular data calculate absolute change and percent changeformatting numbersexport the table to csvThe notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the Census API downloads. The notebook could be adapted for use in other environments (i.e., Jupyter Notebook) as well as reading and writing files to a local or shared drive, or cloud drive (i.e., Google Drive).
Z
Data from: Using social media and personality traits to assess software...
data.niaid.nih.gov
Updated Apr 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Miriam Bernardino Silva (2023). Using social media and personality traits to assess software developers' emotional polarity [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7846995
Explore at:
Dataset updated
Apr 20, 2023
Dataset provided by
Margarida Lima
Uirá Kulesza
Leo Silva
Henrique Madeira
Marília Gurgel de Castro
Milena Santos
Miriam Bernardino Silva
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Companion DATA

Title: Using social media and personality traits to assess software developers' emotional polarity

Authors: Leo Moreira Silva Marília Gurgel Castro Miriam Bernardino Silva Milena Santos Uirá Kulesza Margarida Lima Henrique Madeira

Journal: PeerJ Computer Science

Github: https://github.com/leosilva/peerj_computer_science_2022

The folders contain:

Experiment_Protocol.pdf: document that present the protocol regarding recruitment protocol, data collection of public posts from Twitter, criteria for manual analysis, and the assessment of Big Five factors from participants and psychologists. English version.

/analysis analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications

/dataset alldata.json: contains the dataset used in the paper

/ethics_committee committee_response_english_version.pdf: contains the acceptance response of Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. English version. committee_response_original_portuguese_version: contains the acceptance response of Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. Portuguese version. committee_submission_form_english_version.pdf: the project submitted to the committee. English version. committee_submission_form_original_portuguese_version.pdf: the project submitted to the committee. Portuguese version. consent_form_english_version.pdf: declaration of free and informed consent fulfilled by participants. English version. consent_form_original_portuguese_version.pdf: declaration of free and informed consent fulfilled by participants. Portuguese version. data_protection_declaration_english_version.pdf: personal data and privacy declaration, according to European Union General Data Protection Regulation. English version. data_protection_declaration_original_portuguese_version.pdf: personal data and privacy declaration, according to European Union General Data Protection Regulation. Portuguese version.

/notebooks General - Charts.ipynb: notebook file containing all charts produced in the study, including those in the paper Statistics - Lexicons and Ensembles.ipynb: notebook file with the statistics for the five lexicons and ensembles used in the study Statistics - Linear Regression.ipynb: notebook file with the multiple linear regression results Statistics - Polynomial Regression.ipynb: notebook file with the polynomial regression results Statistics - Psychologists versus Participants.ipynb: notebook file with the statistics between the psychologists and participants manual analysis Statistics - Working x Non-working.ipynb: notebook file containing the statistical analysis for the tweets posted during work period and those posted outside of working period

/surveys Demographic_Survey_english_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. English version. Demographic_Survey_portuguese_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. Portuguese version. Demographic_Survey_answers.xlsx: participants' demographic survey answers ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits. ibf_en.doc: translation in English of the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits. ibf_answers.xlsx: participantes' and psychologists' answers for BFI

We have removed from dataset any sensible data to protect participants' privacy and anonymity. We have removed from demographic survey answers any sensible data to protect participants' privacy and anonymity.
Vintage 2014 Population Estimates: State Population Estimates by Single Year...
catalog.data.gov
Updated Jul 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Vintage 2014 Population Estimates: State Population Estimates by Single Year of Age, Sex, 5 Races, and Hispanic Origin [Dataset]. https://catalog.data.gov/dataset/vintage-2014-population-estimates-state-population-estimates-by-single-year-of-age-sex-5-r
Explore at:
Dataset updated
Jul 27, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Annual State Resident Population Estimates for 5 Race Groups (5 Race Alone or in Combination Groups) by Age, Sex, and Hispanic Origin // Source: U.S. Census Bureau, Population Division // Note: 'In combination' means in combination with one or more other races. The sum of the five race groups adds to more than the total population because individuals may report more than one race. The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see http://www.census.gov/popest/data/historical/files/MRSF-01-US1.pdf. // For detailed information about the methods used to create the population estimates, see http://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2013) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: http://www.census.gov/popest/index.html.
2010 Census Production Settings Demographic and Housing Characteristics...
registry.opendata.aws
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States Census Bureau, 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File [Dataset]. https://registry.opendata.aws/census-2010-dhc-nmf/
Explore at:
Dataset provided by
United States Census Bureauhttp://census.gov/
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File (2023-06-30) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022] https://doi.org/10.1162/99608f92.529e3cb9 , and implemented in https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code). The NMF was produced using the official “production settings,” the final set of algorithmic parameters and privacy-loss budget allocations, that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Characteristics File. The NMF consists of the full set of privacy-protected statistical queries (counts of individuals or housing units with particular combinations of characteristics) of confidential 2010 Census data relating to the 2010 Demonstration Data Products Suite – Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File – Production Settings (2023-04-03). These statistical queries, called “noisy measurements” were produced under the zero-Concentrated Differential Privacy framework (Bun, M. and Steinke, T [2016] https://arxiv.org/abs/1605.02065; see also Dwork C. and Roth, A. [2014] https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf) implemented via the discrete Gaussian mechanism (Cannone C., et al., [2023] https://arxiv.org/abs/2004.00010), which added positive or negative integer-valued noise to each of the resulting counts. The noisy measurements are an intermediate stage of the TDA prior to the post-processing the TDA then performs to ensure internal and hierarchical consistency within the resulting tables. The Census Bureau has released these 2010 Census demonstration data to enable data users to evaluate the expected impact of disclosure avoidance variability on 2020 Census data. The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File (2023-04-03) has been cleared for public dissemination by the Census Bureau Disclosure Review Board (CBDRB-FY22-DSEP-004).

The 2010 Census Production Settings Demographic and Housing Characteristics Demonstration Noisy Measurement File includes zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T [2016]) noisy measurements, implemented via the discrete Gaussian mechanism. These are estimated counts of individuals and housing units included in the 2010 Census Edited File (CEF), which includes confidential data initially collected in the 2010 Census of Population and Housing. The noisy measurements included in this file were subsequently post-processed by the TopDown Algorithm (TDA) to produce the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03) (https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-product-planning/2010-demonstration-data-products/04-Demonstration_Data_Products_Suite/2023-04-03/). As these 2010 Census demonstration data are intended to support study of the design and expected impacts of the 2020 Disclosure Avoidance System, the 2010 CEF records were pre-processed before application of the zCDP framework. This pre-processing converted the 2010 CEF records into the input-file format, response codes, and tabulation categories used for the 2020 Census, which differ in substantive ways from the format, response codes, and tabulation categories originally used for the 2010 Census.

The NMF provides estimates of counts of persons in the CEF by various characteristics and combinations of characteristics including their reported race and ethnicity, whether they were of voting age, whether they resided in a housing unit or one of 7 group quarters types, and their census block of residence after the addition of discrete Gaussian noise (with the scale parameter determined by the privacy-loss budget allocation for that particular query under zCDP). Noisy measurements of the counts of occupied and vacant housing units by census block are also included. Lastly, data on constraints—information into which no noise was infused by the Disclosure Avoidance System (DAS) and used by the TDA to post-process the noisy measurements into the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03) —are provided.
Vintage 2016 Population Estimates: National Monthly Population Estimates
catalog.data.gov
Updated Jul 19, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Vintage 2016 Population Estimates: National Monthly Population Estimates [Dataset]. https://catalog.data.gov/dataset/vintage-2016-population-estimates-national-monthly-population-estimates
Explore at:
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Monthly Population Estimates by Universe, Age, Sex, Race, and Hispanic Origin for the United States: April 1, 2010 to December 1, 2016 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/modified-race-summary-file-method/mrsf2010.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // Persons on active duty in the Armed Forces were not enumerated in the 2010 Census. Therefore, variables for the 2010 Census civilian, civilian noninstitutionalized, and resident population plus Armed Forces overseas populations cannot be derived and are not available on these files. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2015) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/programs-surveys/popest.html.
US County & Zipcode Historical Demographics
kaggle.com
Updated Jun 23, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
BitRook (2021). US County & Zipcode Historical Demographics [Dataset]. https://www.kaggle.com/datasets/bitrook/us-county-historical-demographics
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 23, 2021
Dataset provided by
Kaggle
Authors
BitRook
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
US County & Zipcode Historical Demographics

Easily lookup US historical demographics by county FIPS or zipcode in seconds with this file containing over 5,901 different columns including:

*Lat/Long *Boundaries *State FIPS *Population from 2010-2019 *Death Rate from 2010-2019 *Unemployment from 2001-2020 *Education from 1970-2019 *Gender and Age Population

Provided by bitrook.com to help Data Scientists clean data faster.

Data Sources

All Data Combined Source:

https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/

Population Source:

https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/

Unemployment Source:

https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/

Zip FIPS Crosswalk Source:

https://data.world/niccolley/us-zipcode-to-county-state

County Boundaries Source:

https://public.opendatasoft.com/explore/dataset/us-county-boundaries/table/?disjunctive.statefp&disjunctive.countyfp&disjunctive.name&disjunctive.namelsad&disjunctive.stusab&disjunctive.state_name

Age Sex Source:

https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/asrh/cc-est2019-agesex-**.csv https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/cc-est2019-agesex.pdf

Races Source:

https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/asrh/cc-est2019-alldata.csv https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/cc-est2019-alldata.pdf
[Dataset] Data for the course "Population Genomics" at Aarhus University
zenodo.org
application/gzip, bin
Updated Jan 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samuele Soraggi; Samuele Soraggi; Kasper Munch; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
Explore at:
application/gzip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7670839
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Samuele Soraggi; Samuele Soraggi; Kasper Munch; Kasper Munch
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets, conda environments and Softwares for the course "Population Genomics" of Prof Kasper Munch. This course material is maintained by the health data science sandbox. This webpage shows the latest version of the course material.

Data.tar.gz Contains the datasets and executable files for some of the softwares
You can unpack by simply doing
tar -zxf Data.tar.gz -C ./
This will create a folder called Data with the uncompressed material inside

Course_Env.packed.tar.gz Contains the conda environment used for the course. This needs to be unpacked to adjust all the prefixes (Note this environment is created on Ubuntu 22.10). You do this in the command line by

creating the folder Course_Env: mkdir Course_Env

untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env

Activate the environment: conda activate ./Course_Env

Run the unpacking script (it can take quite some time to get it done): conda-unpack

Course_Env.unpacked.tar.gz The same environment as above, but will work only if untarred into the folder /usr/Material - so use the version above if you are using it in another folder. This file is mostly to execute the course in our own cloud environment.

environment_with_args.yml The file needed to generate the conda environment. Create and activate the environment with the following commands:

conda env create -f environment_with_args.yml -p ./Course_Env

conda activate ./Course_Env

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

The participants will after the course have detailed knowledge of the methods and applications required to perform a typical population genomic study.

The participants must at the end of the course be able to:

Identify an experimental platform relevant to a population genomic analysis.

Apply commonly used population genomic methods.

Explain the theory behind common population genomic methods.

Reflect on strengths and limitations of population genomic methods.

Interpret and analyze results of population genomic inference.

Formulate population genetics hypotheses based on data

The course introduces key concepts in population genomics from generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Here topics include analysis of demography, population structure, recombination and selection. The last part of the course focus on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

Course intro and overview:

Coop chapters 1, 2, 3, Paper: Genome Diversity Project

Drift and the coalescent:

Coop chapter 4; Paper: Platypus

Exercise: Read mapping and base calling

Recombination:

Lecture: Review: Recombination in eukaryotes, Review: Recombination rate estimation

Exercise: Phasing and recombination rate

Population strucure and incomplete lineage sorting:

Lecture: Coop chapter 6, Review: Incomplete lineage sorting

Exercise: Working with VCF files

Hidden Markov models:

Lecture: Durbin chapter 3, Paper: population structure

Exercise: Inference of population structure and admixture

Ancestral recombination graphs:

Lecture: Paper: Approximating the ARG, Paper: Tree inference

Exercise: ARG dashboard exercises + Inference of trees along sequence

Past population demography:

Lecture: Coop chapter 4, Paper: PSMC, revisit Paper: Tree inference

Exercise: Inferring historical populations

Direct and linked selection:

Lecture: Coop chapters 12, 13, revisit Paper: Tree inference

Admixture:

Lecture: Review: Admixture, Paper: Admixture inference

Exercise: Detecting archaic ancestry in modern humans

Genome-wide association study (GWAS):

Lecture: Coop lecture notes 99-120

Exercise: GWAS quality control

Heritability:

Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)

Exercise: Association testing

Evolution and disease:

Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)

Exercise: Estimating heritability
Vintage 2013 Population Estimates: National Monthly Population Estimates by...
catalog.data.gov
Updated Sep 18, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Vintage 2013 Population Estimates: National Monthly Population Estimates by Single Year of Age, Sex, 6 Races, Hispanic Origin, and Universe [Dataset]. https://catalog.data.gov/dataset/vintage-2013-population-estimates-national-monthly-population-estimates-by-single-year-of--34cfa
Explore at:
Dataset updated
Sep 18, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Monthly Population Estimates by Universe, Age, Sex, 6 Races, and Hispanic Origin for the United States: April 1, 2010 to July 1, 2013 // File: 7/1/2013 National Population Estimates // Source: U.S. Census Bureau, Population Division // Release Date: June 2014 // Note: The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see http://www.census.gov/popest/data/historical/files/MRSF-01-US1.pdf. // Persons on active duty in the Armed Forces were not enumerated in the 2010 Census. Therefore, variables for the 2010 Census civilian, civilian noninstitutionalized, and resident population plus Armed Forces overseas populations cannot be derived and are not available on these files. // For detailed information about the methods used to create the population estimates, see http://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2013) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: http://www.census.gov/popest/index.html.
Vintage 2015 Population Estimates: Demographic Characteristics Estimates by...
catalog.data.gov
Updated Jul 19, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Vintage 2015 Population Estimates: Demographic Characteristics Estimates by Age Groups [Dataset]. https://catalog.data.gov/dataset/vintage-2015-population-estimates-demographic-characteristics-estimates-by-age-groups
Explore at:
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2015 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www.census.gov/popest/data/historical/files/MRSF-01-US1.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2015) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/popest/index.html.
Provisional COVID-19 death counts, rates, and percent of total deaths, by...
catalog.data.gov
healthdata.gov
+2more
Updated Jun 27, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Disease Control and Prevention (2025). Provisional COVID-19 death counts, rates, and percent of total deaths, by jurisdiction of residence [Dataset]. https://catalog.data.gov/dataset/provisional-covid-19-death-counts-rates-and-percent-of-total-deaths-by-jurisdiction-of-res
Explore at:
Dataset updated
Jun 27, 2025
Dataset provided by
Centers for Disease Control and Preventionhttp://www.cdc.gov/
Description
This file contains COVID-19 death counts, death rates, and percent of total deaths by jurisdiction of residence. The data is grouped by different time periods including 3-month period, weekly, and total (cumulative since January 1, 2020). United States death counts and rates include the 50 states, plus the District of Columbia and New York City. New York state estimates exclude New York City. Puerto Rico is included in HHS Region 2 estimates. Deaths with confirmed or presumed COVID-19, coded to ICD–10 code U07.1. Number of deaths reported in this file are the total number of COVID-19 deaths received and coded as of the date of analysis and may not represent all deaths that occurred in that period. Counts of deaths occurring before or after the reporting period are not included in the file. Data during recent periods are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction and cause of death. Death counts should not be compared across states. Data timeliness varies by state. Some states report deaths on a daily basis, while other states report deaths weekly or monthly. The ten (10) United States Department of Health and Human Services (HHS) regions include the following jurisdictions. Region 1: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont; Region 2: New Jersey, New York, New York City, Puerto Rico; Region 3: Delaware, District of Columbia, Maryland, Pennsylvania, Virginia, West Virginia; Region 4: Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, Tennessee; Region 5: Illinois, Indiana, Michigan, Minnesota, Ohio, Wisconsin; Region 6: Arkansas, Louisiana, New Mexico, Oklahoma, Texas; Region 7: Iowa, Kansas, Missouri, Nebraska; Region 8: Colorado, Montana, North Dakota, South Dakota, Utah, Wyoming; Region 9: Arizona, California, Hawaii, Nevada; Region 10: Alaska, Idaho, Oregon, Washington. Rates were calculated using the population estimates for 2021, which are estimated as of July 1, 2021 based on the Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count. The Blended Base consists of the blend of Vintage 2020 postcensal population estimates, 2020 Demographic Analysis Estimates, and 2020 Census PL 94-171 Redistricting File (see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/2020-2021/methods-statement-v2021.pdf). Rates are based on deaths occurring in the specified week/month and are age-adjusted to the 2000 standard population using the direct method (see https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-08-508.pdf). These rates differ from annual age-adjusted rates, typically presented in NCHS publications based on a full year of data and annualized weekly/monthly age-adjusted rates which have been adjusted to allow comparison with annual rates. Annualization rates presents deaths per year per 100,000 population that would be expected in a year if the observed period specific (weekly/monthly) rate prevailed for a full year. Sub-national death counts between 1-9 are suppressed in accordance with NCHS data confidentiality standards. Rates based on death counts less than 20 are suppressed in accordance with NCHS standards of reliability as specified in NCHS Data Presentation Standards for Proportions (available from: https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf.).
Vintage 2013 Population Estimates: State Population Estimates by Single Year...
s.cnmilf.com
catalog.data.gov
Updated Jul 19, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Vintage 2013 Population Estimates: State Population Estimates by Single Year of Age, Sex, 6 Races, and Hispanic Origin [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/vintage-2013-population-estimates-state-population-estimates-by-single-year-of-age-sex-6-r
Explore at:
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Annual State Resident Population Estimates for 6 Race Groups (5 Race Alone Groups and Two or More Races) by Age, Sex, and Hispanic Origin: April 1, 2010 to July 1, 2013 // File: 7/1/2013 State Characteristics Population Estimates // Source: U.S. Census Bureau, Population Division // Release Date: June 2014 // Note: The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see http://www.census.gov/popest/data/historical/files/MRSF-01-US1.pdf. // For detailed information about the methods used to create the population estimates, see http://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2013) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: http://www.census.gov/popest/index.html.
Data from: Demographic Reports
catalog.data.gov
Updated Feb 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Federal Retirement Thrift Investment Board (2025). Demographic Reports [Dataset]. https://catalog.data.gov/dataset/demographic-reports
Explore at:
Dataset updated
Feb 14, 2025
Dataset provided by
Federal Retirement Thrift Investment Boardhttps://www.frtib.gov/
Description
Demographic reports on TSP participant behavior and investment manager diversity are reported annually to Congress and available to the public via FRTIB’s Open Data Plan. Reports are in PDF format with included data tables.
Z
Data from: Demography, education, and research trends in the...
data.niaid.nih.gov
zenodo.org
Updated Jul 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Becker, Daniel J (2024). Demography, education, and research trends in the interdisciplinary field of disease ecology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5812145
Explore at:
Dataset updated
Jul 17, 2024
Dataset provided by
Becker, Daniel J
Brandell, Ellen E
Sampson, Laura
Forbes, Kristian M
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description of Supporting Files

Demography, education, and research trends in the interdisciplinary field of disease ecology

Ellen E. Brandell, Daniel J. Becker, Laura Sampson, Kristian M. Forbes

TopArticles_Inclusion.xlsx

This Excel provides a list of influential articles written in by survey participants at least two times.

Sheet “table”: just tabular information

Sheet “withNotes”: includes notes about data, number of citations from survey participants, and percent inclusion calculations.

Columns are:

‘INCLUDED’: if the article appeared in the corpus (1) or not (0)

‘COUNT’: the number of times survey participants wrote in the article

‘ARTICLE’: article citation Percent of articles included in the corpus are calculated for 4 or more write-ins, 3-write-ins, 2 write-ins, and across all articles written in twice.

IRB_Correspondence_STUDY00010582.pdf

Institutional Review Board correspondence and approval from Pennsylvania State University. Survey response data may be available upon request from the corresponding author. To protect participants, any potentially identifying information will be removed prior to filling a request. See the online Supporting Information for this article for extensive reporting of survey results prior to a request.

FullSurvey.pdf

A PDF of the full survey form.

CorpusFrequencyAnalysis.ipynb

This is the Python script used for corpus organization and the topic detection analysis. It includes some plot generation.
Vintage 2013 Population Estimates: County Population Estimates by 5 Year Age...
catalog.data.gov
Updated Sep 5, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Vintage 2013 Population Estimates: County Population Estimates by 5 Year Age Groups, Sex, 6 Races, and Hispanic Origin [Dataset]. https://catalog.data.gov/dataset/vintage-2013-population-estimates-county-population-estimates-by-5-year-age-groups-sex-6-r
Explore at:
Dataset updated
Sep 5, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Annual County Resident Population Estimates for 6 Race Groups (5 Race Alone Groups and Two or More Races) by Five-Year Age Groups, Sex, and Hispanic Origin: April 1, 2010 to July 1, 2013 // File: 7/1/2013 County Characteristics Resident Population Estimates // Source: U.S. Census Bureau, Population Division // Release Date: June 2014 // Note: The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see http://www.census.gov/popest/data/historical/files/MRSF-01-US1.pdf. // For detailed information about the methods used to create the population estimates, see http://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2013) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: http://www.census.gov/popest/index.html.
A
‘COVID-19 Cases by Population Characteristics Over Time’ analyzed by...
analyst-2.ai
Updated Feb 15, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19 Cases by Population Characteristics Over Time’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-covid-19-cases-by-population-characteristics-over-time-097d/6c8f14dd/?iid=004-510&v=presentation
Explore at:
Dataset updated
Feb 15, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘COVID-19 Cases by Population Characteristics Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/a3291d85-0076-43c5-a59c-df49480cdc6d on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Note: On January 22, 2022, system updates to improve the timeliness and accuracy of San Francisco COVID-19 cases and deaths data were implemented. You might see some fluctuations in historic data as a result of this change. Due to the changes, starting on January 22, 2022, the number of new cases reported daily will be higher than under the old system as cases that would have taken longer to process will be reported earlier.

A. SUMMARY This dataset shows San Francisco COVID-19 cases by population characteristics and by specimen collection date. Cases are included on the date the positive test was collected.

Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how cases have been distributed among different subgroups. This information can reveal trends and disparities among groups.

Data is lagged by five days, meaning the most recent specimen collection date included is 5 days prior to today. Tests take time to process and report, so more recent data is less reliable.

B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases and deaths are from: * Case interviews * Laboratories * Medical providers

These multiple streams of data are merged, deduplicated, and undergo data verification processes. This data may not be immediately available for recently reported cases because of the time needed to process tests and validate cases. Daily case totals on previous days may increase or decrease. Learn more.

Data are continually updated to maximize completeness of information and reporting on San Francisco residents with COVID-19.

Data notes on each population characteristic type is listed below.

Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases. * The population estimates for the "Other" or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.

Sexual orientation * Sexual orientation data is collected from individuals who are 18 years old or older. These individuals can choose whether to provide this information during case interviews. Learn more about our data collection guidelines. * The City began asking for this information on April 28, 2020.

Gender * The City collects information on gender identity using these guidelines.

Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.

Transmission type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.

Homelessness Persons are identified as homeless based on several data sources: * self-reported living situation
* the location at the time of testing * Department of Public Health homelessness and health databases * Residents in Single-Room Occupancy hotels are not included in these figures.
These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.

Skilled Nursing Facility (SNF) occupancy * A Skilled Nursing

--- Original source retains full ownership of the source dataset ---
f
Demographic Profile of Participants.pdf
figshare.com
pdf
Updated Jan 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Victoria Sefah (2024). Demographic Profile of Participants.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.24953595.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24953595.v1
Dataset updated
Jan 6, 2024
Dataset provided by
figshare
Authors
Victoria Sefah
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a data collected for the research topic; EXPLORING THE PHYSICAL WELL-BEING OF BREAST CANCER PATIENTS IN KUMASI METROPOLIS: A QUALITATIVE STUDY.
w
Demographic and Health Survey 2022 - Ghana
microdata.worldbank.org
catalog.ihsn.org
+1more
Updated Jan 19, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ghana Statistical Service (GSS) (2024). Demographic and Health Survey 2022 - Ghana [Dataset]. https://microdata.worldbank.org/index.php/catalog/6122
Explore at:
Dataset updated
Jan 19, 2024
Dataset authored and provided by
Ghana Statistical Service (GSS)
Time period covered
2022 - 2023
Area covered
Ghana
Description
Abstract

The 2022 Ghana Demographic and Health Survey (2022 GDHS) is the seventh in the series of DHS surveys conducted by the Ghana Statistical Service (GSS) in collaboration with the Ministry of Health/Ghana Health Service (MoH/GHS) and other stakeholders, with funding from the United States Agency for International Development (USAID) and other partners.

The primary objective of the 2022 GDHS is to provide up-to-date estimates of basic demographic and health indicators. Specifically, the GDHS collected information on: - Fertility levels and preferences, contraceptive use, antenatal and delivery care, maternal and child health, childhood mortality, childhood immunisation, breastfeeding and young child feeding practices, women’s dietary diversity, violence against women, gender, nutritional status of adults and children, awareness regarding HIV/AIDS and other sexually transmitted infections, tobacco use, and other indicators relevant for the Sustainable Development Goals - Haemoglobin levels of women and children - Prevalence of malaria parasitaemia (rapid diagnostic testing and thick slides for malaria parasitaemia in the field and microscopy in the lab) among children age 6–59 months - Use of treated mosquito nets - Use of antimalarial drugs for treatment of fever among children under age 5

The information collected through the 2022 GDHS is intended to assist policymakers and programme managers in designing and evaluating programmes and strategies for improving the health of the country’s population.

Geographic coverage

National coverage

Analysis unit

Household

Individual

Children age 0-5

Woman age 15-49

Man age 15-59

Universe

The survey covered all de jure household members (usual residents), all women aged 15-49, men aged 15-59, and all children aged 0-4 resident in the household.

Kind of data

Sample survey data [ssd]

Sampling procedure

To achieve the objectives of the 2022 GDHS, a stratified representative sample of 18,450 households was selected in 618 clusters, which resulted in 15,014 interviewed women age 15–49 and 7,044 interviewed men age 15–59 (in one of every two households selected).

The sampling frame used for the 2022 GDHS is the updated frame prepared by the GSS based on the 2021 Population and Housing Census.1 The sampling procedure used in the 2022 GDHS was stratified two-stage cluster sampling, designed to yield representative results at the national level, for urban and rural areas, and for each of the country’s 16 regions for most DHS indicators. In the first stage, 618 target clusters were selected from the sampling frame using a probability proportional to size strategy for urban and rural areas in each region. Then the number of targeted clusters were selected with equal probability systematic random sampling of the clusters selected in the first phase for urban and rural areas. In the second stage, after selection of the clusters, a household listing and map updating operation was carried out in all of the selected clusters to develop a list of households for each cluster. This list served as a sampling frame for selection of the household sample. The GSS organized a 5-day training course on listing procedures for listers and mappers with support from ICF. The listers and mappers were organized into 25 teams consisting of one lister and one mapper per team. The teams spent 2 months completing the listing operation. In addition to listing the households, the listers collected the geographical coordinates of each household using GPS dongles provided by ICF and in accordance with the instructions in the DHS listing manual. The household listing was carried out using tablet computers, with software provided by The DHS Program. A fixed number of 30 households in each cluster were randomly selected from the list for interviews.

For further details on sample design, see APPENDIX A of the final report.

Mode of data collection

Face-to-face computer-assisted interviews [capi]

Research instrument

Four questionnaires were used in the 2022 GDHS: the Household Questionnaire, the Woman’s Questionnaire, the Man’s Questionnaire, and the Biomarker Questionnaire. The questionnaires, based on The DHS Program’s model questionnaires, were adapted to reflect the population and health issues relevant to Ghana. In addition, a self-administered Fieldworker Questionnaire collected information about the survey’s fieldworkers.

The GSS organized a questionnaire design workshop with support from ICF and obtained input from government and development partners expected to use the resulting data. The DHS Program optional modules on domestic violence, malaria, and social and behavior change communication were incorporated into the Woman’s Questionnaire. ICF provided technical assistance in adapting the modules to the questionnaires.

Cleaning operations

DHS staff installed all central office programmes, data structure checks, secondary editing, and field check tables from 17–20 October 2022. Central office training was implemented using the practice data to test the central office system and field check tables. Seven GSS staff members (four male and three female) were trained on the functionality of the central office menu, including accepting clusters from the field, data editing procedures, and producing reports to monitor fieldwork.

From 27 February to 17 March, DHS staff visited the Ghana Statistical Service office in Accra to work with the GSS central office staff on finishing the secondary editing and to clean and finalize all data received from the 618 clusters.

Response rate

A total of 18,540 households were selected for the GDHS sample, of which 18,065 were found to be occupied. Of the occupied households, 17,933 were successfully interviewed, yielding a response rate of 99%. In the interviewed households, 15,317 women age 15–49 were identified as eligible for individual interviews. Interviews were completed with 15,014 women, yielding a response rate of 98%. In the subsample of households selected for the male survey, 7,263 men age 15–59 were identified as eligible for individual interviews and 7,044 were successfully interviewed.

Sampling error estimates

The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2022 Ghana Demographic and Health Survey (2022 GDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2022 GDHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results. A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2022 GDHS sample was the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the GDHS 2022 is an SAS program. This program used the Taylor linearization method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.

A more detailed description of estimates of sampling errors are presented in APPENDIX B of the survey report.

Data appraisal

Data Quality Tables

Age distribution of eligible and interviewed women

Age distribution of eligible and interviewed men

Age displacement at age 14/15

Age displacement at age 49/50

Pregnancy outcomes by years preceding the survey

Completeness of reporting

Standardisation exercise results from anthropometry training

Height and weight data completeness and quality for children

Height measurements from random subsample of measured children

Interference in height and weight measurements of children

Interference in height and weight measurements of women and men

Heaping in anthropometric measurements for children (digit preference)

Observation of mosquito nets

Observation of handwashing facility

School attendance by single year of age

Vaccination cards photographed

Number of
d
American Community Survey (ACS) 5-Year Estimates for Coastal...
datadiscoverystudio.org
Updated 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2014). American Community Survey (ACS) 5-Year Estimates for Coastal GeographiesNOAA/NMFS/EDM [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/fe9c73c054014a5f8ca489e5ddcf28c1/html
Explore at:
Dataset updated
2014
Area covered

Description
The American Community Survey (ACS) is an ongoing statistical survey that samples a small percentage of the population every year. These data have been apportioned to 13 coastal geographies, and contain detailed demographic, social, economic, and housing characteristics. They represent 5-year estimates derived from the ACS Block Group summary files. Detailed information on the ACS data can be found at the Census Bureau's American Community Survey website and in their researcher's guide entitled, 'A Compass for Understanding and Using American Community Survey Data '. Detailed information on the geographies the data are available for can be found here: https://coast.noaa.gov/data/digitalcoast/pdf/qrt-american-community-description.pdf

Facebook

Twitter

Click to copy link

Link copied

Cite

U.S. Census Bureau (2023). Vintage 2018 Population Estimates: Demographic Characteristics Estimates by Age Groups [Dataset]. https://catalog.data.gov/dataset/vintage-2018-population-estimates-demographic-characteristics-estimates-by-age-groups

Vintage 2018 Population Estimates: Demographic Characteristics Estimates by Age Groups

Explore at:

Dataset updated

Jul 19, 2023

Dataset provided by

United States Census Bureauhttp://census.gov/

Description

Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2018 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/modified-race-summary-file-method/mrsf2010.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2017) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/programs-surveys/popest.html.

Clear search

Close search

Google apps

Main menu

Vintage 2018 Population Estimates: Demographic Characteristics Estimates by...

Sample data for analysis of demographic potential of the 15-minute city in...

Demographic Analysis Workflow using Census API in Jupyter Notebook:...

Data from: Using social media and personality traits to assess software...

Vintage 2014 Population Estimates: State Population Estimates by Single Year...

2010 Census Production Settings Demographic and Housing Characteristics...

Vintage 2016 Population Estimates: National Monthly Population Estimates

US County & Zipcode Historical Demographics

US County & Zipcode Historical Demographics

Data Sources

All Data Combined Source:

Population Source:

Unemployment Source:

Zip FIPS Crosswalk Source:

County Boundaries Source:

Age Sex Source:

Races Source:

[Dataset] Data for the course "Population Genomics" at Aarhus University

Vintage 2013 Population Estimates: National Monthly Population Estimates by...

Vintage 2015 Population Estimates: Demographic Characteristics Estimates by...

Provisional COVID-19 death counts, rates, and percent of total deaths, by...

Vintage 2013 Population Estimates: State Population Estimates by Single Year...

Data from: Demographic Reports

Data from: Demography, education, and research trends in the...

Vintage 2013 Population Estimates: County Population Estimates by 5 Year Age...

‘COVID-19 Cases by Population Characteristics Over Time’ analyzed by...

Demographic Profile of Participants.pdf

Demographic and Health Survey 2022 - Ghana

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Sampling error estimates

Data appraisal

American Community Survey (ACS) 5-Year Estimates for Coastal...

Vintage 2018 Population Estimates: Demographic Characteristics Estimates by Age GroupsSee More Versions

Vintage 2018 Population Estimates: Demographic Characteristics Estimates by Age Groups