100+ datasets found
  1. Human Resource Data Set (The Company)

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    Koluit (2025). Human Resource Data Set (The Company) [Dataset]. https://www.kaggle.com/datasets/koluit/human-resource-data-set-the-company
    Available download formats: zip (401322 bytes)
    Dataset updated
    Nov 12, 2025
    Authors
    Koluit
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Similar to others who have created HR data sets, we felt that the lack of data out there for HR was limiting. It is very hard for someone to test new systems or learn People Analytics in the HR space. The only dataset most HR practitioners have is their real employee data, and there are a lot of reasons why you would not want to use that when experimenting. We hope that by providing this dataset with an ever-growing variation of data points, others can learn and grow their HR data analytics and systems knowledge.

    Some example test cases where someone might use this dataset:

    • HR Technology Testing and Mock-Ups: engagement survey tools, HCM tools, BI tools
    • Learning To Code For People Analytics: Python/R/SQL
    • HR Tech and People Analytics Educational Courses/Tools

    Content

    The core data CompanyData.txt has the basic demographic data about a worker. We treat this as the core data that you can join future data sets to.

    Please read the Readme.md for additional information about this along with the Changelog for additional updates as they are made.

    Acknowledgements

    Initial names, addresses, and ages were generated using FakenameGenerator.com. All additional details including Job, compensation, and additional data sets were created by the Koluit team using random generation in Excel.

    Inspiration

    Our hope is that this data is used in the HR or Research space to experiment and learn using HR data. Some examples of how we hope this data will be used are listed above.

    Contact Us

    Have any suggestions for additions to the data? See any issues with our data? Want to use it for your project? Please reach out to us! https://koluit.com/ ryan@koluit.com

  2. Demographics

    • catalog.data.gov
    • datasets.ai
    • +4 more
    Updated Nov 22, 2024
    Cite
    Lake County Illinois GIS (2024). Demographics [Dataset]. https://catalog.data.gov/dataset/demographics-0be32
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    Lake County Illinois GIS
    Description

    Lake County, Illinois Demographic Data. Explanation of field attributes:

    • Total Population – The entire population of Lake County.
    • White – Individuals who are of Caucasian race. This is a percent.
    • African American – Individuals who are of African American race. This is a percent.
    • Asian – Individuals who are of Asian race. This is a percent.
    • Hispanic – Individuals who are of Hispanic ethnicity. This is a percent.
    • Does not Speak English – Individuals who speak a language other than English in their household. This is a percent.
    • Under 5 years of age – Individuals who are under 5 years of age. This is a percent.
    • Under 18 years of age – Individuals who are under 18 years of age. This is a percent.
    • 18-64 years of age – Individuals who are between 18 and 64 years of age. This is a percent.
    • 65 years of age and older – Individuals who are 65 years old or older. This is a percent.
    • Male – Individuals who are male in gender. This is a percent.
    • Female – Individuals who are female in gender. This is a percent.
    • High School Degree – Individuals who have obtained a high school degree. This is a percent.
    • Associate Degree – Individuals who have obtained an associate degree. This is a percent.
    • Bachelor’s Degree or Higher – Individuals who have obtained a bachelor’s degree or higher. This is a percent.
    • Utilizes Food Stamps – Households receiving food stamps/part of SNAP (Supplemental Nutrition Assistance Program). This is a percent.
    • Median Household Income – The income level earned by a given household where half of the homes in the area earn more and half earn less. This is a dollar amount.
    • No High School – Individuals who have not obtained a high school degree. This is a percent.
    • Poverty – Families and people whose income in the past 12 months is below the poverty level. This is a percent.

  3. ACS-ED 2013-2017 Total Population: Demographic Characteristics (DP05)

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Oct 21, 2024
    + more versions
    Cite
    National Center for Education Statistics (NCES) (2024). ACS-ED 2013-2017 Total Population: Demographic Characteristics (DP05) [Dataset]. https://catalog.data.gov/dataset/acs-ed-2013-2017-total-population-demographic-characteristics-dp05-7a484
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    National Center for Education Statistics: https://nces.ed.gov/
    Description

    The American Community Survey Education Tabulation (ACS-ED) is a custom tabulation of the ACS produced for the National Center for Education Statistics (NCES) by the U.S. Census Bureau. The ACS-ED provides a rich collection of social, economic, demographic, and housing characteristics for school systems, school-age children, and the parents of school-age children. In addition to focusing on school-age children, the ACS-ED provides enrollment iterations for children enrolled in public school. The data profiles include percentages (along with associated margins of error) that allow for comparison of school district-level conditions across the U.S. For more information about the NCES ACS-ED collection, visit the NCES Education Demographic and Geographic Estimates (EDGE) program at: https://nces.ed.gov/programs/edge/Demographic/ACS

    Annotation values are negative value representations of estimates and have values when non-integer information needs to be represented. See the table below for a list of common Estimate/Margin of Error (E/M) values and their corresponding Annotation (EA/MA) values.

    All information contained in this file is in the public domain. Data users are advised to review NCES program documentation and feature class metadata to understand the limitations and appropriate use of these data.

    • -9: A '-9' entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    • -8: A '-8' means that the estimate is not applicable or not available.
    • -6: A '-6' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    • -5: A '-5' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    • -3: A '-3' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    • -2: A '-2' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
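    Because the annotation codes sit in the same columns as real estimates, they need to be masked before any arithmetic. The following is a minimal Python sketch of that idea, not part of the NCES documentation. The function names are hypothetical, and it assumes that no genuine estimate or margin of error takes one of the listed negative values.

```python
import math

# Annotation codes listed in the description above; each marks a cell
# that does not hold a real estimate or margin of error.
ANNOTATION_CODES = {-9, -8, -6, -5, -3, -2}


def clean_value(raw):
    """Return the cell as a float, or NaN if it is an annotation code."""
    value = float(raw)
    return math.nan if value in ANNOTATION_CODES else value


def mean_ignoring_annotations(values):
    """Average only the real estimates, skipping annotation codes."""
    kept = [v for v in map(clean_value, values) if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan
```

    Replacing the codes with NaN rather than dropping rows keeps the geographic units aligned across columns, which matters when estimates and margins of error are compared side by side.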

  4. School information and student demographics

    • data.ontario.ca
    • datasets.ai
    • +1 more
    xlsx
    Updated Oct 23, 2025
    Cite
    Education (2025). School information and student demographics [Dataset]. https://data.ontario.ca/dataset/school-information-and-student-demographics
    Available download formats: xlsx (1510697), xlsx (1529849), xlsx (1565910), xlsx (1550796), xlsx (1566878), xlsx (1565304), xlsx (1562805), xlsx (1459001), xlsx (1462006), xlsx (1460629), xlsx (1547704), xlsx (1567330), xlsx (1580734), xlsx (1462064)
    Dataset updated
    Oct 23, 2025
    Dataset authored and provided by
    Education
    License

    https://www.ontario.ca/page/open-government-licence-ontario

    Time period covered
    Oct 23, 2025
    Area covered
    Ontario
    Description

    Data includes: board and school information, grade 3 and 6 EQAO student achievements for reading, writing and mathematics, and grade 9 mathematics EQAO and OSSLT. Data excludes private schools, Education and Community Partnership Programs (ECPP), summer, night and continuing education schools.

    How Are We Protecting Privacy?

    Results for OnSIS and Statistics Canada variables are suppressed based on school population size to better protect student privacy. In order to achieve this additional level of protection, the Ministry has used a methodology that randomly rounds a percentage either up or down depending on school enrolment. In order to protect privacy, the ministry does not publicly report on data when there are fewer than 10 individuals represented.

    • Percentages depicted as 0 may not always be 0 values, as in certain situations the values have been randomly rounded down or there are no reported results at a school for the respective indicator.
    • Percentages depicted as 100 are not always 100; in certain situations the values have been randomly rounded up.

    The school enrolment totals have been rounded to the nearest 5 in order to better protect and maintain student privacy.
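
    Two of the rules above are concrete enough to sketch in Python: rounding enrolment totals to the nearest 5 and withholding results for fewer than 10 individuals. This is an illustration only. The function names are hypothetical, and the Ministry's random rounding of percentages is not reproduced because its exact method is not described here.

```python
def round_to_nearest_5(enrolment):
    """Round an enrolment total to the nearest multiple of 5."""
    # Dividing an integer by 5 never lands exactly halfway between two
    # integers, so round() has no ties to break here.
    return 5 * round(enrolment / 5)


def publishable(count, minimum=10):
    """Return the count, or None when fewer than `minimum` individuals
    are represented and the value should not be reported."""
    return count if count >= minimum else None
```

    For example, an enrolment of 12 would be published as 10 and an enrolment of 13 as 15, so published totals can differ from true totals by up to 2.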

    The information in the School Information Finder is the most current available to the Ministry of Education at this time, as reported by schools, school boards, EQAO and Statistics Canada. The information is updated as frequently as possible.

    This information is also available on the Ministry of Education's School Information Finder website by individual school.

    Descriptions for some of the data types can be found in our glossary.

    School/school board and school authority contact information are updated and maintained by school boards and may not be the most current version. For the most recent information please visit: https://data.ontario.ca/dataset/ontario-public-school-contact-information.

  5. ARCHIVED: COVID-19 Testing by Race/Ethnicity Over Time

    • data.sfgov.org
    • healthdata.gov
    • +1 more
    csv, xlsx, xml
    Updated Jan 12, 2024
    + more versions
    Cite
    Department of Public Health - Population Health Division (2024). ARCHIVED: COVID-19 Testing by Race/Ethnicity Over Time [Dataset]. https://data.sfgov.org/Health-and-Social-Services/ARCHIVED-COVID-19-Testing-by-Race-Ethnicity-Over-T/kja3-qsky
    Available download formats: xlsx, xml, csv
    Dataset updated
    Jan 12, 2024
    Dataset authored and provided by
    Department of Public Health - Population Health Division
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    A. SUMMARY This dataset includes San Francisco COVID-19 tests by race/ethnicity and by date. This dataset represents the daily count of tests collected, and the breakdown of test results (positive, negative, or indeterminate). Tests in this dataset include all those collected from persons who listed San Francisco as their home address at the time of testing. It also includes tests that were collected by San Francisco providers for persons who were missing a locating address. This dataset does not include tests for residents listing a locating address outside of San Francisco, even if they were tested in San Francisco.

    The data were de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected). If a person tested multiple times on the same date, only one test is included from that date. When there are multiple tests on the same date, a positive result, if one exists, will always be selected as the record for the person. If a PCR and antigen test are taken on the same day, the PCR test will supersede. If a person tests multiple times on the same day and the results are all the same (e.g. all negative or all positive) then the first test done is selected as the record for the person.
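
    The same-day selection rules can be sketched in Python as a sort followed by "keep the first record per person and date". This is not the health department's pipeline. The record fields are hypothetical, and the precedence order (positive result first, then PCR over antigen, then earliest collection time) is an assumption, since the description does not say which rule wins when a positive antigen test and a negative PCR test fall on the same day.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LabTest:
    # Hypothetical record layout, for illustration only.
    person_id: str
    collected_at: datetime
    result: str      # "positive", "negative", or "indeterminate"
    test_type: str   # "PCR" or "antigen"


def deduplicate(tests):
    """Keep one test per (person, calendar date)."""
    def priority(t):
        return (
            t.result != "positive",  # positives sort first (assumed to win)
            t.test_type != "PCR",    # then PCR ahead of antigen
            t.collected_at,          # then the earliest test of the day
        )

    chosen = {}
    for t in sorted(tests, key=priority):
        # setdefault keeps the first (highest-priority) test seen per key.
        chosen.setdefault((t.person_id, t.collected_at.date()), t)
    return sorted(chosen.values(), key=lambda t: (t.collected_at, t.person_id))
```

    Tests taken on different dates map to different keys, so they all survive, matching the description.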

    The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco.

    When a person gets tested for COVID-19, they may be asked to report information about themselves. One piece of information that might be requested is a person's race and ethnicity. These data are often incomplete in the laboratory and provider reports of the test results sent to the health department. The data can be missing or incomplete for several possible reasons:

    • The person was not asked about their race and ethnicity.
    • The person was asked, but refused to answer.
    • The person answered, but the testing provider did not include the person's answers in the reports.
    • The testing provider reported the person's answers in a format that could not be used by the health department.
    

    For any of these reasons, a person's race/ethnicity will be recorded in the dataset as “Unknown.”

    B. NOTE ON RACE/ETHNICITY The different values for Race/Ethnicity in this dataset are "Asian;" "Black or African American;" "Hispanic or Latino/a, all races;" "American Indian or Alaska Native;" "Native Hawaiian or Other Pacific Islander;" "White;" "Multi-racial;" "Other;" and "Unknown."

    The Race/Ethnicity categorization increases data clarity by emulating the methodology used by the U.S. Census in the American Community Survey. Specifically, persons who identify as "Asian," "Black or African American," "American Indian or Alaska Native," "Native Hawaiian or Other Pacific Islander," "White," "Multi-racial," or "Other" do NOT include any person who identified as Hispanic/Latino at any time in their testing reports that either (1) identified them as SF residents or (2) as someone who tested without a locating address by an SF provider. All persons across all races who identify as Hispanic/Latino are recorded as "Hispanic or Latino/a, all races." This categorization increases data accuracy by correcting the way “Other” persons were counted. Previously, when a person reported “Other” for Race/Ethnicity, they would be recorded “Unknown.” Under the new categorization, they are counted as “Other” and are distinct from “Unknown.”

    If a person records their race/ethnicity as “Asian,” “Black or African American,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” “White,” or “Other” for their first COVID-19 test, then this data will not change, even if a different race/ethnicity is reported for this person for any future COVID-19 test. There are two exceptions to this rule. The first exception is if a person’s race/ethnicity value is reported as “Unknown” on their first test and then on a subsequent test they report “Asian,” “Black or African American,” “Hispanic or Latino/a, all races,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” or “White”; this subsequent reported race/ethnicity will then overwrite the previous recording of “Unknown.” If a person has only ever selected “Unknown” as their race/ethnicity, then it will be recorded as “Unknown.” This change provides more specific and actionable data on who is tested in San Francisco.

    The second exception is if a person ever marks “Hispanic or Latino/a, all races” for race/ethnicity then this choice will always overwrite any previous or future response. This is because it is an overarching category that can include any and all other races and is mutually exclusive with the other responses.

    A person's race/ethnicity will be recorded as “Multi-racial” if they select two or more values among the following choices: “Asian,” “Black or African American,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” “White,” or “Other.” If a person selects a combination of two or more race/ethnicity answers that includes “Hispanic or Latino/a, all races” then they will still be recorded as “Hispanic or Latino/a, all races”—not as “Multi-racial.”
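
    The per-report part of these rules reduces to a small decision function, sketched below in Python. It is an illustration, not the department's code: the function name is hypothetical, and it covers only the selections made on a single report. The cross-test rules (first-test persistence and overwriting "Unknown") are not modelled.

```python
HISPANIC = "Hispanic or Latino/a, all races"

# The choices that can combine into "Multi-racial", per the text above.
RACE_CHOICES = {
    "Asian",
    "Black or African American",
    "American Indian or Alaska Native",
    "Native Hawaiian or Other Pacific Islander",
    "White",
    "Other",
}


def categorize(selections):
    """Map the values selected on one report to a single category."""
    chosen = set(selections)
    if HISPANIC in chosen:
        return HISPANIC              # always takes precedence
    races = chosen & RACE_CHOICES
    if len(races) >= 2:
        return "Multi-racial"
    if len(races) == 1:
        return next(iter(races))
    return "Unknown"                 # nothing usable was reported
```

    Checking the Hispanic/Latino value before counting races is what keeps a combined answer such as "White" plus "Hispanic or Latino/a, all races" out of the "Multi-racial" category.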

    C. HOW THE DATASET IS CREATED COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information.

    D. UPDATE PROCESS Updates automatically at 5:00AM Pacific Time each day. Redundant runs are scheduled at 7:00AM and 9:00AM in case of pipeline failure.

    E. HOW TO USE THIS DATASET San Francisco population estimates for race/ethnicity can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

    Due to the high degree of variation in the time needed to complete tests by different labs there is a delay in this reporting. On March 24, 2020 the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.

    In order to track trends over time, a user can analyze this data by sorting or filtering by the "specimen_collection_date" field.

    Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of percent positive. When there are fewer than 20 positive tests for a given race/ethnicity and time period, the positivity rate is not calculated for the public tracker because rates of small test counts are less reliable.
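
    As a worked illustration of the positivity calculation, here is a minimal Python sketch. The function and parameter names are hypothetical; the threshold of 20 positives comes from the text above.

```python
def percent_positivity(positive, negative, min_positives=20):
    """Percent of conclusive tests that were positive.

    Indeterminate tests are left out of the denominator. Returns None
    when there are too few positives for a reliable rate.
    """
    if positive < min_positives:
        return None
    return 100.0 * positive / (positive + negative)
```

    With 25 positive and 75 negative tests the rate is 25 percent, however many indeterminate results were also collected.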

    Calculating Testing Rates: To calculate the testing rate per 10,000 residents, divide the total number of tests collected (positive, negative, and indeterminate results) for the specified race/ethnicity by the total number of residents who identify as that race/ethnicity (according to the 2016-2020 American Community Survey (ACS) population estimate), then multiply by 10,000. When there are fewer than 20 total tests for a given race/ethnicity and time period, the testing rate is not calculated for the public tracker because rates of small test counts are less reliable.

    Read more about how this data is updated and validated daily: https://sf.gov/information/covid-19-data-questions

    F. CHANGE LOG

    • 1/12/2024 - This dataset will stop updating as of 1/12/2024
    • 6/21/2023 - A small number of additional COVID-19 testing records were released as part of our ongoing data cleaning efforts. An update to the race or ethnicity designation among a subset of testing records was simultaneously released.
    • 1/31/2023 - updated “population_estimate” column to reflect the 2020 Census Bureau American Community Survey (ACS) San Francisco Population estimates.
    • 1/31/2023 - renamed column “last_updated_at” to “data_as_of”.
    • 3/23/2022 - ‘Native American’ changed to ‘American Indian or Alaska Native’ to align with the census.
    • 2/10/2022 - race/ethnicity categorization was changed. See section NOTE ON RACE/ETHNICITY for additional information.
    • 4/16/2021 - dataset updated to refresh with a five-day data lag.

  6. California School

    • kaggle.com
    zip
    Updated Jun 19, 2023
    Cite
    Haines City (2023). California School [Dataset]. https://www.kaggle.com/datasets/hainescity/california-school
    Available download formats: zip (28753 bytes)
    Dataset updated
    Jun 19, 2023
    Authors
    Haines City
    Area covered
    California
    Description

    The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California.

    district: character. District code.

    school: character. School name.

    county: factor indicating county.

    grades: factor indicating grade span of district.

    students: Total enrollment.

    teachers: Number of teachers.

    calworks: Percent qualifying for CalWorks (income assistance).

    lunch: Percent qualifying for reduced-price lunch.

    computer: Number of computers.

    expenditure: Expenditure per student.

    income: District average income (in USD 1,000).

    english: Percent of English learners.

    read: Average reading score.

    Details

    The data used here are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. Test scores are on the Stanford 9 standardized test administered to 5th grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as “full-time equivalents”), number of computers per classroom, and expenditures per student. Demographic variables for the students are averaged across the district. The demographic variables include the percentage of students in the public assistance program CalWorks, the percentage of students that qualify for a reduced price lunch, and the percentage of students that are English learners (that is, students for whom English is a second language).

    Challenges

    Reading performance at CA’s schools

    Research goal: In this assignment, we aim at analysing the effect of different factors on the reading performance at Californian schools. Specifically, we will focus on education investment and students’ socio-economic environment. Your analysis should include the following steps:

    1. Data-set unboxing: Perform the usual preliminary check of the data-set.
    2. Closer look and setting of the key variable (i.e. reading performance): Analyse the frequency distribution of the variable.
    3. Income: Family income is usually reported to have an influence on students’ performance in general. Can you check visually whether this is the case here? Describe and comment on any possible pattern you identify.
    4. Expenditure: Intuitively, we expect investment in education to affect students’ performance. So:
       a. Repeat the previous analysis with the variable ‘expenditure’. Describe any pattern you might identify.
       b. Let’s dig deeper into this issue. Higher investment in education can be spent on hiring teachers. So:
          i. Add a column to the dataset accounting for the ratio num. students / num. teachers.
          ii. Incorporate this ratio in the figure studying the effect of expenditure.
    5. English learning: Being or not being a native English speaker might make a difference in our case study. Repeat the previous analysis with the variable ‘english’. Describe any pattern you might identify.
    6. Correlations:
       a. Calculate, separately, the correlation between reading performance and family income on one side and English learning on the other. Choose the test carefully!
       b. According to the correlation results, which pair of variables is more closely related?
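
    Steps 4.b.i and 6 can be sketched in plain Python as below. This is one possible starting point, not a model answer. It assumes the rows have already been loaded as dictionaries keyed by the column names listed above, and the helper names and the new `ratio` key are hypothetical. Pearson's coefficient is shown only as one candidate; the assignment's "choose the test carefully" may point to a rank-based measure if the relationship is not linear.

```python
import math


def add_student_teacher_ratio(rows):
    """Return copies of the rows with a 'ratio' field: students per teacher."""
    return [{**row, "ratio": row["students"] / row["teachers"]} for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

    A call such as `pearson([r["income"] for r in rows], [r["read"] for r in rows])` would then give the income-reading correlation for comparison with the `english` column.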

  7. ARCHIVED: COVID-19 Testing by Geography Over Time

    • catalog.data.gov
    • data.sfgov.org
    • +2 more
    Updated Mar 29, 2025
    Cite
    data.sfgov.org (2025). ARCHIVED: COVID-19 Testing by Geography Over Time [Dataset]. https://catalog.data.gov/dataset/covid-19-testing-by-geography-and-date
    Dataset updated
    Mar 29, 2025
    Dataset provided by
    data.sfgov.org
    Description

    A. SUMMARY This dataset includes COVID-19 tests by resident neighborhood and specimen collection date (the day the test was collected). Specifically, this dataset includes tests of San Francisco residents who listed a San Francisco home address at the time of testing. These resident addresses were then geo-located and mapped to neighborhoods. The resident address associated with each test is hand-entered and susceptible to errors, therefore neighborhood data should be interpreted as an approximation, not a precise nor comprehensive total.

    In recent months, about 5% of tests are missing addresses and therefore cannot be included in any neighborhood totals. In earlier months, more tests were missing address data. Because of this high percentage of tests missing resident address data, this neighborhood testing data for March, April, and May should be interpreted with caution (see below).

    Percentage of tests missing address information, by month in 2020:

    • Mar - 33.6%
    • Apr - 25.9%
    • May - 11.1%
    • Jun - 7.2%
    • Jul - 5.8%
    • Aug - 5.4%
    • Sep - 5.1%
    • Oct (Oct 1-12) - 5.1%

    To protect the privacy of residents, the City does not disclose the number of tests in neighborhoods with resident populations of fewer than 1,000 people. These neighborhoods are omitted from the data (they include Golden Gate Park, John McLaren Park, and Lands End). Tests for residents that listed a Skilled Nursing Facility as their home address are not included in this neighborhood-level testing data. Skilled Nursing Facilities have required and repeated testing of residents, which would change neighborhood trends and not reflect the broader neighborhood's testing data.

    This data was de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected).

    The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco. During this investigation, some test results are found to be for persons living outside of San Francisco and some people in San Francisco may be tested multiple times (which is common). To see the number of new confirmed cases by neighborhood, reference this map: https://sf.gov/data/covid-19-case-maps#new-cases-maps

    B. HOW THE DATASET IS CREATED COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information. All testing data is then geo-coded by resident address. Then data is aggregated by analysis neighborhood and specimen collection date. Data are prepared by close of business Monday through Saturday for public display.

    C. UPDATE PROCESS Updates automatically at 05:00 Pacific Time each day. Redundant runs are scheduled at 07:00 and 09:00 in case of pipeline failure.

    D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

    Due to the high degree of variation in the time needed to complete tests by different labs there is a delay in this reporting. On March 24 the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.

    In order to track trends over time, a data user can analyze this data by "specimen_collection_date".

    Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of pe

  8. Data and code for: Generation and applications of simulated datasets to...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Mar 10, 2023
    Cite
    Matthew Silk; Olivier Gimenez (2023). Data and code for: Generation and applications of simulated datasets to integrate social network and demographic analyses [Dataset]. http://doi.org/10.5061/dryad.m0cfxpp7s
    Available download formats: zip
    Dataset updated
    Mar 10, 2023
    Dataset provided by
    Centre d'Écologie Fonctionnelle et Évolutive
    Authors
    Matthew Silk; Olivier Gimenez
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.

    Methods

    The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R. Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.

  9. The NIMH Healthy Research Volunteer Dataset

    • openneuro.org
    Updated Dec 20, 2024
    Cite
    Allison C. Nugent; Adam G Thomas; Margaret Mahoney; Alison Gibbons; Jarrod Smith; Antoinette Charles; Jacob S Shaw; Jeffrey D Stout; Anna M Namyst; Arshitha Basavaraj; Eric Earl; Dustin Moraczewski; Emily Guinee; Michael Liu; Travis Riddle; Joseph Snow; Shruti Japee; Morgan Andrews; Adriana Pavletic; Stephen Sinclair; Vinai Roopchansingh; Peter A Bandettini; Joyce Chung (2024). The NIMH Healthy Research Volunteer Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds004215.v2.0.1
    Explore at:
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Allison C. Nugent; Adam G Thomas; Margaret Mahoney; Alison Gibbons; Jarrod Smith; Antoinette Charles; Jacob S Shaw; Jeffrey D Stout; Anna M Namyst; Arshitha Basavaraj; Eric Earl; Dustin Moraczewski; Emily Guinee; Michael Liu; Travis Riddle; Joseph Snow; Shruti Japee; Morgan Andrews; Adriana Pavletic; Stephen Sinclair; Vinai Roopchansingh; Peter A Bandettini; Joyce Chung
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The National Institute of Mental Health (NIMH) Research Volunteer (RV) Data Set

    A comprehensive dataset characterizing healthy research volunteers in terms of clinical assessments, mood-related psychometrics, cognitive function neuropsychological tests, structural and functional magnetic resonance imaging (MRI), along with diffusion tensor imaging (DTI), and a comprehensive magnetoencephalography battery (MEG).

    In addition, blood samples are currently banked for future genetic analysis. All data collected in this protocol are broadly shared in the OpenNeuro repository, in the Brain Imaging Data Structure (BIDS) format. In addition, task paradigms and basic pre-processing scripts are shared on GitHub. This dataset is unprecedented in its depth of characterization of a healthy population and will allow a wide array of investigations into normal cognition and mood regulation.

    This dataset is licensed under the Creative Commons Zero (CC0) v1.0 License.

    Release Notes

    Release v2.0.0

    This release includes data collected between 2020-06-03 (cut-off date for v1.0.0) and 2024-04-01. Notable changes in this release:

    1. 769 new participants have been added along with re-evaluation data for 15 participants. Total unique participants count is now 1859.
    2. visit and age_at_visit columns added to phenotype files to distinguish between visits and intervals between them.
    3. Follow-up online survey data included.
    4. Replaced Beck Anxiety Inventory (BAI) and Beck Depression Inventory-II (BDI-II) with General Anxiety Disorder-7 (GAD7) and Patient Health Questionnaire 9 (PHQ9) surveys, respectively.
    5. Discontinued the Perceived Health rating survey.
    6. Added Brief Trauma Questionnaire (BTQ) and Big Five personality survey to online screening questionnaires.
    7. MRI:
      • Replaced ADNI-3 resting state sequence with a multi-echo sequence with higher spatial resolution.
      • Replaced field map scans with a shorter reversed-blipped EPI scan.
    8. MEG:
      • Some participants have 6-minute empty room data instead of the shorter duration empty room acquisition.

    See the CHANGES file for complete version-wise changelog.

    Participant Eligibility

    To be eligible for the study, participants need to be medically healthy adults over 18 years of age with the ability to read, speak and understand English. All participants provided electronic informed consent for online pre-screening, and written informed consent for all other procedures. Participants with a history of mental illness or suicidal or self-injury thoughts or behavior are excluded. Additional exclusion criteria include current illicit drug use, abnormal medical exam, and less than an 8th grade education or IQ below 70. Current NIMH employees, or first degree relatives of NIMH employees are prohibited from participating. Study participants are recruited through direct mailings, bulletin boards and listservs, outreach exhibits, print advertisements, and electronic media.

    Clinical Measures

    All potential volunteers visit the study website, check a box indicating consent, and fill out preliminary screening questionnaires. The questionnaires include basic demographics, the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0), the DSM-5 Self-Rated Level 1 Cross-Cutting Symptom Measure, the DSM-5 Level 2 Cross-Cutting Symptom Measure - Substance Use, the Alcohol Use Disorders Identification Test (AUDIT), the Edinburgh Handedness Inventory, and a brief clinical history checklist. The WHODAS 2.0 is a 15 item questionnaire that assesses overall general health and disability, with 14 items distributed over 6 domains: cognition, mobility, self-care, “getting along”, life activities, and participation. The DSM-5 Level 1 cross-cutting measure uses 23 items to assess symptoms across diagnoses, although an item regarding self-injurious behavior was removed from the online self-report version. The DSM-5 Level 2 cross-cutting measure is adapted from the NIDA ASSIST measure, and contains 15 items to assess use of both illicit drugs and prescription drugs without a doctor’s prescription. The AUDIT is a 10 item screening assessment used to detect harmful levels of alcohol consumption, and the Edinburgh Handedness Inventory is a systematic assessment of handedness. These online results do not contain any personally identifiable information (PII). At the conclusion of the questionnaires, participants are prompted to send an email to the study team. These results are reviewed by the study team, who determines if the participant is appropriate for an in-person interview.

    Participants who meet all inclusion criteria are scheduled for an in-person screening visit to determine if there are any further exclusions to participation. At this visit, participants receive a History and Physical exam, Structured Clinical Interview for DSM-5 Disorders (SCID-5), the Beck Depression Inventory-II (BDI-II), Beck Anxiety Inventory (BAI), and the Kaufman Brief Intelligence Test, Second Edition (KBIT-2). The purpose of these cognitive and psychometric tests is two-fold. First, these measures are designed to provide a sensitive test of psychopathology. Second, they provide a comprehensive picture of cognitive functioning, including mood regulation. The SCID-5 is a structured interview, administered by a clinician, that establishes the absence of any DSM-5 axis I disorder. The KBIT-2 is a brief (20 minute) assessment of intellectual functioning administered by a trained examiner. There are three subtests, including verbal knowledge, riddles, and matrices.

    Biological and physiological measures

    Biological and physiological measures are acquired, including blood pressure, pulse, weight, height, and BMI. Blood and urine samples are taken and a complete blood count, acute care panel, hepatic panel, thyroid stimulating hormone, viral markers (HCV, HBV, HIV), c-reactive protein, creatine kinase, urine drug screen and urine pregnancy tests are performed. In addition, three additional tubes of blood samples are collected and banked for future analysis, including genetic testing.

    Imaging Studies

    Participants were given the option to enroll in optional magnetic resonance imaging (MRI) and magnetoencephalography (MEG) studies.

    MRI

    On the same visit as the MRI scan, participants are administered a subset of tasks from the NIH Toolbox Cognition Battery. The four tasks assess attention and executive functioning (Flanker Inhibitory Control and Attention Task), executive functioning (Dimensional Change Card Sort Task), episodic memory (Picture Sequence Memory Task), and working memory (List Sorting Working Memory Task). The MRI protocol used was initially based on the ADNI-3 basic protocol, but was later modified to include portions of the ABCD protocol in the following manner:

    1. The T1 scan from ADNI3 was replaced by the T1 scan from the ABCD protocol.
    2. The Axial T2 2D FLAIR acquisition from ADNI2 was added, and fat saturation turned on.
    3. Fat saturation was turned on for the pCASL acquisition.
    4. The high-resolution in-plane hippocampal 2D T2 scan was removed, and replaced with the whole brain 3D T2 scan from the ABCD protocol (which is resolution and bandwidth matched to the T1 scan).
    5. The slice-select gradient reversal method was turned on for DTI acquisition, and reconstruction interpolation turned off.
    6. Scans for distortion correction were added (reversed-blip scans for DTI and resting state scans).
    7. The 3D FLAIR sequence was made optional, and replaced by one where the prescription and other acquisition parameters provide resolution and geometric correspondence between the T1 and T2 scans.

    MEG

    The optional MEG studies were added to the protocol approximately one year after the study was initiated, thus there are relatively fewer MEG recordings in comparison to the MRI dataset. MEG studies are performed on a 275 channel CTF MEG system. The position of the head was localized at the beginning and end of the recording using three fiducial coils. These coils were placed 1.5 cm above the nasion, and at each ear, 1.5 cm from the tragus on a line between the tragus and the outer canthus of the eye. For some participants, photographs were taken of the three coils and used to mark the points on the T1 weighted structural MRI scan for co-registration. For the remainder of the participants, a BrainSight neuro-navigation unit was used to coregister the MRI, anatomical fiducials, and localizer coils directly prior to MEG data acquisition.

    Specific Survey and Test Data within Data Set

    NOTE: In release 2.0 of the dataset, two measures, the Brief Trauma Questionnaire (BTQ) and the Big Five personality survey, were added to the online screening questionnaires. Also, for the in-person screening visit, the Beck Anxiety Inventory (BAI) and Beck Depression Inventory-II (BDI-II) were replaced with the General Anxiety Disorder-7 (GAD7) and Patient Health Questionnaire 9 (PHQ9) surveys, respectively. The Perceived Health rating survey was discontinued.

    1. Preliminary Online Screening Questionnaires

    Survey or Test | BIDS TSV Name
    Alcohol Use Disorders Identification Test (AUDIT) | audit.tsv
    Brief Trauma Questionnaire (BTQ) | btq.tsv
    Big-Five Personality | big_five_personality.tsv
    Demographics | demographics.tsv
    Drug Use Questionnaire
  10. Covid-19-Case-Surveillance-Public-Use-Dataset

    • openml.org
    • opendatalab.com
    • +7more
    Updated Mar 23, 2022
    Cite
    (2022). Covid-19-Case-Surveillance-Public-Use-Dataset [Dataset]. https://www.openml.org/d/43365
    Explore at:
    Dataset updated
    Mar 23, 2022
    License
    Description

    Context and Content

    The COVID-19 case surveillance system database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as immediately notifiable, urgent (within 24 hours) by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and shared voluntarily with CDC. For more information: https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf

    The deidentified data in the public use dataset include demographic characteristics, exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and comorbidities. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

    Acknowledgement https://www.cdc.gov/

    Inspiration

    COVID-19 research, e.g., demographic trends of COVID-19 cases and deaths

  11. Selected Demographic and Housing Estimates (DP05)

    • data-seattlecitygis.opendata.arcgis.com
    • data.seattle.gov
    • +1more
    Updated Aug 11, 2023
    + more versions
    Cite
    City of Seattle ArcGIS Online (2023). Selected Demographic and Housing Estimates (DP05) [Dataset]. https://data-seattlecitygis.opendata.arcgis.com/datasets/SeattleCityGIS::selected-demographic-and-housing-estimates-dp05
    Explore at:
    Dataset updated
    Aug 11, 2023
    Dataset authored and provided by
    City of Seattle ArcGIS Online
    Description

    Data from: American Community Survey, 5-year Series

    King County, Washington census tracts with nonoverlapping vintages of the 5-year American Community Survey (ACS) estimates starting in 2010 from the U.S. Census Bureau's demographic and housing estimates (DP05). Also includes the most recent release annually with the vintage identified in the "ACS Vintage" field.

    The census tract boundaries match the vintage of the ACS data (currently 2010 and 2020) so please note the geographic changes between the decades. Tracts have been coded as being within the City of Seattle as well as assigned to neighborhood groups called "Community Reporting Areas". These areas were created after the 2000 census to provide geographically consistent neighborhoods through time for reporting U.S. Census Bureau data. This is not an attempt to identify neighborhood boundaries as defined by neighborhoods themselves.

    Vintages: 2010, 2015, 2020, 2021, 2022, 2023
    ACS Table(s): DP05
    Data downloaded from: Census Bureau's Explore Census Data

    The United States Census Bureau's American Community Survey (ACS):

    • About the Survey
    • Geography & ACS
    • Technical Documentation
    • News & Updates

    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:

    • Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters). The States layer contains 52 records: all US states, Washington D.C., and Puerto Rico.
    • Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
    • Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
    • Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
    • Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
      • The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
      • Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
      • The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
      • The estimate is controlled. A statistical test for sampling variability is not appropriate.
      • The data for this geographic area cannot be displayed because the number of sample cases is too small.
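The Census data note for this layer defines the published margin of error at the 90 percent level. As a minimal sketch, the confidence bounds and an approximate standard error could be derived as shown below. The function names and the sample numbers are hypothetical and are not fields from this layer; the 1.645 divisor is the z-score the Census Bureau documents for 90 percent ACS margins of error.

```python
Z_90 = 1.645  # z-score for a 90 percent confidence level


def confidence_bounds(estimate, moe):
    """Return the (lower, upper) 90 percent confidence bounds."""
    return estimate - moe, estimate + moe


def standard_error(moe):
    """Approximate the standard error from a 90 percent margin of error."""
    return moe / Z_90


# Hypothetical tract: an estimate of 1,200 people with a margin of error of 150.
lower, upper = confidence_bounds(1200, 150)
se = standard_error(150)
```

Null estimates or margins of error (the recoded negative values described in the processing notes) would need to be filtered out before applying either function.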

  12. COVID-19 Case Surveillance Restricted Access Detailed Data

    • data.cdc.gov
    • data.virginia.gov
    • +2more
    csv, xlsx, xml
    Updated Nov 20, 2020
    + more versions
    Cite
    CDC Data, Analytics and Visualization Task Force (2020). COVID-19 Case Surveillance Restricted Access Detailed Data [Dataset]. https://data.cdc.gov/w/mbd7-r32t/tdwk-ruhb?cur=IxdiCK2HbvP&from=fR_yXV11V1R
    Explore at:
    xml, xlsx, csv (available download formats)
    Dataset updated
    Nov 20, 2020
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Authors
    CDC Data, Analytics and Visualization Task Force
    License

    https://www.usa.gov/government-works

    Description

    Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

    Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

    This case surveillance publicly available dataset has 33 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors. This dataset requires a registration process and a data use agreement.

    CDC has three COVID-19 case surveillance datasets:

    Requesting Access to the COVID-19 Case Surveillance Restricted Access Detailed Data

    Please review the following documents to determine your interest in accessing the COVID-19 Case Surveillance Restricted Access Detailed Data file:

    1. CDC COVID-19 Case Surveillance Restricted Access Detailed Data: Summary, Guidance, Limitations Information, and Restricted Access Data Use Agreement Information
    2. Data Dictionary for the COVID-19 Case Surveillance Restricted Access Detailed Data

    The next step is to complete the Registration Information and Data Use Restrictions Agreement (RIDURA). Once complete, CDC will review your agreement. After access is granted, Ask SRRG (eocevent394@cdc.gov) will email you information about how to access the data through GitHub. If you have questions about obtaining access, email eocevent394@cdc.gov.

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

    For more information, visit: https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.

    The deidentified data in the restricted access dataset include demographic characteristics, state and county of residence, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and comorbidities.

    All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

    COVID-19 case reports have been routinely submitted using standardized case reporting forms.

    On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

    CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification. All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for lab-confirmed or probable cases.

    On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

    Data are Considered Provisional

    • The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
    • Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.

    Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

    Data Limitations

    To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

    Data Quality Assurance Procedures

    CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:

    • Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question "Was the individual hospitalized?" where the possible answer choices include "Yes," "No," or "Unknown," the blank value is recoded to "Missing" because the case report form did not include a response to the question.
    • Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
    • Additional data quality processing to recode free text data is ongoing. Data on symptoms, race, ethnicity, and healthcare worker status have been prioritized.
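The first two cleaning steps listed here can be illustrated with a short sketch. It is not CDC's actual pipeline: the field names `hosp_yn` and `onset_dt`, the function name, and the reference date are assumptions chosen for illustration.

```python
from datetime import date


def clean_record(record, today):
    """Apply the two documented cleaning rules to one case record (a dict)."""
    cleaned = dict(record)  # work on a copy so the raw record is preserved

    # Rule 1: a blank answer on the case report form is recoded to "Missing".
    if cleaned.get("hosp_yn") in (None, ""):
        cleaned["hosp_yn"] = "Missing"

    # Rule 2: a symptom onset date in the future is illogical, so it is set
    # to null until the reporting jurisdiction supplies a corrected date.
    onset = cleaned.get("onset_dt")
    if onset is not None and onset > today:
        cleaned["onset_dt"] = None

    return cleaned


today = date(2020, 11, 20)  # hypothetical processing date
raw = {"hosp_yn": "", "onset_dt": date(2021, 1, 5)}
cleaned = clean_record(raw, today)
```

Copying the record keeps the raw submission intact, which matters if the jurisdiction later updates the original value.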

    Data Suppression

    To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<11 COVID-19 case records with a given value). Suppression includes low frequency combinations of case month, geographic characteristics (county and state of residence), and demographic characteristics (sex, age group, race, and ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
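A simplified sketch of this kind of low-frequency suppression follows. It assumes that every field in a rare combination is recoded to "NA", which may be coarser than the procedure CDC actually applies; the field names, function name, and sample records are hypothetical.

```python
from collections import Counter

THRESHOLD = 11  # combinations with fewer than 11 records are suppressed


def suppress(records, keys):
    """Recode low-frequency key combinations to "NA" without dropping records."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    result = []
    for r in records:
        row = dict(r)
        if counts[tuple(r[k] for k in keys)] < THRESHOLD:
            for k in keys:
                row[k] = "NA"
        result.append(row)
    return result


# Hypothetical data: 12 records share one combination, 3 share a rare one.
records = (
    [{"county": "A", "age_group": "0-9"}] * 12
    + [{"county": "B", "age_group": "80+"}] * 3
)
result = suppress(records, ["county", "age_group"])
```

The output has the same number of rows as the input, which reflects the statement that records with suppressed values are never removed.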

    Additional COVID-19 Data

    COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These and other COVID-19 data are available from multiple public locations:

  13. MRT GUI Software

    • data.nist.gov
    • s.cnmilf.com
    • +1more
    Updated Oct 14, 2020
    + more versions
    Cite
    Alison Kahn (2020). MRT GUI Software [Dataset]. http://doi.org/10.18434/mds2-2310
    Explore at:
    Dataset updated
    Oct 14, 2020
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Alison Kahn
    License

    https://www.nist.gov/open/license

    Description

    Modified Rhyme Test (MRT) GUI. Software to run MRTs and collect intelligibility data. Software consists of a simple graphical interface. Test consists of collecting basic demographic information from a user, playing MRT phrases with different distortions, and recording the user response.

  14. Demographic and Health Survey 2022 - Ghana

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Jan 19, 2024
    + more versions
    Cite
    Ghana Statistical Service (GSS) (2024). Demographic and Health Survey 2022 - Ghana [Dataset]. https://microdata.worldbank.org/index.php/catalog/6122
    Explore at:
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    Ghana Statistical Service
    Authors
    Ghana Statistical Service (GSS)
    Time period covered
    2022 - 2023
    Area covered
    Ghana
    Description

    Abstract

    The 2022 Ghana Demographic and Health Survey (2022 GDHS) is the seventh in the series of DHS surveys conducted by the Ghana Statistical Service (GSS) in collaboration with the Ministry of Health/Ghana Health Service (MoH/GHS) and other stakeholders, with funding from the United States Agency for International Development (USAID) and other partners.

    The primary objective of the 2022 GDHS is to provide up-to-date estimates of basic demographic and health indicators. Specifically, the GDHS collected information on:

    • Fertility levels and preferences, contraceptive use, antenatal and delivery care, maternal and child health, childhood mortality, childhood immunisation, breastfeeding and young child feeding practices, women’s dietary diversity, violence against women, gender, nutritional status of adults and children, awareness regarding HIV/AIDS and other sexually transmitted infections, tobacco use, and other indicators relevant for the Sustainable Development Goals
    • Haemoglobin levels of women and children
    • Prevalence of malaria parasitaemia (rapid diagnostic testing and thick slides for malaria parasitaemia in the field and microscopy in the lab) among children age 6–59 months
    • Use of treated mosquito nets
    • Use of antimalarial drugs for treatment of fever among children under age 5

    The information collected through the 2022 GDHS is intended to assist policymakers and programme managers in designing and evaluating programmes and strategies for improving the health of the country’s population.

    Geographic coverage

    National coverage

    Analysis unit

    • Household
    • Individual
    • Children age 0-5
    • Woman age 15-49
    • Man age 15-59

    Universe

    The survey covered all de jure household members (usual residents), all women aged 15-49, men aged 15-59, and all children aged 0-4 resident in the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    To achieve the objectives of the 2022 GDHS, a stratified representative sample of 18,540 households (30 in each of 618 clusters) was selected, which resulted in 15,014 interviewed women age 15–49 and 7,044 interviewed men age 15–59 (in one of every two households selected).

    The sampling frame used for the 2022 GDHS is the updated frame prepared by the GSS based on the 2021 Population and Housing Census. The sampling procedure used in the 2022 GDHS was stratified two-stage cluster sampling, designed to yield representative results at the national level, for urban and rural areas, and for each of the country’s 16 regions for most DHS indicators. In the first stage, 618 target clusters were selected from the sampling frame using a probability proportional to size strategy for urban and rural areas in each region. Then the targeted number of clusters was selected with equal probability systematic random sampling of the clusters selected in the first phase for urban and rural areas. In the second stage, after selection of the clusters, a household listing and map updating operation was carried out in all of the selected clusters to develop a list of households for each cluster. This list served as a sampling frame for selection of the household sample. The GSS organized a 5-day training course on listing procedures for listers and mappers with support from ICF. The listers and mappers were organized into 25 teams consisting of one lister and one mapper per team. The teams spent 2 months completing the listing operation. In addition to listing the households, the listers collected the geographical coordinates of each household using GPS dongles provided by ICF and in accordance with the instructions in the DHS listing manual. The household listing was carried out using tablet computers, with software provided by The DHS Program. A fixed number of 30 households in each cluster were randomly selected from the list for interviews.
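The two-stage idea described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the GSS procedure: it omits stratification by region and urban/rural area, uses systematic selection as one common way to implement probability proportional to size, and works on an invented frame of cluster sizes. All function names are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Stage 1: pick n cluster indices with probability proportional to size."""
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n
    start = rng.random() * interval
    points = [start + i * interval for i in range(n)]
    last = len(sizes) - 1
    # Each point falls inside exactly one cluster's cumulative size range.
    # A cluster larger than the interval could be hit more than once.
    return [min(bisect_right(cumulative, p), last) for p in points]


def select_households(listing, k, rng):
    """Stage 2: draw a fixed number of households at random from a listing."""
    return rng.sample(listing, k)


rng = random.Random(2022)
sizes = [rng.randint(80, 300) for _ in range(200)]  # invented household counts
clusters = pps_systematic(sizes, 20, rng)
first_listing = list(range(sizes[clusters[0]]))     # household IDs in one cluster
households = select_households(first_listing, 30, rng)
```

Fixing the take at 30 households per cluster, as the survey did, keeps interviewer workload even across clusters regardless of cluster size.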

    For further details on sample design, see APPENDIX A of the final report.

    Mode of data collection

    Face-to-face computer-assisted interviews [capi]

    Research instrument

    Four questionnaires were used in the 2022 GDHS: the Household Questionnaire, the Woman’s Questionnaire, the Man’s Questionnaire, and the Biomarker Questionnaire. The questionnaires, based on The DHS Program’s model questionnaires, were adapted to reflect the population and health issues relevant to Ghana. In addition, a self-administered Fieldworker Questionnaire collected information about the survey’s fieldworkers.

    The GSS organized a questionnaire design workshop with support from ICF and obtained input from government and development partners expected to use the resulting data. The DHS Program optional modules on domestic violence, malaria, and social and behavior change communication were incorporated into the Woman’s Questionnaire. ICF provided technical assistance in adapting the modules to the questionnaires.

    Cleaning operations

    DHS staff installed all central office programmes, data structure checks, secondary editing, and field check tables from 17–20 October 2022. Central office training was implemented using the practice data to test the central office system and field check tables. Seven GSS staff members (four male and three female) were trained on the functionality of the central office menu, including accepting clusters from the field, data editing procedures, and producing reports to monitor fieldwork.

    From 27 February to 17 March, DHS staff visited the Ghana Statistical Service office in Accra to work with the GSS central office staff on finishing the secondary editing and to clean and finalize all data received from the 618 clusters.

    Response rate

    A total of 18,540 households were selected for the GDHS sample, of which 18,065 were found to be occupied. Of the occupied households, 17,933 were successfully interviewed, yielding a response rate of 99%. In the interviewed households, 15,317 women age 15–49 were identified as eligible for individual interviews. Interviews were completed with 15,014 women, yielding a response rate of 98%. In the subsample of households selected for the male survey, 7,263 men age 15–59 were identified as eligible for individual interviews and 7,044 were successfully interviewed.
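
    The rates quoted above follow directly from the counts in this section. The sketch below recomputes them; the helper name response_rate is illustrative and not part of any DHS tool. The rate for men is not stated in the text and is derived here only from the two counts given.

```python
def response_rate(completed: int, eligible: int) -> float:
    """Return completed interviews as a percentage of eligible units."""
    return 100.0 * completed / eligible


# (completed, eligible) counts quoted in the response-rate paragraph
counts = {
    "households": (17_933, 18_065),
    "women age 15-49": (15_014, 15_317),
    "men age 15-59": (7_044, 7_263),
}

rates = {group: response_rate(*pair) for group, pair in counts.items()}
for group, rate in rates.items():
    print(f"{group}: {rate:.1f}%")
```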

    Sampling error estimates

    The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2022 Ghana Demographic and Health Survey (2022 GDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2022 GDHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results. A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.
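
    As a minimal sketch of the "plus or minus two standard errors" rule described above, the function below builds an approximate 95% interval around an estimate. The estimate and standard error used in the example are hypothetical values, not figures from the 2022 GDHS.

```python
def confidence_interval(estimate: float, standard_error: float,
                        multiplier: float = 2.0) -> tuple:
    """Return (lower, upper) as estimate -/+ multiplier * standard_error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical proportion of 0.25 with a standard error of 0.01
low, high = confidence_interval(0.25, 0.01)
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```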

    If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2022 GDHS sample was the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the GDHS 2022 is an SAS program. This program used the Taylor linearization method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
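
    The SAS program itself is not reproduced here. As an illustration of the replication idea only, the sketch below implements a generic delete-one-cluster jackknife: the statistic is recomputed with each cluster left out in turn, and the spread of the resulting pseudo-values gives the standard error. It ignores sampling weights and strata, and the cluster totals are hypothetical.

```python
from typing import Callable, Sequence


def jackknife_se(clusters: Sequence, estimator: Callable[[list], float]) -> float:
    """Delete-one-cluster jackknife standard error of estimator(clusters)."""
    k = len(clusters)
    if k < 2:
        raise ValueError("at least two clusters are required")
    full = estimator(list(clusters))
    pseudo_values = []
    for i in range(k):
        # Recompute the statistic without cluster i
        reduced = [c for j, c in enumerate(clusters) if j != i]
        pseudo_values.append(k * full - (k - 1) * estimator(reduced))
    variance = sum((p - full) ** 2 for p in pseudo_values) / (k * (k - 1))
    return variance ** 0.5


def ratio(clusters: list) -> float:
    """Ratio of summed numerators to summed denominators across clusters."""
    return sum(y for y, _ in clusters) / sum(x for _, x in clusters)


# Hypothetical (events, respondents) totals for four clusters
toy_clusters = [(12, 30), (9, 30), (15, 30), (11, 30)]
se = jackknife_se(toy_clusters, ratio)
print(f"ratio = {ratio(toy_clusters):.4f}, jackknife SE = {se:.4f}")
```

    For a simple mean, this estimator reduces to the familiar standard error of the mean, which is a convenient sanity check.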

    A more detailed description of estimates of sampling errors is presented in APPENDIX B of the survey report.

    Data appraisal

    Data Quality Tables

    • Age distribution of eligible and interviewed women
    • Age distribution of eligible and interviewed men
    • Age displacement at age 14/15
    • Age displacement at age 49/50
    • Pregnancy outcomes by years preceding the survey
    • Completeness of reporting
    • Standardisation exercise results from anthropometry training
    • Height and weight data completeness and quality for children
    • Height measurements from random subsample of measured children
    • Interference in height and weight measurements of children
    • Interference in height and weight measurements of women and men
    • Heaping in anthropometric measurements for children (digit preference)
    • Observation of mosquito nets
    • Observation of handwashing facility
    • School attendance by single year of age
    • Vaccination cards photographed
    • Number of
  15. Data from: S1 Dataset -

    • figshare.com
    txt
    Updated Feb 23, 2024
    Cite
    Talia R. Cohen; Gaylen E. Fronk; Kent A. Kiehl; John J. Curtin; Michael Koenigs (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0297448.s004
    Available download formats: txt
    Dataset updated
    Feb 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Talia R. Cohen; Gaylen E. Fronk; Kent A. Kiehl; John J. Curtin; Michael Koenigs
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: There is currently inconclusive evidence regarding the relationship between recidivism and mental illness. This retrospective study aimed to use rigorous machine learning methods to understand the unique predictive utility of mental illness for recidivism in a general population (i.e., not only those with mental illness) prison sample in the United States.

    Method: Participants were adult men (n = 322) and women (n = 72) who were recruited from three prisons in the Midwest region of the United States. Three model comparisons using Bayesian correlated t-tests were conducted to understand the incremental predictive utility of mental illness, substance use, and crime and demographic variables for recidivism prediction. Three classification statistical algorithms were considered while evaluating model configurations for the t-tests: elastic net logistic regression (GLMnet), k-nearest neighbors (KNN), and random forests (RF).

    Results: Rates of substance use disorders were particularly high in our sample (86.29%). Mental illness variables and substance use variables did not add predictive utility for recidivism prediction over and above crime and demographic variables. Exploratory analyses comparing the crime and demographic, substance use, and mental illness feature sets to null models found that only the crime and demographics model had an increased likelihood of improving recidivism prediction accuracy.

    Conclusions: Despite not finding a direct relationship between mental illness and recidivism, treatment of mental illness in incarcerated populations is still essential due to the high rates of mental illnesses, the legal imperative, the possibility of decreasing institutional disciplinary burden, the opportunity to increase the effectiveness of rehabilitation programs in prison, and the potential to improve meaningful outcomes beyond recidivism following release.

  16. Magic, Memory, and Curiosity (MMC) fMRI Dataset

    • openneuro.org
    Updated May 1, 2023
    Cite
    Stefanie Meliss; Cristina Pascua-Martin; Jeremy Skipper; Kou Murayama (2023). Magic, Memory, and Curiosity (MMC) fMRI Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds004182.v1.0.1
    Dataset updated
    May 1, 2023
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Stefanie Meliss; Cristina Pascua-Martin; Jeremy Skipper; Kou Murayama
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    • The Magic, Memory, Curiosity (MMC) dataset contains data from 50 healthy human adults incidentally encoding 36 videos of magic tricks inside the MRI scanner across three runs.
    • Before and after incidental learning, a 10-min resting-state scan was acquired.
    • The MMC dataset includes contextual incentive manipulation, curiosity ratings for the magic tricks, as well as incidental memory performance tested a week later using a surprise cued recall and recognition test.
    • Working memory and constructs potentially relevant in the context of motivated learning (e.g., need for cognition, fear of failure) were additionally assessed.

    Stimuli

    The stimuli used here were short videos of magic tricks taken from a validated stimulus set (MagicCATs, Ozono et al., 2021) specifically created for the usage in fMRI studies. All final stimuli are available upon request. The request procedure is outlined in the Open Science Framework repository associated with the MagicCATs stimulus set (https://osf.io/ad6uc/).

    Participant responses

    Participants’ responses to demographic questions, questionnaires, and performance in the working memory assessment as well as both tasks are available in comma-separated value (CSV) files. Demographic (MMC_demographics.csv), raw questionnaire (MMC_raw_quest_data.csv) and other score data (MMC_scores.csv) as well as other information (MMC_other_information.csv) are structured as one line per participant with questions and/or scores as columns. Explicit wordings and naming of variables can be found in the supplementary information. Participant scan summaries (MMC_scan_subj_sum.csv) contain descriptives of brain coverage, TSNR, and framewise displacement (one row per participant) averaged first within acquisitions and then within participants. Participants’ responses and reaction times in the magic trick watching and memory task (MMC_experimental_data.csv) are stored as one row per trial per participant.

    Preprocessing

    Data was preprocessed using the AFNI (version 21.2.03) software suite. As a first step, the EPI timeseries were distortion-corrected along the encoding axis (P>>A) using the phase difference map (‘epi_b0_correct.py’). The resulting distortion-corrected EPIs were then processed separately for each task, but scans from the same task were processed together. The same blocks were applied to both task and resting-state distortion-corrected EPI data using afni_proc.py (see below): despiking, slice-timing and head-motion correction, intrasubject alignment between anatomy and EPI, intersubject registration to MNI, masking, smoothing, scaling, and denoising. For more details, please refer to the data descriptor (LINK) or the GitHub repository (https://github.com/stefaniemeliss/MMC_dataset).

    afni_proc.py -subj_id "${subjstr}" \
      -blocks despike tshift align tlrc volreg mask blur scale regress \
      -radial_correlate_blocks tcat volreg \
      -copy_anat $derivindir/$anatSS \
      -anat_has_skull no \
      -anat_follower anat_w_skull anat $derivindir/$anatUAC \
      -anat_follower_ROI aaseg anat $sswindir/$fsparc \
      -anat_follower_ROI aeseg epi $sswindir/$fsparc \
      -anat_follower_ROI FSvent epi $sswindir/$fsvent \
      -anat_follower_ROI FSWMe epi $sswindir/$fswm \
      -anat_follower_ROI FSGMe epi $sswindir/$fsgm \
      -anat_follower_erode FSvent FSWMe \
      -dsets $epi_dpattern \
      -outlier_polort $POLORT \
      -tcat_remove_first_trs 0 \
      -tshift_opts_ts -tpattern altplus \
      -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
      -align_epi_strip_method 3dSkullStrip \
      -tlrc_base MNI152_2009_template_SSW.nii.gz \
      -tlrc_NL_warp \
      -tlrc_NL_warped_dsets $sswindir/$anatQQ $sswindir/$matrix $sswindir/$warp \
      -volreg_base_ind 1 $min_out_first_run \
      -volreg_post_vr_allin yes \
      -volreg_pvra_base_index MIN_OUTLIER \
      -volreg_align_e2a \
      -volreg_tlrc_warp \
      -volreg_no_extent_mask \
      -mask_dilate 8 \
      -mask_epi_anat yes \
      -blur_to_fwhm -blur_size 8 \
      -regress_motion_per_run \
      -regress_ROI_PC FSvent 3 \
      -regress_ROI_PC_per_run FSvent \
      -regress_make_corr_vols aeseg FSvent \
      -regress_anaticor_fast \
      -regress_anaticor_label FSWMe \
      -regress_censor_motion 0.3 \
      -regress_censor_outliers 0.1 \
      -regress_apply_mot_types demean deriv \
      -regress_est_blur_epits \
      -regress_est_blur_errts \
      -regress_run_clustsim no \
      -regress_polort 2 \
      -regress_bandpass 0.01 1 \
      -html_review_style pythonic
    

    Derivatives

    The anat folder contains derivatives associated with the anatomical scan. The skull-stripped image created using @SSwarper is available in original and ICBM 2009c Nonlinear Asymmetric Template space as sub-[group][ID]_space-[space]_desc-skullstripped_T1w.nii.gz together with the corresponding affine matrix (sub-[group][ID]_aff12.1D) and incremental warp (sub-[group][ID]_warp.nii.gz). Output generated using @SUMA_Make_Spec_FS (defaced anatomical image, whole brain and tissue masks, as well as FreeSurfer discrete segmentations based on the Desikan-Killiany cortical atlas and the Destrieux cortical atlas) are also available as sub-[group][ID]_space-orig_desc-surfvol_T1w.nii.gz, sub-[group][ID]_space-orig_label-[label]_mask.nii.gz, and sub-[group][ID]_space-orig_desc-[atlas]_dseg.nii.gz, respectively.

    The func folder contains derivatives associated with the functional scans. To enhance re-usability, the fully preprocessed and denoised files are shared as sub-[group][ID]_task-[task]_desc-fullpreproc_bold.nii.gz. Additionally, partially preprocessed files (distortion corrected, despiked, slice-timing/head-motion corrected, aligned to anatomy and template space) are uploaded as sub-[group][ID]_task-[task]_run-[1-3]_desc-MNIaligned_bold.nii.gz together with slightly dilated brain mask in EPI resolution and template space where white matter and lateral ventricle were removed (sub-[group][ID]_task-[task]_space-MNI152NLin2009cAsym_label-dilatedGM_mask.nii.gz) as well as tissue masks in EPI resolution and template space (sub-[group][ID]_task-[task]_space-MNI152NLin2009cAsym_label-[tissue]_mask.nii.gz).

    The regressors folder contains nuisance regressors stemming from the output of the full afni_proc.py preprocessing pipeline. They are provided as space-delimited text values where each row represents one volume concatenated across all runs for each task separately. Those estimates that are provided per run contain the data for the volumes of one run and zeros for the volumes of other runs. This allows them to be regressed out separately for each run. The motion estimates show rotation (degree counterclockwise) in roll, pitch, and yaw and displacement (mm) in superior, left, and posterior direction. In addition to the motion parameters with respect to the base volume (sub-[group][ID]_task-[task]_label-mot_regressor.1D), motion derivatives (sub-[group][ID]_task-[task]_run[1-3]_label-motderiv_regressor.1D) and demeaned motion parameters (sub-[group][ID]_task-[task]_run[1-3]_label-motdemean_regressor.1D) are also available for each run separately. The sub-[group][ID]_task-[task]_run[1-3]_label-ventriclePC_regressor.1D files contain time course of the first three PCs of the lateral ventricle per run. Additionally, outlier fractions for each volume are provided (sub-[group][ID]_task-[task]_label-outlierfrac_regressor.1D) and sub-[group][ID]_task-[task]_label-censorTRs_regressor.1D shows which volumes were censored because motion or outlier fraction exceeded the limits specified. The voxelwise time course of local WM regressors created using fast ANATICOR is shared as sub-[group][ID]_task-[task]_label-localWM_regressor.nii.gz.
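
    The per-run layout described above (data for one run, zeros for the volumes of the other runs) can be sketched as follows. This is an illustration in plain Python rather than AFNI's own code; the run lengths and values are hypothetical, and parse_1d assumes a plain space-delimited file with no comment lines.

```python
def parse_1d(text: str) -> list:
    """Parse space-delimited text: one row per volume, one column per regressor."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def zero_pad_run(run_rows: list, run_lengths: list, run_index: int) -> list:
    """Place one run's rows in the concatenated timeline, zeros elsewhere."""
    if len(run_rows) != run_lengths[run_index]:
        raise ValueError("row count does not match the length of the chosen run")
    n_cols = len(run_rows[0])
    before = sum(run_lengths[:run_index])       # volumes in earlier runs
    after = sum(run_lengths[run_index + 1:])    # volumes in later runs

    def zeros(n_rows: int) -> list:
        return [[0.0] * n_cols for _ in range(n_rows)]

    return zeros(before) + [list(row) for row in run_rows] + zeros(after)


# Hypothetical example: three runs of 3, 2 and 1 volumes; data belong to run 2
run2 = parse_1d("0.1 0.2\n0.3 0.4\n")
padded = zero_pad_run(run2, run_lengths=[3, 2, 1], run_index=1)
for row in padded:
    print(row)
```

    Because each per-run regressor is zero outside its own run, it can be entered alongside the others in a single design matrix covering the concatenated runs.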

  17. Demographic and Health Survey 2018 - Nigeria

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    Updated Nov 12, 2019
    Cite
    National Population Commission (NPC) (2019). Demographic and Health Survey 2018 - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/3540
    Dataset updated
    Nov 12, 2019
    Dataset provided by
    National Population Commission (https://nationalpopulation.gov.ng/)
    Authors
    National Population Commission (NPC)
    Time period covered
    2018
    Area covered
    Nigeria
    Description

    Abstract

    The primary objective of the 2018 NDHS is to provide up-to-date estimates of basic demographic and health indicators. Specifically, the NDHS collected information on fertility, awareness and use of family planning methods, breastfeeding practices, nutritional status of women and children, maternal and child health, adult and childhood mortality, women’s empowerment, domestic violence, female genital cutting, prevalence of malaria, awareness and behaviour regarding HIV/AIDS and other sexually transmitted infections (STIs), disability, and other health-related issues such as smoking.

    The information collected through the 2018 NDHS is intended to assist policymakers and programme managers in evaluating and designing programmes and strategies for improving the health of the country’s population. The 2018 NDHS also provides indicators relevant to the Sustainable Development Goals (SDGs) for Nigeria.

    Geographic coverage

    National coverage

    Analysis unit

    • Household
    • Individual
    • Children age 0-5
    • Woman age 15-49
    • Man age 15-49

    Universe

    The survey covered all de jure household members (usual residents), all women aged 15-49 years resident in the household, and all children aged 0-5 years resident in the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sampling frame used for the 2018 NDHS is the Population and Housing Census of the Federal Republic of Nigeria (NPHC), which was conducted in 2006 by the National Population Commission. Administratively, Nigeria is divided into states. Each state is subdivided into local government areas (LGAs), and each LGA is divided into wards. In addition to these administrative units, during the 2006 NPHC each locality was subdivided into convenient areas called census enumeration areas (EAs). The primary sampling unit (PSU), referred to as a cluster for the 2018 NDHS, is defined on the basis of EAs from the 2006 EA census frame. Although the 2006 NPHC did not provide the number of households and population for each EA, population estimates were published for 774 LGAs. A combination of information from cartographic material demarcating each EA and the LGA population estimates from the census was used to identify the list of EAs, estimate the number of households, and distinguish EAs as urban or rural for the survey sample frame. Before sample selection, all localities were classified separately into urban and rural areas based on predetermined minimum sizes of urban areas (cut-off points); consistent with the official definition in 2017, any locality with more than a minimum population size of 20,000 was classified as urban.

    The sample for the 2018 NDHS was a stratified sample selected in two stages. Stratification was achieved by separating each of the 36 states and the Federal Capital Territory into urban and rural areas. In total, 74 sampling strata were identified. Samples were selected independently in every stratum via a two-stage selection. Implicit stratifications were achieved at each of the lower administrative levels by sorting the sampling frame before sample selection according to administrative order and by using a probability proportional to size selection during the first sampling stage.

    For further details on sample selection, see Appendix A of the final report.
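
    To illustrate what a probability proportional to size selection involves, the sketch below implements a generic textbook version: systematic sampling along the cumulated size measure with a random start. It is not the procedure or software used for the 2018 NDHS, and the household counts per enumeration area are hypothetical.

```python
import random


def pps_systematic(sizes: list, n_select: int, rng: random.Random) -> list:
    """Select n_select units with probability proportional to size.

    Returns indices into sizes. A unit larger than the sampling interval
    can be selected more than once.
    """
    if n_select < 1 or not sizes:
        raise ValueError("need at least one unit and one selection")
    interval = sum(sizes) / n_select
    start = rng.random() * interval  # random start in [0, interval)
    selected = []
    cumulative = 0.0  # total size of all units before the current one
    index = 0
    for j in range(n_select):
        point = start + j * interval
        # Advance until the current unit's size range contains the point
        while index < len(sizes) - 1 and cumulative + sizes[index] <= point:
            cumulative += sizes[index]
            index += 1
        selected.append(index)
    return selected


# Hypothetical household counts for eight enumeration areas in one stratum
ea_households = [120, 80, 200, 150, 95, 310, 60, 175]
clusters = pps_systematic(ea_households, 3, random.Random(2018))
print(clusters)
```

    Sorting the frame before running such a selection is what produces the implicit stratification mentioned above, since the systematic step spreads the sample along the sorted order.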

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Four questionnaires were used for the 2018 NDHS: the Household Questionnaire, the Woman’s Questionnaire, the Man’s Questionnaire, and the Biomarker Questionnaire. The questionnaires, based on The DHS Program’s standard Demographic and Health Survey (DHS-7) questionnaires, were adapted to reflect the population and health issues relevant to Nigeria. Comments were solicited from various stakeholders representing government ministries and agencies, nongovernmental organisations, and international donors. In addition, information about the fieldworkers for the survey was collected through a self-administered Fieldworker Questionnaire.

    Cleaning operations

    The processing of the 2018 NDHS data began almost immediately after the fieldwork started. As data collection was completed in each cluster, all electronic data files were transferred via the IFSS to the NPC central office in Abuja. These data files were registered and checked for inconsistencies, incompleteness, and outliers. The field teams were alerted to any inconsistencies and errors. Secondary editing, carried out in the central office, involved resolving inconsistencies and coding the open-ended questions. The NPC data processor coordinated the exercise at the central office. The biomarker paper questionnaires were compared with electronic data files to check for any inconsistencies in data entry. Data entry and editing were carried out using the CSPro software package. The concurrent processing of the data offered a distinct advantage because it maximised the likelihood of the data being error-free and accurate. Timely generation of field check tables allowed for effective monitoring. The secondary editing of the data was completed in the second week of April 2019.

    Response rate

    A total of 41,668 households were selected for the sample, of which 40,666 were occupied. Of the occupied households, 40,427 were successfully interviewed, yielding a response rate of 99%. In the households interviewed, 42,121 women age 15-49 were identified for individual interviews; interviews were completed with 41,821 women, yielding a response rate of 99%. In the subsample of households selected for the male survey, 13,422 men age 15-59 were identified and 13,311 were successfully interviewed, yielding a response rate of 99%.

    Sampling error estimates

    The estimates from a sample survey are affected by two types of errors: nonsampling errors and sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2018 Nigeria Demographic and Health Survey (NDHS) to minimise this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2018 NDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability among all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

    Sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.

    If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2018 NDHS sample is the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulas. Sampling errors are computed in SAS, using programs developed by ICF. These programs use the Taylor linearisation method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.

    Note: A more detailed description of estimates of sampling errors is presented in APPENDIX B of the survey report.

    Data appraisal

    Data Quality Tables

    • Household age distribution
    • Age distribution of eligible and interviewed women
    • Age distribution of eligible and interviewed men
    • Completeness of reporting
    • Births by calendar years
    • Reporting of age at death in days
    • Reporting of age at death in months
    • Standardisation exercise results from anthropometry training
    • Height and weight data completeness and quality for children
    • Height measurements from random subsample of measured children
    • Sibship size and sex ratio of siblings
    • Pregnancy-related mortality trends
    • Data collection period
    • Malaria prevalence according to rapid diagnostic test (RDT)

    Note: See detailed data quality tables in APPENDIX C of the report.

  18. ACS-ED 2014-2018 Children-Enrolled Public: Demographic Characteristics...

    • data-nces.opendata.arcgis.com
    • s.cnmilf.com
    Updated Sep 8, 2020
    Cite
    National Center for Education Statistics (2020). ACS-ED 2014-2018 Children-Enrolled Public: Demographic Characteristics (CDP05) [Dataset]. https://data-nces.opendata.arcgis.com/datasets/nces::acs-ed-2014-2018-children-enrolled-public-demographic-characteristics-cdp05
    Dataset updated
    Sep 8, 2020
    Dataset authored and provided by
    National Center for Education Statistics (https://nces.ed.gov/)
    License

    https://resources.data.gov/open-licenses/

    Area covered
    Description

    The American Community Survey Education Tabulation (ACS-ED) is a custom tabulation of the ACS produced for the National Center of Education Statistics (NCES) by the U.S. Census Bureau. The ACS-ED provides a rich collection of social, economic, demographic, and housing characteristics for school systems, school-age children, and the parents of school-age children. In addition to focusing on school-age children, the ACS-ED provides enrollment iterations for children enrolled in public school. The data profiles include percentages (along with associated margins of error) that allow for comparison of school district-level conditions across the U.S. For more information about the NCES ACS-ED collection, visit the NCES Education Demographic and Geographic Estimates (EDGE) program at: https://nces.ed.gov/programs/edge/Demographic/ACS

    Annotation values are negative value representations of estimates and have values when non-integer information needs to be represented. See the table below for a list of common Estimate/Margin of Error (E/M) values and their corresponding Annotation (EA/MA) values.

    All information contained in this file is in the public domain. Data users are advised to review NCES program documentation and feature class metadata to understand the limitations and appropriate use of these data.

    -9

    A '-9' entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.

    -8

    A '-8' entry means that the estimate is not applicable or not available.

    -6

    A '-6' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.

    -5

    A '-5' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.

    -3

    A '-3' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.

    -2

    A '-2' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
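
    When loading these tables, the negative codes listed above need to be separated from real estimates before any arithmetic is done. The following is a minimal sketch of one way to do that; the short reason strings paraphrase the descriptions above, and the function name decode is illustrative rather than part of any NCES tool.

```python
# Annotation codes from the list above, with paraphrased meanings
ANNOTATIONS = {
    -9: "not displayed: number of sample cases too small",
    -8: "estimate not applicable or not available",
    -6: "too few sample observations, or ratio of medians not computable",
    -5: "estimate is controlled; no test for sampling variability",
    -3: "median in lowest or upper interval of an open-ended distribution",
    -2: "too few sample observations to compute a standard error",
}


def decode(value: float) -> tuple:
    """Return (value, None) for a real number, or (None, reason) for a code."""
    if value in ANNOTATIONS:
        return None, ANNOTATIONS[value]
    return value, None


# Hypothetical column of estimates mixing real values and annotation codes
column = [12.5, -9, 48.0, -6]
clean = [decode(v)[0] for v in column]
print(clean)
```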

  19. COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE

    • catalog.data.gov
    • data.ct.gov
    Updated Aug 12, 2023
    Cite
    data.ct.gov (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-race-ethnicity
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    data.ct.gov
    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

    The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj .

    The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

    The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

    The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

    This dataset contains COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.
    The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

    The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf

    Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
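    The difference between crude and age-adjusted rates described above can be sketched in a few lines of Python. This is only an illustration of direct standardization, not DPH's actual calculation: the function names are hypothetical, the three age groups stand in for the 19 groups the description mentions, and every count below is made up.

```python
def crude_rate(events_by_age, population_by_age, per=100_000):
    """Total events divided by total population, scaled to `per` people."""
    return sum(events_by_age.values()) / sum(population_by_age.values()) * per


def age_adjusted_rate(events_by_age, population_by_age, standard_by_age, per=100_000):
    """Direct standardization: weight each age-specific rate by the share of
    the standard population that falls in that age group."""
    standard_total = sum(standard_by_age.values())
    adjusted = 0.0
    for group, standard_count in standard_by_age.items():
        age_specific_rate = events_by_age[group] / population_by_age[group]
        adjusted += age_specific_rate * (standard_count / standard_total)
    return adjusted * per


# Made-up counts for a group that is older than the standard population.
deaths = {"0-49": 10, "50-74": 40, "75+": 150}
population = {"0-49": 100_000, "50-74": 50_000, "75+": 10_000}
# Hypothetical standard population (NOT the 2000 US standard).
standard = {"0-49": 700_000, "50-74": 250_000, "75+": 50_000}

crude = crude_rate(deaths, population)
adjusted = age_adjusted_rate(deaths, population, standard)
print(f"crude={crude:.1f} adjusted={adjusted:.1f}")
```

    In this toy example the group has a larger share of residents aged 75+ than the standard population, so the adjusted rate comes out below the crude rate, the same direction the description reports for non-Hispanic white residents.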
Cause of death was determined by a death certifier (e.g., physician, APRN, medical

  20. ARCHIVED: COVID-19 Cases by Population Characteristics Over Time

    • data.sfgov.org
    • healthdata.gov
    • +1 more
    csv, xlsx, xml
    Updated Sep 11, 2023
    Cite
    (2023). ARCHIVED: COVID-19 Cases by Population Characteristics Over Time [Dataset]. https://data.sfgov.org/Health-and-Social-Services/ARCHIVED-COVID-19-Cases-by-Population-Characterist/j7i3-u9ke
    Explore at:
    xlsx, xml, csv (available download formats)
    Dataset updated
    Sep 11, 2023
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
    License information was derived automatically

    Description

    A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.

    B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases are from:
    • Case interviews
    • Laboratories
    • Medical providers

    These multiple streams of data are merged, deduplicated, and undergo data verification processes.

    Race/ethnicity
    • We include all race/ethnicity categories that are collected for COVID-19 cases.
    • The population estimates for the “Other” or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.

    Gender
    • The City collects information on gender identity using these guidelines.

    Skilled Nursing Facility (SNF) occupancy
    • A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives.
    • This dataset includes data for COVID-19 cases reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.

    Sexual orientation
    • The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date is unavailable.
    • The City doesn’t collect or report information about sexual orientation for persons under 12 years of age.
    • Case investigation interviews transitioned to the California Department of Public Health Virtual Assistant for information gathering beginning December 2021. The Virtual Assistant is only sent to adults who are 18+ years old.
    • Learn more about our data collection guidelines pertaining to sexual orientation: https://www.sfdph.org/dph/files/PoliciesProcedures/COM9_SexualOrientationGuidelines.pdf

    Comorbidities
    • Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.

    Homelessness
    Persons are identified as homeless based on several data sources:
    • self-reported living situation
    • the location at the time of testing
    • Department of Public Health homelessness and health databases

    Residents in Single-Room Occupancy hotels are not included in these figures. These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.

    Single Room Occupancy (SRO) tenancy
    • SRO buildings are defined by the San Francisco Housing Code as having six or more “residential guest rooms” which may be attached to shared bathrooms, kitchens, and living spaces.
    • The details of a person's living arrangements are verified during case interviews.

    Transmission Type
    • Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
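    The transmission-type rule above amounts to a three-way decision, sketched here in Python. The function name, its arguments, and the exact label strings are assumptions made for illustration; the labels actually stored in the dataset may be spelled differently.

```python
from typing import Optional


def transmission_category(interviewed: bool, reported_contact: Optional[bool]) -> str:
    """Classify a confirmed case following the rule in the description.

    reported_contact is True or False when the person answered the
    close-contact question, and None when the question was not asked.
    """
    if not interviewed or reported_contact is None:
        # Not interviewed, or interviewed but never asked the question.
        return "unknown"
    if reported_contact:
        return "contact with a known case"
    return "community transmission"


# One example per branch of the rule.
examples = [
    transmission_category(True, True),
    transmission_category(True, False),
    transmission_category(True, None),
    transmission_category(False, None),
]
print(examples)
```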

    C. UPDATE PROCESS This dataset has been archived and will no longer update as of 9/11/2023.

    D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

    This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of cases on each date.

    New cases are the count of cases within that characteristic group where the positive tests were collected on that specific specimen collection date. Cumulative cases are the running total of all San Francisco cases in that characteristic group up to the specimen collection date listed.
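    As a rough sketch of the workflow described above, the Python below filters rows to one characteristic type and group, then derives cumulative cases as a running total of new cases by specimen collection date. The field names and all row values are hypothetical stand-ins; check the real column names in the downloaded file before adapting it.

```python
from itertools import accumulate

# Made-up rows; field names are assumptions, not the dataset's schema.
rows = [
    {"specimen_collection_date": "2022-01-02", "characteristic_type": "Age Group",
     "characteristic_group": "0-17", "new_cases": 5},
    {"specimen_collection_date": "2022-01-01", "characteristic_type": "Age Group",
     "characteristic_group": "0-17", "new_cases": 3},
    {"specimen_collection_date": "2022-01-01", "characteristic_type": "Gender",
     "characteristic_group": "Female", "new_cases": 7},
    {"specimen_collection_date": "2022-01-03", "characteristic_type": "Age Group",
     "characteristic_group": "0-17", "new_cases": 2},
]


def cumulative_cases(rows, characteristic_type, characteristic_group):
    """Return (date, new_cases, cumulative_cases) tuples for one group,
    ordered by specimen collection date."""
    selected = sorted(
        (r for r in rows
         if r["characteristic_type"] == characteristic_type
         and r["characteristic_group"] == characteristic_group),
        key=lambda r: r["specimen_collection_date"],  # ISO dates sort as text
    )
    running_totals = accumulate(r["new_cases"] for r in selected)
    return [
        (r["specimen_collection_date"], r["new_cases"], total)
        for r, total in zip(selected, running_totals)
    ]


result = cumulative_cases(rows, "Age Group", "0-17")
for line in result:
    print(line)
```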

    This data may not be immediately available for recently reported cases. Data updates as more information becomes available.

    To explore data on the total number of cases, use the ARCHIVED: COVID-19 Cases Over Time dataset.

    E. CHANGE LOG

    • 9/11/2023 - data on COVID-19 cases by population characteristics over time are no longer being updated. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
    • 6/6/2023 - data on cases by transmission type have been removed. See section ARCHIVED DATA for more detail.
    • 5/16/2023 - data on cases by sexual orientation, comorbidities, homelessness, and single room occupancy have been removed. See section ARCHIVED DATA for more detail.
    • 4/6/2023 - the State implemented system updates to improve the integrity of historical data.
    • 2/21/2023 - system updates to improve reliability and accuracy of cases data were implemented.
    • 1/31/2023 - updated “population_estimate” column to reflect the 2020 Census Bureau American Community Survey (ACS) San Francisco Population estimates.
    • 1/5/2023 - data on SNF cases removed. See section ARCHIVED DATA for more detail.
    • 3/23/2022 - ‘Native American’ changed to ‘American Indian or Alaska Native’ to align with the census.
    • 1/22/2022 - system updates to improve timeliness and accuracy of cases and deaths data were implemented.
    • 7/15/2022 - reinfections added to cases dataset. See section SUMMARY for more information on how reinfections are identified.

Cite
Koluit (2025). Human Resource Data Set (The Company) [Dataset]. https://www.kaggle.com/datasets/koluit/human-resource-data-set-the-company

Human Resource Data Set (The Company)

Dataset for People Analytics or general HR Systems Use

Explore at:
zip (401322 bytes) (available download formats)
Dataset updated
Nov 12, 2025
Authors
Koluit
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically

Description

Context

Similar to others who have created HR data sets, we felt that the lack of data out there for HR was limiting. It is very hard for someone to test new systems or learn People Analytics in the HR space. The only dataset most HR practitioners have is their real employee data, and there are a lot of reasons why you would not want to use that when experimenting. We hope that by providing this dataset with an ever-growing variation of data points, others can learn and grow their HR data analytics and systems knowledge.

Some example test cases where someone might use this dataset:

• HR Technology Testing and Mock-Ups
  • Engagement survey tools
  • HCM tools
  • BI Tools
• Learning To Code For People Analytics
  • Python/R/SQL
• HR Tech and People Analytics Educational Courses/Tools

Content

The core data CompanyData.txt has the basic demographic data about a worker. We treat this as the core data that you can join future data sets to.

Please read the Readme.md for additional information about this along with the Changelog for additional updates as they are made.

Acknowledgements

Initial names, addresses, and ages were generated using FakenameGenerator.com. All additional details including Job, compensation, and additional data sets were created by the Koluit team using random generation in Excel.

Inspiration

Our hope is that this data is used in the HR or Research space to experiment and learn using HR data. Some examples of how we hope this data will be used are listed above.

Contact Us

Have any suggestions for additions to the data? See any issues with our data? Want to use it for your project? Please reach out to us! https://koluit.com/ ryan@koluit.com
