100+ datasets found

Numbers and percentages of participants missing data contributing to the...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 5, 2023
Cite
Gail E. Potter; Jimmy Wong; Jonathan Sugimoto; Aldiouma Diallo; John C. Victor; Kathleen Neuzil; M. Elizabeth Halloran (2023). Numbers and percentages of participants missing data contributing to the degree calculation. [Dataset]. http://doi.org/10.1371/journal.pone.0220443.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0220443.t002
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Gail E. Potter; Jimmy Wong; Jonathan Sugimoto; Aldiouma Diallo; John C. Victor; Kathleen Neuzil; M. Elizabeth Halloran
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Numbers and percentages of participants missing data contributing to the degree calculation.
Data from: Evaluating Supplemental Samples in Longitudinal Research:...
tandf.figshare.com
txt
Updated Feb 9, 2024
Cite
Laura K. Taylor; Xin Tong; Scott E. Maxwell (2024). Evaluating Supplemental Samples in Longitudinal Research: Replacement and Refreshment Approaches [Dataset]. http://doi.org/10.6084/m9.figshare.12162072.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12162072.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Taylor & Francis
Authors
Laura K. Taylor; Xin Tong; Scott E. Maxwell
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.
A Correction for Structural Equation Modeling Fit Indices Under Missingness:...
search.dataone.org
dataverse.harvard.edu
+1 more
Updated Nov 20, 2023
Cite
Fitzgerald, Cailey E. (2023). A Correction for Structural Equation Modeling Fit Indices Under Missingness: Adapting the Root Mean Squared Error of Approximation to Conditions of Missing Data [Dataset]. http://doi.org/10.7910/DVN/28657
Unique identifier
https://doi.org/10.7910/DVN/28657
Dataset updated
Nov 20, 2023
Dataset provided by
Harvard Dataverse
Authors
Fitzgerald, Cailey E.
Description
Missing data is a frequent occurrence in both small and large datasets. Among other things, missingness may be a result of coding or computer error, participant absences, or it may be intentional, as in a planned missing design. Whatever the cause, the problem of how to approach a dataset with holes is of much relevance in scientific research. First, missingness is approached as a theoretical construct, and its impacts on data analysis are encountered. I discuss missingness as it relates to structural equation modeling and model fit indices, specifically its interaction with the Root Mean Square Error of Approximation (RMSEA). Data simulation is used to show that RMSEA has a downward bias with missing data, yielding skewed fit indices. Two alternative formulas for RMSEA calculation are proposed: one correcting degrees of freedom and one using Kullback-Leibler divergence to result in an RMSEA calculation which is relatively independent of missingness. Simulations are conducted in Java, with results indicating that the Kullback-Leibler divergence provides a better correction for RMSEA calculation. Next, I approach missingness in an applied manner with an existing large dataset examining ideology measures. The researchers assessed ideology using a planned missingness design, resulting in high proportions of missing data. Factor analysis was performed to gauge uniqueness of ideology measures.
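For orientation, the conventional RMSEA point estimate that this description refers to can be computed from a model's chi-square statistic, its degrees of freedom, and the sample size. The sketch below shows only that standard formula, not either of the corrected formulas proposed in the thesis, which are not given here. The function name `rmsea` is hypothetical, and some software divides by N rather than N - 1.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Conventional RMSEA point estimate.

    Computes sqrt(max(chi_square - df, 0) / (df * (n - 1))).
    Assumption: the N - 1 convention; some programs use N instead.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))
```

Because the statistic depends on the chi-square value and on N, anything that distorts either under missingness carries through to the index, which is the kind of bias the simulations described above examine.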
Young People Survey
kaggle.com
Updated Dec 6, 2016
Cite
Miroslav Sabo (2016). Young People Survey [Dataset]. https://www.kaggle.com/datasets/miroslavsabo/young-people-survey/discussion
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 6, 2016
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Miroslav Sabo
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Introduction

In 2013, students of the Statistics class at FSEV UK (https://fses.uniba.sk/en/) were asked to invite their friends to participate in this survey.

The data file (responses.csv) consists of 1010 rows and 150 columns (139 integer and 11 categorical).

For convenience, the original variable names were shortened in the data file. See the columns.csv file if you want to match the data with the original names.

The data contain missing values.

The survey was presented to participants in both electronic and written form.

The original questionnaire was in the Slovak language and was later translated into English.

All participants were of Slovakian nationality, aged between 15 and 30.

The variables can be split into the following groups:

Music preferences (19 items)

Movie preferences (12 items)

Hobbies & interests (32 items)

Phobias (10 items)

Health habits (3 items)

Personality traits, views on life, & opinions (57 items)

Spending habits (7 items)

Demographics (10 items)

Research questions

Many different techniques can be used to answer many questions, e.g.

Clustering: Given the music preferences, do people make up any clusters of similar behavior?

Hypothesis testing: Do women fear certain phenomena significantly more than men? Do left-handed people have different interests than right-handed people?

Predictive modeling: Can we predict spending habits of a person from his/her interests and movie or music preferences?

Dimension reduction: Can we describe a large number of human interests by a smaller number of latent concepts?

Correlation analysis: Are there any connections between music and movie preferences?

Visualization: How to effectively visualize a lot of variables in order to gain some meaningful insights from the data?

(Multivariate) Outlier detection: A small number of participants often cheat and answer the questions randomly. Can you identify them? Hint: Local outlier factor may help.

Missing values analysis: Are there any patterns in missing responses? What is the optimal way of imputing the values in surveys?

Recommendations: If some of a user's interests are known, can we predict the others? Or, if we know what a person listens to, can we predict which kinds of movies he/she might like?
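As a starting point for the missing values question above, one option is to tabulate how many values are missing per column and which combinations of columns tend to be missing together. The sketch below is a minimal, standard-library illustration. The function name `missing_patterns` and the tiny inline sample (including its column names) are made up for the example and are not taken from responses.csv.

```python
import csv
import io
from collections import Counter


def missing_patterns(reader):
    """Return (per-column missing counts, counts of row-level missingness patterns)."""
    per_column = Counter()
    patterns = Counter()
    for row in reader:
        # A cell counts as missing when it is absent or blank.
        missing = tuple(sorted(
            name for name, value in row.items()
            if value is None or value.strip() == ""
        ))
        per_column.update(missing)
        patterns[missing] += 1
    return per_column, patterns


# Made-up sample standing in for the real survey file.
SAMPLE = """Music,Movies,Height
5,4,170
,3,
4,,165
,2,
"""

per_column, patterns = missing_patterns(csv.DictReader(io.StringIO(SAMPLE)))
for pattern, count in patterns.most_common():
    print(pattern or "(complete)", count)
```

To run this on the real data, the `io.StringIO(SAMPLE)` argument would be replaced by an opened file handle. Patterns that recur far more often than chance would suggest indicate that responses are not missing independently across questions.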

Past research

(in Slovak) Sleziak, P. - Sabo, M.: Gender differences in the prevalence of specific phobias. Forum Statisticum Slovacum. 2014, Vol. 10, No. 6. [Differences (gender + whether people lived in village/town) in the prevalence of phobias.]

Sabo, Miroslav. Multivariate Statistical Methods with Applications. Diss. Slovak University of Technology in Bratislava, 2014. [Clustering of variables (music preferences, movie preferences, phobias) + Clustering of people w.r.t. their interests.]

Questionnaire

MUSIC PREFERENCES

I enjoy listening to music.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I prefer.: Slow paced music 1-2-3-4-5 Fast paced music (integer)

Dance, Disco, Funk: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Folk music: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Country: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Classical: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Musicals: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Pop: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Rock: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Metal, Hard rock: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Punk: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Hip hop, Rap: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Reggae, Ska: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Swing, Jazz: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Rock n Roll: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Alternative music: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Latin: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Techno, Trance: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Opera: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

MOVIE PREFERENCES

I really enjoy watching movies.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

Horror movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Thriller movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Comedies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Romantic movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Sci-fi movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

War movies: Don't enjoy at all 1-2-3-4-5 E...
Datasheet2_Assessing disparities through missing race and ethnicity data:...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Jul 24, 2024
+ more versions
Cite
Vora, Sheetal S.; Lytch, Ashley; Fair, Danielle; Harris, Julia G.; Wang, Xing; Hammelev, Erin; Klauss, Julia; Singleton, Jade; Machado, Ashley; Kreese, Connor; Morgan, Esi M.; Tarczy-Hornoch, Peter; Banschbach, Katelyn M.; Gilbert, Mileka; Pan, Nancy (2024). Datasheet2_Assessing disparities through missing race and ethnicity data: results from a juvenile arthritis registry.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001358805
Dataset updated
Jul 24, 2024
Authors
Vora, Sheetal S.; Lytch, Ashley; Fair, Danielle; Harris, Julia G.; Wang, Xing; Hammelev, Erin; Klauss, Julia; Singleton, Jade; Machado, Ashley; Kreese, Connor; Morgan, Esi M.; Tarczy-Hornoch, Peter; Banschbach, Katelyn M.; Gilbert, Mileka; Pan, Nancy
Description
Introduction: Ensuring high-quality race and ethnicity data within the electronic health record (EHR) and across linked systems, such as patient registries, is necessary to achieving the goal of inclusion of racial and ethnic minorities in scientific research and detecting disparities associated with race and ethnicity. The project goal was to improve race and ethnicity data completion within the Pediatric Rheumatology Care Outcomes Improvement Network and assess impact of improved data completion on conclusions drawn from the registry.

Methods: This is a mixed-methods quality improvement study that consisted of five parts, as follows: (1) Identifying baseline missing race and ethnicity data, (2) Surveying current collection and entry, (3) Completing data through audit and feedback cycles, (4) Assessing the impact on outcome measures, and (5) Conducting participant interviews and thematic analysis.

Results: Across six participating centers, 29% of the patients were missing data on race and 31% were missing data on ethnicity. Of patients missing data, most patients were missing both race and ethnicity. Rates of missingness varied by data entry method (electronic vs. manual). Recovered data had a higher percentage of patients with Other race or Hispanic/Latino ethnicity compared with patients with non-missing race and ethnicity data at baseline. Black patients had a significantly higher odds ratio of having a clinical juvenile arthritis disease activity score (cJADAS10) of ≥5 at first follow-up compared with White patients. There was no significant change in odds ratio of cJADAS10 ≥5 for race and ethnicity after data completion. Patients missing race and ethnicity were more likely to be missing cJADAS values, which may affect the ability to detect changes in odds ratio of cJADAS ≥5 after completion.

Conclusions: About one-third of the patients in a pediatric rheumatology registry were missing race and ethnicity data. After three audit and feedback cycles, centers decreased missing data by 94%, primarily via data recovery from the EHR. In this sample, completion of missing data did not change the findings related to differential outcomes by race. Recovered data were not uniformly distributed compared with those with non-missing race and ethnicity data at baseline, suggesting that differences in outcomes after completing race and ethnicity data may be seen with larger sample sizes.
‘Young People Survey’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 12, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Young People Survey’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-young-people-survey-04b9/01af2b48/?iid=033-554&v=presentation
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Young People Survey’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/miroslavsabo/young-people-survey on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Introduction

In 2013, students of the Statistics class at FSEV UK (https://fses.uniba.sk/en/) were asked to invite their friends to participate in this survey.

The data file (responses.csv) consists of 1010 rows and 150 columns (139 integer and 11 categorical).

For convenience, the original variable names were shortened in the data file. See the columns.csv file if you want to match the data with the original names.

The data contain missing values.

The survey was presented to participants in both electronic and written form.

The original questionnaire was in the Slovak language and was later translated into English.

All participants were of Slovakian nationality, aged between 15 and 30.

The variables can be split into the following groups:

Music preferences (19 items)

Movie preferences (12 items)

Hobbies & interests (32 items)

Phobias (10 items)

Health habits (3 items)

Personality traits, views on life, & opinions (57 items)

Spending habits (7 items)

Demographics (10 items)

Research questions

Many different techniques can be used to answer many questions, e.g.

Clustering: Given the music preferences, do people make up any clusters of similar behavior?

Hypothesis testing: Do women fear certain phenomena significantly more than men? Do left-handed people have different interests than right-handed people?

Predictive modeling: Can we predict spending habits of a person from his/her interests and movie or music preferences?

Dimension reduction: Can we describe a large number of human interests by a smaller number of latent concepts?

Correlation analysis: Are there any connections between music and movie preferences?

Visualization: How to effectively visualize a lot of variables in order to gain some meaningful insights from the data?

(Multivariate) Outlier detection: A small number of participants often cheat and answer the questions randomly. Can you identify them? Hint: Local outlier factor may help.

Missing values analysis: Are there any patterns in missing responses? What is the optimal way of imputing the values in surveys?

Recommendations: If some of a user's interests are known, can we predict the others? Or, if we know what a person listens to, can we predict which kinds of movies he/she might like?

Past research

(in Slovak) Sleziak, P. - Sabo, M.: Gender differences in the prevalence of specific phobias. Forum Statisticum Slovacum. 2014, Vol. 10, No. 6. [Differences (gender + whether people lived in village/town) in the prevalence of phobias.]

Sabo, Miroslav. Multivariate Statistical Methods with Applications. Diss. Slovak University of Technology in Bratislava, 2014. [Clustering of variables (music preferences, movie preferences, phobias) + Clustering of people w.r.t. their interests.]

Questionnaire

MUSIC PREFERENCES

I enjoy listening to music.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I prefer.: Slow paced music 1-2-3-4-5 Fast paced music (integer)

Dance, Disco, Funk: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Folk music: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Country: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Classical: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Musicals: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Pop: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Rock: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Metal, Hard rock: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Punk: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Hip hop, Rap: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Reggae, Ska: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Swing, Jazz: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Rock n Roll: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Alternative music: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Latin: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Techno, Trance: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Opera: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

MOVIE PREFERENCES

I really enjoy watching movies.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

Horror movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Thriller movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Comedies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Romantic movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Sci-fi movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

War movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Tales: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Cartoons: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Documentaries: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Western movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

Action movies: Don't enjoy at all 1-2-3-4-5 Enjoy very much (integer)

HOBBIES & INTERESTS

History: Not interested 1-2-3-4-5 Very interested (integer)

Psychology: Not interested 1-2-3-4-5 Very interested (integer)

Politics: Not interested 1-2-3-4-5 Very interested (integer)

Mathematics: Not interested 1-2-3-4-5 Very interested (integer)

Physics: Not interested 1-2-3-4-5 Very interested (integer)

Internet: Not interested 1-2-3-4-5 Very interested (integer)

PC Software, Hardware: Not interested 1-2-3-4-5 Very interested (integer)

Economy, Management: Not interested 1-2-3-4-5 Very interested (integer)

Biology: Not interested 1-2-3-4-5 Very interested (integer)

Chemistry: Not interested 1-2-3-4-5 Very interested (integer)

Poetry reading: Not interested 1-2-3-4-5 Very interested (integer)

Geography: Not interested 1-2-3-4-5 Very interested (integer)

Foreign languages: Not interested 1-2-3-4-5 Very interested (integer)

Medicine: Not interested 1-2-3-4-5 Very interested (integer)

Law: Not interested 1-2-3-4-5 Very interested (integer)

Cars: Not interested 1-2-3-4-5 Very interested (integer)

Art: Not interested 1-2-3-4-5 Very interested (integer)

Religion: Not interested 1-2-3-4-5 Very interested (integer)

Outdoor activities: Not interested 1-2-3-4-5 Very interested (integer)

Dancing: Not interested 1-2-3-4-5 Very interested (integer)

Playing musical instruments: Not interested 1-2-3-4-5 Very interested (integer)

Poetry writing: Not interested 1-2-3-4-5 Very interested (integer)

Sport and leisure activities: Not interested 1-2-3-4-5 Very interested (integer)

Sport at competitive level: Not interested 1-2-3-4-5 Very interested (integer)

Gardening: Not interested 1-2-3-4-5 Very interested (integer)

Celebrity lifestyle: Not interested 1-2-3-4-5 Very interested (integer)

Shopping: Not interested 1-2-3-4-5 Very interested (integer)

Science and technology: Not interested 1-2-3-4-5 Very interested (integer)

Theatre: Not interested 1-2-3-4-5 Very interested (integer)

Socializing: Not interested 1-2-3-4-5 Very interested (integer)

Adrenaline sports: Not interested 1-2-3-4-5 Very interested (integer)

Pets: Not interested 1-2-3-4-5 Very interested (integer)

PHOBIAS

Flying: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Thunder, lightning: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Darkness: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Heights: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Spiders: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Snakes: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Rats, mice: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Ageing: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Dangerous dogs: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

Public speaking: Not afraid at all 1-2-3-4-5 Very afraid of (integer)

HEALTH HABITS

Smoking habits: Never smoked - Tried smoking - Former smoker - Current smoker (categorical)

Drinking: Never - Social drinker - Drink a lot (categorical)

I live a very healthy lifestyle.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS

I take notice of what goes on around me.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I try to do tasks as soon as possible and not leave them until last minute.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I always make a list so I don't forget anything.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I often study or work even in my spare time.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I look at things from all different angles before I go ahead.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I believe that bad people will suffer one day and good people will be rewarded.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I am reliable at work and always complete all tasks given to me.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

I always keep my promises.: Strongly disagree 1-2-3-4-5 Strongly agree (integer)

**I can fall for someone very quickly and then
Retail Product Dataset with Missing Values
kaggle.com
Updated Feb 17, 2025
Cite
Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values/discussion?sort=undefined
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Himel Sarder
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings between 1 and 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage

This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
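A typical first exercise with a dataset like this is to measure the share of missing values in each column and then try a simple imputation. The sketch below uses the five column names listed above. Everything else is an assumption for illustration: the helper names `percent_missing` and `impute_mean` are hypothetical, the inline sample rows are invented, and missing cells are assumed to appear as blank strings in the CSV.

```python
import csv
import io
from statistics import mean

COLUMNS = ["Category", "Price", "Rating", "Stock", "Discount"]


def percent_missing(rows):
    """Percentage of blank cells in each column."""
    total = len(rows)
    return {
        col: 100.0 * sum(1 for r in rows if not r[col]) / total
        for col in COLUMNS
    }


def impute_mean(rows, column):
    """Return the column as floats, with blanks replaced by the observed mean."""
    observed = [float(r[column]) for r in rows if r[column]]
    fill = mean(observed)
    return [float(r[column]) if r[column] else fill for r in rows]


# Invented sample rows; the real file has 4,362 rows.
SAMPLE = """Category,Price,Rating,Stock,Discount
A,10.0,4,In Stock,5
B,,3,,10
,30.0,,Out of Stock,
D,20.0,5,In Stock,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(percent_missing(rows))
print(impute_mean(rows, "Price"))
```

Mean imputation is shown only because it is the simplest baseline. With missingness rates as high as those quoted above, it shrinks the variance of the imputed column considerably, so it is worth comparing against other strategies.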
Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021
beta.ukdataservice.ac.uk
Updated 2022
Cite
Institute For Social University Of Essex; University Of Manchester, Cathie Marsh Institute For Social Research (CMIST) (2022). Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021 [Dataset]. http://doi.org/10.5255/ukda-sn-9019-1
Unique identifier
https://doi.org/10.5255/ukda-sn-9019-1
Dataset updated
2022
Dataset provided by
DataCite (https://www.datacite.org/)
University of Essex, Institute for Social and Economic Research
Authors
Institute For Social University Of Essex; University Of Manchester, Cathie Marsh Institute For Social Research (CMIST)
Description
As the UK went into the first lockdown of the COVID-19 pandemic, the team behind the biggest social survey in the UK, Understanding Society (UKHLS), developed a way to capture these experiences. From April 2020, participants from this Study were asked to take part in the Understanding Society COVID-19 survey, henceforth referred to as the COVID-19 survey or the COVID-19 study.
The COVID-19 survey regularly asked people about their situation and experiences. The resulting data gives a unique insight into the impact of the pandemic on individuals, families, and communities. The COVID-19 Teaching Dataset contains data from the main COVID-19 survey in a simplified form. It covers topics such as

Socio-demographics

Whether working at home and home-schooling

COVID symptoms

Health and well-being

Social contact and neighbourhood cohesion

Volunteering

The resource contains two data files:

Cross-sectional: contains data collected in Wave 4 in July 2020 (with some additional variables from other waves);

Longitudinal: Contains mainly data from Waves 1, 4 and 9 with key variables measured at three time points.

Key features of the dataset

Missing values: in the web survey, participants clicking "Next" but not answering a question were given further options such as "Don't know" and "Prefer not to say". Missing observations like these are recorded using negative values such as -1 for "Don't know". In many instances, users of the data will need to set these values as missing. The User Guide includes Stata and SPSS code for setting negative missing values to system missing.
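The User Guide's own recoding code is written for Stata and SPSS. For readers working in Python, the same idea can be sketched as below. This is an illustration rather than a translation of the User Guide code: the function name and the example variable names are hypothetical, and it assumes every negative value in the listed fields is a missing-value code, which should be checked against the User Guide first.

```python
def recode_negative_as_missing(record, numeric_fields):
    """Return a copy of the record with negative codes replaced by None."""
    cleaned = dict(record)
    for field in numeric_fields:
        value = cleaned.get(field)
        # Assumption: negative numbers (e.g. -1 for "Don't know") are missing codes.
        if isinstance(value, (int, float)) and value < 0:
            cleaned[field] = None
    return cleaned


# Hypothetical respondent record; variable names are made up for the example.
respondent = {"pidp": 1001, "wellbeing": -1, "hours_worked": 37, "sex": 2}
print(recode_negative_as_missing(respondent, ["wellbeing", "hours_worked"]))
```

Listing the fields explicitly keeps the recode away from variables where a negative number could be a legitimate value.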

The Longitudinal file is a balanced panel and is in wide format. A balanced panel means it only includes participants that took part in every wave. In wide format, each participant has one row of information, and each measurement of the same variable is a different variable.
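Many longitudinal methods expect long format, with one row per participant per wave. The sketch below shows the reshaping step on a single wide-format row. The waves 1, 4 and 9 come from the description above. The `{stub}_w{wave}` naming pattern, the variable names, and the function name `wide_to_long` are assumptions made for the example, and the real file's naming convention should be taken from the User Guide.

```python
def wide_to_long(row, id_field, stubs, waves):
    """Turn one wide-format row into one record per wave."""
    long_rows = []
    for wave in waves:
        record = {id_field: row[id_field], "wave": wave}
        for stub in stubs:
            # Assumed column naming: <variable>_w<wave number>.
            record[stub] = row[f"{stub}_w{wave}"]
        long_rows.append(record)
    return long_rows


# Hypothetical wide-format row for one participant.
wide_row = {"pidp": 1001, "wellbeing_w1": 12, "wellbeing_w4": 15, "wellbeing_w9": 11}
for record in wide_to_long(wide_row, "pidp", ["wellbeing"], [1, 4, 9]):
    print(record)
```

Because the panel is balanced, every participant yields the same number of long-format rows.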

Weights: both the cross-sectional and longitudinal files include survey weights that adjust the sample to represent the UK adult population. The cross-sectional weight (betaindin_xw) adjusts for unequal selection probabilities in the sample design and for non-response. The longitudinal weight (ci_betaindin_lw) adjusts for the sample design and also for the fact that not all those invited to participate in the survey participate in all waves.

Both the cross-sectional and longitudinal datasets include the survey design variables (psu and strata).
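To show how a weight enters an estimate, the sketch below computes a weighted mean using the cross-sectional weight named above. The outcome variable `wellbeing`, the sample records, and the function name are hypothetical. The sketch deliberately ignores the psu and strata design variables, so it gives a point estimate only. Correct standard errors would need dedicated survey software.

```python
def weighted_mean(records, value_field, weight_field="betaindin_xw"):
    """Weighted mean of value_field, skipping records with a missing value or weight."""
    numerator = 0.0
    denominator = 0.0
    for record in records:
        value = record.get(value_field)
        weight = record.get(weight_field)
        if value is None or weight is None:
            continue
        numerator += weight * value
        denominator += weight
    if denominator == 0:
        raise ValueError("no usable records with positive total weight")
    return numerator / denominator


# Hypothetical records; values and weights are invented.
sample = [
    {"wellbeing": 2, "betaindin_xw": 1.0},
    {"wellbeing": 4, "betaindin_xw": 3.0},
    {"wellbeing": None, "betaindin_xw": 2.0},
]
print(weighted_mean(sample, "wellbeing"))
```

For the longitudinal file, the same function could be called with `weight_field="ci_betaindin_lw"`.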

A full list of variables in both files can be found in the User Guide appendix.
Who is in the sample?
All adults (16 years old and over as of April 2020) in households who had participated in at least one of the last two waves of the main study Understanding Society were invited to participate in this survey. From the September 2020 (Wave 5) survey onwards, only sample members who had completed at least one partial interview in any of the first four web surveys were invited to participate. From the November 2020 (Wave 6) survey onwards, those who had only completed the initial survey in April 2020 and none since were no longer invited to participate.

The User Guide accompanying the data adds to the information here and includes a full variable list with details of measurement levels and links to the relevant questionnaire.
Number of missing persons files U.S. 2024, by race
statista.com
Updated Aug 14, 2025
Cite
Statista (2025). Number of missing persons files U.S. 2024, by race [Dataset]. https://www.statista.com/statistics/240396/number-of-missing-persons-files-in-the-us-by-race/
Dataset updated
Aug 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2024
Area covered
United States
Description
In 2024, there were 301,623 cases filed by the National Crime Information Center (NCIC) where the race of the reported missing person was white. In the same year, 17,097 people whose race was unknown were also reported missing in the United States.

What is the NCIC?

The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States, so criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files belong to stolen property and items, and 14 belong to persons, including the National Sex Offender Registry, Missing Person, and Identity Theft. It works alongside federal, tribal, state, and local agencies. The NCIC's goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.

Missing people in the United States

A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of the NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males as compared to females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
Data from: Using decision trees to understand structure in missing data
datasetcatalog.nlm.nih.gov
search.dataone.org
+2 more
Updated Jun 2, 2015
+ more versions
Cite
Mengersen, Kerrie L.; Tierney, Nicholas J.; Harden, Fiona A.; Harden, Maurice J. (2015). Using decision trees to understand structure in missing data [Dataset]. http://doi.org/10.5061/dryad.j4f19
Unique identifier
https://doi.org/10.5061/dryad.j4f19
Dataset updated
Jun 2, 2015
Authors
Mengersen, Kerrie L.; Tierney, Nicholas J.; Harden, Fiona A.; Harden, Maurice J.
Description
Objectives: Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting: Data taken from employees at 3 different industrial sites in Australia. Participants: 7915 observations were included. Materials and methods: The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions: Researchers are encouraged to use CART and BRT models to explore and understand missing data.
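The core idea, treating "is this value missing?" as the outcome of a classification tree, can be illustrated with a single split. The sketch below is a deliberately simplified pure-Python stand-in, not the rpart or gbm analyses used in the study: it searches for the one threshold on one predictor that best separates missing from observed values by Gini impurity, which is the first step a CART would take. The function names, variable names and data are invented, and predictors are assumed to be numeric and fully observed.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, target, predictors):
    """Find the threshold split that best separates missing from observed target values.

    Returns (weighted impurity, predictor name, threshold).
    """
    is_missing = [1 if r[target] is None else 0 for r in rows]
    best = None
    for name in predictors:
        values = sorted({r[name] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [m for r, m in zip(rows, is_missing) if r[name] <= threshold]
            right = [m for r, m in zip(rows, is_missing) if r[name] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best


# Invented data in which the measurement is missing for everyone at site 3.
data = [
    {"site": 1, "visits": 2, "lung_function": 3.1},
    {"site": 1, "visits": 5, "lung_function": 2.9},
    {"site": 2, "visits": 1, "lung_function": 3.4},
    {"site": 2, "visits": 4, "lung_function": 3.0},
    {"site": 3, "visits": 3, "lung_function": None},
    {"site": 3, "visits": 6, "lung_function": None},
]
print(best_split(data, "lung_function", ["site", "visits"]))
```

On this toy data the search picks out the site variable, mirroring the study's finding that missingness was related to the site where data were collected. A full tree would apply the same search recursively within each branch.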
Levels of obesity, inactivity and associated illnesses (England): Missing...
hub.arcgis.com
data.catchmentbasedapproach.org
Updated Apr 8, 2021
The Rivers Trust (2021). Levels of obesity, inactivity and associated illnesses (England): Missing data [Dataset]. https://hub.arcgis.com/datasets/theriverstrust::levels-of-obesity-inactivity-and-associated-illnesses-england-missing-data/explore
Dataset updated
Apr 8, 2021
Dataset authored and provided by
The Rivers Trust
Description
SUMMARY
To be viewed in combination with the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

This dataset shows where there was no data* relating to one or more of the following factors:
- Obesity/inactivity-related illnesses (recorded at the GP practice catchment area level*)
- Adult obesity (recorded at the GP practice catchment area level*)
- Inactivity in children (recorded at the district level)
- Excess weight in children (recorded at the Middle Layer Super Output Area level)

* GPs do not have catchments that are mutually exclusive from each other: they overlap, with some geographic areas being covered by 30+ practices.

GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead.

This dataset identifies areas where data from 2019/20 was used, where one or more GPs did not submit data in either year (this could be because there are rural areas that aren’t officially covered by any GP practices), or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution.

Results of the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ analysis in these areas should be interpreted with caution, particularly if the levels of obesity, inactivity and associated illnesses appear to be significantly lower than in their immediate surrounding areas.

Really small areas with ‘missing’ data were deleted, where it was deemed that missing data will not have impacted the overall analysis (i.e. where GP data was missing from really small countryside areas where no people live).

See also Health and wellbeing statistics (GP-level, England): Missing data and potential outliers data

DATA SOURCES
This dataset was produced using:
- Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- National Child Measurement Programme: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- Active Lives Survey 2019: Sport and Physical Activity Levels amongst children and young people in school years 1-11 (aged 5-16). © Sport England 2020.
- Active Lives Survey 2019: Sport and Physical Activity Levels amongst adults aged 16+. © Sport England 2020.
- GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
- Administrative boundaries: Boundary-Line™: Contains Ordnance Survey data © Crown copyright and database right 2021. Contains public sector information licensed under the Open Government Licence v3.0.
- MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.

COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Sport England 2020; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains Ordnance Survey data © Crown copyright and database right 2021. Contains public sector information licensed under the Open Government Licence v3.0.

CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
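The discrepancy rule mentioned in the summary (year-on-year differences beyond the mean +/- 1 standard deviation) might look something like the sketch below. The description does not say exactly how the threshold was computed, so this assumes the sample standard deviation of the per-area differences and flags anything outside the band; the function name and the figures are invented for illustration.

```python
from statistics import mean, stdev


def flag_discrepancies(values_2018_19, values_2019_20):
    """Flag areas whose year-on-year difference lies outside
    mean +/- 1 standard deviation of all differences (assumed rule)."""
    diffs = [later - earlier for earlier, later in zip(values_2018_19, values_2019_20)]
    centre, spread = mean(diffs), stdev(diffs)
    lower, upper = centre - spread, centre + spread
    return [d < lower or d > upper for d in diffs]


# Invented prevalence figures for five areas in the two financial years.
prev = [10.0, 12.0, 11.0, 9.0, 10.5]
curr = [10.2, 12.1, 11.1, 9.1, 16.5]
flags = flag_discrepancies(prev, curr)
print(flags)
```

Only the last area, whose value jumps by 6 points, falls outside the band here. A one-standard-deviation band is deliberately wide-reaching, so flagged areas are candidates for caution rather than confirmed errors, which matches how the dataset describes them.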
Experimental Dataset on the Impact of Unfair Behavior by AI and Humans on...
scidb.cn
Updated Apr 30, 2025
Yang Luo (2025). Experimental Dataset on the Impact of Unfair Behavior by AI and Humans on Trust: Evidence from Six Experimental Studies [Dataset]. http://doi.org/10.57760/sciencedb.psych.00565
Unique identifier
https://doi.org/10.57760/sciencedb.psych.00565
Dataset updated
Apr 30, 2025
Dataset provided by
Science Data Bank
Authors
Yang Luo
Description
This dataset originates from a series of experimental studies titled “Tough on People, Tolerant to AI? Differential Effects of Human vs. AI Unfairness on Trust”. The project investigates how individuals respond to unfair behavior (distributive, procedural, and interactional unfairness) enacted by artificial intelligence versus human agents, and how such behavior affects cognitive and affective trust.

1 Experiment 1a: The Impact of AI vs. Human Distributive Unfairness on Trust
Overview: This dataset comes from an experimental study aimed at examining how individuals respond in terms of cognitive and affective trust when distributive unfairness is enacted by either an artificial intelligence (AI) agent or a human decision-maker. Experiment 1a specifically focuses on the main effect of the “type of decision-maker” on trust.
Data Generation and Processing: The data were collected through Credamo, an online survey platform. Initially, 98 responses were gathered from students at a university in China. Additional student participants were recruited via Credamo to supplement the sample. Attention check items were embedded in the questionnaire, and participants who failed were automatically excluded in real-time. Data collection continued until 202 valid responses were obtained. SPSS software was used for data cleaning and analysis.
Data Structure and Format: The data file is named “Experiment1a.sav” and is in SPSS format. It contains 28 columns and 202 rows, where each row corresponds to one participant. Columns represent measured variables, including: grouping and randomization variables, one manipulation check item, four items measuring distributive fairness perception, six items on cognitive trust, five items on affective trust, three items for honesty checks, and four demographic variables (gender, age, education, and grade level). The final three columns contain computed means for distributive fairness, cognitive trust, and affective trust.
Additional Information: No missing data are present. All variable names are labeled in English abbreviations to facilitate further analysis. The dataset can be directly opened in SPSS or exported to other formats.

2 Experiment 1b: The Mediating Role of Perceived Ability and Benevolence (Distributive Unfairness)
Overview: This dataset originates from an experimental study designed to replicate the findings of Experiment 1a and further examine the potential mediating role of perceived ability and perceived benevolence.
Data Generation and Processing: Participants were recruited via the Credamo online platform. Attention check items were embedded in the survey to ensure data quality. Data were collected using a rolling recruitment method, with invalid responses removed in real time. A total of 228 valid responses were obtained.
Data Structure and Format: The dataset is stored in a file named Experiment1b.sav in SPSS format and can be directly opened in SPSS software. It consists of 228 rows and 40 columns. Each row represents one participant’s data record, and each column corresponds to a different measured variable. Specifically, the dataset includes: random assignment and grouping variables; one manipulation check item; four items measuring perceived distributive fairness; six items on perceived ability; five items on perceived benevolence; six items on cognitive trust; five items on affective trust; three items for attention check; and three demographic variables (gender, age, and education). The last five columns contain the computed mean scores for perceived distributive fairness, ability, benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be analyzed directly in SPSS or exported to other formats as needed.

3 Experiment 2a: Differential Effects of AI vs. Human Procedural Unfairness on Trust
Overview: This dataset originates from an experimental study aimed at examining whether individuals respond differently in terms of cognitive and affective trust when procedural unfairness is enacted by artificial intelligence versus human decision-makers. Experiment 2a focuses on the main effect of the decision agent on trust outcomes.
Data Generation and Processing: Participants were recruited via the Credamo online survey platform from two universities located in different regions of China. A total of 227 responses were collected. After excluding those who failed the attention check items, 204 valid responses were retained for analysis. Data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2a.sav in SPSS format and can be directly opened in SPSS software. It contains 204 rows and 30 columns. Each row represents one participant’s response record, while each column corresponds to a specific variable. Variables include: random assignment and grouping; one manipulation check item; seven items measuring perceived procedural fairness; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The final three columns contain computed average scores for procedural fairness, cognitive trust, and affective trust.
Additional Notes: The dataset contains no missing values. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be directly analyzed in SPSS or exported to other formats as needed.

4 Experiment 2b: Mediating Role of Perceived Ability and Benevolence (Procedural Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 2a and to further examine the potential mediating roles of perceived ability and perceived benevolence in shaping trust responses under procedural unfairness.
Data Generation and Processing: Participants were working adults recruited through the Credamo online platform. A rolling data collection strategy was used, where responses failing attention checks were excluded in real time. The final dataset includes 235 valid responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2b.sav, which is in SPSS format and can be directly opened using SPSS software. It contains 235 rows and 43 columns. Each row corresponds to a single participant, and each column represents a specific measured variable. These include: random assignment and group labels; one manipulation check item; seven items measuring procedural fairness; six items for perceived ability; five items for perceived benevolence; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final five columns contain the computed average scores for procedural fairness, perceived ability, perceived benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to support future reuse and secondary analysis. The dataset can be directly analyzed in SPSS and easily converted into other formats if needed.

5 Experiment 3a: Effects of AI vs. Human Interactional Unfairness on Trust
Overview: This dataset comes from an experimental study that investigates how interactional unfairness, when enacted by either artificial intelligence or human decision-makers, influences individuals’ cognitive and affective trust. Experiment 3a focuses on the main effect of the “decision-maker type” under interactional unfairness conditions.
Data Generation and Processing: Participants were college students recruited from two universities in different regions of China through the Credamo survey platform. After excluding responses that failed attention checks, a total of 203 valid cases were retained from an initial pool of 223 responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3a.sav, in SPSS format and compatible with SPSS software. It contains 203 rows and 27 columns. Each row represents a single participant, while each column corresponds to a specific measured variable. These include: random assignment and condition labels; one manipulation check item; four items measuring interactional fairness perception; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final three columns contain computed average scores for interactional fairness, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variable names are provided using standardized English abbreviations to facilitate secondary analysis. The data can be directly analyzed using SPSS and exported to other formats as needed.

6 Experiment 3b: The Mediating Role of Perceived Ability and Benevolence (Interactional Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 3a and further examine the potential mediating roles of perceived ability and perceived benevolence under conditions of interactional unfairness.
Data Generation and Processing: Participants were working adults recruited via the Credamo platform. Attention check questions were embedded in the survey, and responses that failed these checks were excluded in real time. Data collection proceeded in a rolling manner until a total of 227 valid responses were obtained. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3b.sav, in SPSS format and compatible with SPSS software. It includes 227 rows and
Data From: Multiple imputation for harmonizing longitudinal non-commensurate...
wiley.figshare.com
pdf
Updated May 31, 2023
Dr. Juned Siddique; Dr. Jerome Reiter; Dr. Ahnalee Brincks; Dr. Robert D. Gibbons; Prof. Catherine M. Crespi; Prof. C. Hendricks Brown (2023). Data From: Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.1466878.v2
Unique identifier
https://doi.org/10.6084/m9.figshare.1466878.v2
Dataset updated
May 31, 2023
Dataset provided by
Wiley
Authors
Dr. Juned Siddique; Dr. Jerome Reiter; Dr. Ahnalee Brincks; Dr. Robert D. Gibbons; Prof. Catherine M. Crespi; Prof. C. Hendricks Brown
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta-analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials, and use multiple imputation to fill in missing measurements. We apply our method to 5 longitudinal adolescent depression trials where 4 studies used one depression measure and the fifth study used a different depression measure. None of the 5 studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigating whether external information is appropriately incorporated into the imputed values.
Table_1_Comparison of machine learning and logistic regression as predictive...
frontiersin.figshare.com
xlsx
Updated Jun 13, 2023
Dongying Zheng; Xinyu Hao; Muhanmmad Khan; Lixia Wang; Fan Li; Ning Xiang; Fuli Kang; Timo Hamalainen; Fengyu Cong; Kedong Song; Chong Qiao (2023). Table_1_Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia: A retrospective study.XLSX [Dataset]. http://doi.org/10.3389/fcvm.2022.959649.s003
Unique identifier
https://doi.org/10.3389/fcvm.2022.959649.s003
Dataset updated
Jun 13, 2023
Dataset provided by
Frontiers
Authors
Dongying Zheng; Xinyu Hao; Muhanmmad Khan; Lixia Wang; Fan Li; Ning Xiang; Fuli Kang; Timo Hamalainen; Fengyu Cong; Kedong Song; Chong Qiao
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Introduction
Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models given the lack of effective treatment. Predictive models based on machine learning algorithms show promising potential, but whether machine learning methods should be preferred over traditional statistical models remains controversial.

Methods
We employed both logistic regression and six machine learning methods as binary predictive models for a dataset containing 733 women diagnosed with preeclampsia. Participants were grouped by four different pregnancy outcomes. After the imputation of missing values, statistical description and comparison were conducted preliminarily to explore the characteristics of the 73 documented variables. Subsequently, correlation analysis and feature selection were performed as preprocessing steps to filter contributing variables for developing models. The models were evaluated by multiple criteria.

Results
First, the influential variables screened by the preprocessing steps did not overlap with those determined by statistical differences. Second, the most accurate imputation method was K-Nearest Neighbor, and the imputation process did not affect the performance of the developed models much. Finally, the performance of the models was investigated. The random forest classifier, multi-layer perceptron, and support vector machine demonstrated better discriminative power, as evaluated by the area under the receiver operating characteristic curve, while the decision tree classifier, random forest, and logistic regression yielded better calibration, as verified by the calibration curve.

Conclusion
Machine learning algorithms can accomplish prediction modeling and demonstrate superior discrimination, while logistic regression can be calibrated well. Statistical analysis and machine learning are two scientific domains sharing similar themes. The predictive abilities of such developed models vary according to the characteristics of datasets, which still need larger sample sizes and more influential predictors to accumulate evidence.
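For readers unfamiliar with K-Nearest Neighbor imputation, which the abstract reports as the most accurate method, a minimal pure-Python sketch follows. The abstract does not describe the authors' implementation, so everything here is an assumption: the function name, the choice of k, the distance (root mean squared difference over columns observed in both rows), and the toy data.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column among the k nearest
    rows that have the column observed. Distance is computed only over
    columns observed in both rows."""

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing in common: treat as maximally distant
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy data (invented): the second row is missing its third value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.8],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

The missing value is filled from the two rows closest to it on the observed columns, ignoring the distant third row. In practice, variables on different scales would normally be standardized before computing distances.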
NielsenHackathon
kaggle.com
Updated Jan 1, 2021
Aadarsh Singh (2021). NielsenHackathon [Dataset]. https://www.kaggle.com/datasets/paradoxlover/nielsenhackathon
Dataset updated
Jan 1, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aadarsh Singh
Description
Context

Create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.

Task:

Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determining which factors/variables/features best predict store sales.

Empathy dataset

zenodo.org
data.niaid.nih.gov

bin, csv, html

Updated Dec 18, 2024

Zenodo (2024). Empathy dataset [Dataset]. http://doi.org/10.5281/zenodo.7683907

Unique identifier

https://doi.org/10.5281/zenodo.7683907

Dataset updated

Dec 18, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically

Description

The database for this study (Briganti et al. 2018; the same for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years); 57% were female and 43% were male. Even though the full dataset was composed of 1973 participants, only 1270 answered the full questionnaire: missing data are handled using pairwise complete observations in estimating a Gaussian Graphical Model, meaning that all available information from every subject is used.
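To illustrate what "pairwise complete observations" means in practice, the sketch below computes a Pearson correlation for a pair of items using only the respondents who answered both, so a respondent with one skipped item still contributes to every other pair. This covers only the correlation input; the Gaussian Graphical Model estimation itself is not shown. The function name and the toy responses are hypothetical.

```python
import math
from statistics import mean


def pairwise_complete_corr(x, y):
    """Pearson correlation using only positions where both values are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return None
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation undefined for a constant column
    return sxy / math.sqrt(sxx * syy)


# Invented responses from six participants; None marks a skipped item.
item1 = [0, 1, 2, 3, 4, None]
item2 = [0, 1, 2, None, 4, 3]
item3 = [4, None, 2, 1, 0, 2]

r12 = pairwise_complete_corr(item1, item2)  # uses participants 1, 2, 3, 5
r13 = pairwise_complete_corr(item1, item3)  # uses participants 1, 3, 4, 5
print(r12, r13)
```

Each pair of items draws on a different subset of participants, which is why this approach retains more information than dropping every participant with any missing answer.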

The feature set is composed of 28 items meant to assess the four following components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed; reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is calculated afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.
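Reverse-scoring on a 0 to 4 scale amounts to replacing a raw score s with 4 - s for the reversed items listed above. A minimal sketch, assuming responses are held in a dictionary keyed by item number (a layout chosen here for illustration, not taken from the dataset files):

```python
# Reversed items as listed in the description
REVERSED_ITEMS = {3, 4, 7, 12, 13, 14, 15, 18, 19}
MAX_SCORE = 4  # items are scored from 0 to 4


def reverse_score(responses):
    """Return a copy of `responses` (item number -> raw score or None)
    with reversed items flipped; missing answers stay None."""
    scored = {}
    for item, raw in responses.items():
        if raw is not None and item in REVERSED_ITEMS:
            scored[item] = MAX_SCORE - raw
        else:
            scored[item] = raw
    return scored


# Hypothetical answers for a few items
raw = {1: 3, 3: 0, 4: 4, 7: None, 12: 1}
scored = reverse_score(raw)
print(scored)
```

Item 1 is left unchanged, items 3, 4 and 12 are flipped, and the skipped item 7 remains missing.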

Size: A dataset of size 1973*28

Number of features: 28

Ground truth: No

Type of Graph: Mixed graph

The following gives the description of the variables:

Feature	FeatureLabel	Domain	Item meaning from Davis 1980
001	1FS	Green	I daydream and fantasize, with some regularity, about things that might happen to me.
002	2EC	Purple	I often have tender, concerned feelings for people less fortunate than me.
003	3PT_R	Yellow	I sometimes find it difficult to see things from the “other guy’s” point of view.
004	4EC_R	Purple	Sometimes I don’t feel very sorry for other people when they are having problems.
005	5FS	Green	I really get involved with the feelings of the characters in a novel.
006	6PD	Red	In emergency situations, I feel apprehensive and ill-at-ease.
007	7FS_R	Green	I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it. (Reversed)
008	8PT	Yellow	I try to look at everybody’s side of a disagreement before I make a decision.
009	9EC	Purple	When I see someone being taken advantage of, I feel kind of protective towards them.
010	10PD	Red	I sometimes feel helpless when I am in the middle of a very emotional situation.
011	11PT	Yellow	I sometimes try to understand my friends better by imagining how things look from their perspective.
012	12FS_R	Green	Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed)
013	13PD_R	Red	When I see someone get hurt, I tend to remain calm. (Reversed)
014	14EC_R	Purple	Other people’s misfortunes do not usually disturb me a great deal. (Reversed)
015	15PT_R	Yellow	If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed)
016	16FS	Green	After seeing a play or movie, I have felt as though I were one of the characters.
017	17PD	Red	Being in a tense emotional situation scares me.
018	18EC_R	Purple	When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed)
019	19PD_R	Red	I am usually pretty effective in dealing with emergencies. (Reversed)
020	20FS	Green	I am often quite touched by things that I see happen.
021	21PT	Yellow	I believe that there are two sides to every question and try to look at them both.
022	22EC	Purple	I would describe myself as a pretty soft-hearted person.
023	23FS	Green	When I watch a good movie, I can very easily put myself in the place of a leading character.
024	24PD	Red	I tend to lose control during emergencies.
025	25PT	Yellow	When I’m upset at someone, I usually try to “put myself in his shoes” for a while.
026	26FS	Green	When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me.
027	27PD	Red	When I see someone who badly needs help in an emergency, I go to pieces.
028	28PT	Yellow	Before criticizing somebody, I try to imagine how I would feel if I were in their place.

More information about the dataset is contained in empathy_description.html file.

Data from: Triple Dissociation Revisited
openneuro.org
Updated May 31, 2022
Julie Van; Samuel Nielson; C. Brock Kirwan (2022). Triple Dissociation Revisited [Dataset]. http://doi.org/10.18112/openneuro.ds004086.v1.2.0
Unique identifier
https://doi.org/10.18112/openneuro.ds004086.v1.2.0
Dataset updated
May 31, 2022
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Julie Van; Samuel Nielson; C. Brock Kirwan
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
README

DETAILS FOR ACCESSING DATA

CONTACT PERSON (Corresponding Author)

C. Brock Kirwan 1001 KMBL, Brigham Young University, Provo, UT 84602 Email: kirwan@byu.edu Phone: 801-422-2532 Fax: 801-422-0602 ORCID ID: 0000-003-0768-1446

OVERVIEW

PROJECT NAME

Limited Evidence for a Triple Dissociation in the Medial Temporal Lobe: an fMRI Recognition Memory Replication Study

YEARS THAT PROJECT RAN

2020-2021

BRIEF OVERVIEW OF EXPERIMENTAL TASKS

The present experiment aims to replicate two previous papers (cited below) in which authors present two analysis paths for a dataset in which participants underwent fMRI while performing a recognition memory test for old and new words. Both studies found activation in the hippocampus, with the first (Daselaar, Fleck, & Cabeza, 2006) demonstrating a distinction in hippocampus activation corresponding to true and perceived oldness of stimuli and the second (Daselaar, Fleck, Prince, & Cabeza, 2006) demonstrating that hippocampus activation reflects the subjective experience of the participant.

We replicated behavioral and MRI acquisition parameters reported in these two target articles with N=53 participants and focused fMRI analyses on regions of interest reported in those articles looking at fMRI activation for differences corresponding with true and perceived oldness and those associated with subjective memory experiences of recollection, familiarity, and novelty.

References:
(1) Daselaar, S. M., Fleck, M. S., & Cabeza, R. (2006). Triple dissociation in the medial temporal lobes: Recollection, familiarity, and novelty. J Neurophysiol, 96(4), 1902–1911. https://doi.org/10.1152/jn.01029.2005
(2) Daselaar, S. M., Fleck, M. S., Prince, S. E., & Cabeza, R. (2006). The medial temporal lobe distinguishes old from new independently of consciousness. J Neurosci, 26(21), 5835–5839. https://doi.org/10.1523/JNEUROSCI.0258-06.2006

DATASET CONTENTS

This dataset includes raw data from all scanned participants acquired by the Siemens Trio 3T MRI scanner (12-channel head coil), with each participant consisting of the following folders: /anat, /fmap, and /func. /anat includes structural imaging data obtained from scanning in the form of .nii.gz and .json files. /fmap includes field mapping data in the form of .nii.gz and .json files. /func includes functional imaging data obtained from scanning in the form of .nii.gz and .json files, along with event.tsv files for each run (total runs = 4). Data for a total of N=53 participants is included in the present dataset.

INDEPENDENT VARIABLES

True vs Perceived Oldness: Mean activity (mean parameter estimates) for each individual trial in the anterior/posterior MTL regions were identified by true oldness and perceived novelty contrasts. These resulting values were entered into a logistic regression model with activations in the MTL regions set as independent variables. Subjective Confidence: Mean activity for each individual trial from different MTL regions were identified and entered into a multiple regression model based on activations in different MTL regions (i.e., recollection-related activity, familiarity-related activity, and novelty-related activity) as independent variables.

DEPENDENT VARIABLES

True vs Perceived Oldness: A binary variable reflecting whether participants correctly recognized an old item as old (hit) or incorrectly classified an old item as new (miss) was set as the dependent variable. Subjective Confidence: A 6-point oldness scale was entered as the dependent variable.

CONTROL VARIABLES

N/A

QUALITY ASSESSMENT OF DATA

Data were preprocessed, which included spatial motion correction and spatial normalization that was automatically generated by the fMRIPrep software. Following fMRIPrep preprocessing, functional data were scaled with a mean of 100 and blurred with an 8 mm FWHM Gaussian kernel to account for inter-subject anatomical variation. Analysis scripts are available here: https://osf.io/ctvsw/. Data was acquired for N=60 participants, with data from n=7 participants excluded for reasons of ineligibility (left-handedness, n=1), failure to comply with study procedures (n=2), excessive motion (n=3), and equipment error (n=1).

METHODS

STUDY PHASE

In our experimental task, participants completed a study phase in which they were presented with a randomized list of 120 real English words and 80 pseudo words at a rate of 2000 ms per item. A fixation cross was presented between words for a random time interval varying between 0-5500 ms, where participants indicated whether the stimulus presented was a word or pseudo word. They were not informed at this time that their memory for the words would be tested. After the completion of the study phase, researchers situated participants in the MRI scanner and obtained localizer, field map, and T1-weighted structural MRI scans before initiating the test phase of the experiment.

TEST PHASE

During the test phase, the task was presented as four experimental runs, each lasting between 435 and 442 seconds. Participants saw an equal number of target stimuli (words shown during the study phase) and foil stimuli (novel words), at 60 words per run. Target and foil stimuli were presented in a randomized order for 3.4 seconds each. While the stimulus was displayed, participants judged whether the word had been presented on the study list. Confidence ratings for those old/new judgments were then collected on a scale from 1 (lowest confidence) to 4 (highest confidence), with the prompt displayed for 1.7 seconds.

PARTICIPANTS

Recruitment: To determine sample size, an a priori power analysis was done by extracting values from Figure 1 of Daselaar, Fleck, Prince, et al. (2006) in the right hippocampus via WebPlotDigitizer, given that the region showed smaller differences. We computed main effects by averaging hits and misses, and CRs and FAs, prior to SEM-to-SD conversion and averaging again. The resulting values were entered into G*Power to estimate an effect size of 0.46, indicating that a sample of N=54 would achieve a power of 0.95 with an error probability of 0.05 (t(1,53)=1.67). Participants were recruited from the campus community and met MRI compliance screening requirements. Exclusion: Non-native English speakers, history of drug use, previous psychiatric or neurologic diagnosis, or contra-indications for MRI (e.g., ferromagnetic implant). Compensation: Participants were compensated for participation with a choice of $20, course credit, or a 3D-printed 1/4-scale model of their brains.

APPARATUS

Localizer, field map, and T1-weighted structural MRI scans were obtained once the participants were situated in the scanner. MRI data were collected using a Siemens Trio 3T MRI scanner (12-channel head coil) and behavioral responses were collected using a four-key fiber-optic response cylinder (Current Designs, n.d.). Structural scanning was done at the beginning of the scan session (256 x 215 matrix, TR 1900 ms, TE 2.26 ms, FOV 250 x 218 mm, 176 slices, 1 mm slice thickness, 0 mm spacing) and functional scanning was done during all experimental runs (64 x 64 image matrix, TR 1800 ms, TE 31 ms, FOV 240 mm, 34 slices, 3.8 mm slice thickness). An MR-compatible LCD monitor displayed stimuli from the head of the bore, which participants viewed through a mirror mounted on the head coil. MRI data are available at: https://openneuro.org/datasets/ds004086.

INITIAL SETUP

[See above under STUDY PHASE and TEST PHASE for procedures performed once the participant arrived.]

TASK ORGANIZATION

Behavioral and imaging data were collected for each participant through the course of four (4) experimental runs. Behavioral data was used to create event.tsv files for each participant per run, indicating the onset, duration, trial type, stimuli response, correct answer, and reaction times of responses. Each experimental run lasted between 435-442 seconds, where participants saw an equal number of target stimuli (words shown in the study phase) and foil stimuli (novel words) at 60 words per run.

TASK DETAILS

Stimuli were presented for 3.4 seconds each. While the stimulus was displayed, participants judged whether the word had been presented on the study list. Confidence ratings for those old/new judgments were then collected on a scale from 1 (lowest confidence) to 4 (highest confidence). The prompt for the confidence ratings was displayed for 1.7 seconds, and trials were separated by an inter-trial interval (ITI) consisting of a fixation cross with a randomly distributed duration of 0-5.4 seconds (mean ITI=2.7 seconds).

ADDITIONAL DATA ACQUIRED

Behavioral data were identified as hits, misses, correct rejections (CRs), and false alarms (FAs). Hits indicated correct judgments of “old” for words that were actually old. Misses reflected incorrect judgments of “new” for words that were actually old. Correct rejections indicated correct judgments of “new” for new words, and false alarms represented incorrect judgments of “old” for new words.
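The four response categories can be illustrated with a minimal sketch. It follows the standard signal-detection convention, which matches the definition of a miss given under DEPENDENT VARIABLES (an old item classified as new). The function name, the label strings, and the example trials are hypothetical and are not taken from the study's analysis scripts.

```python
from collections import Counter


def classify_trial(is_old: bool, judged_old: bool) -> str:
    """Label one recognition trial using standard signal-detection terms."""
    if is_old:
        # Studied word: answering "old" is a hit, answering "new" is a miss.
        return "hit" if judged_old else "miss"
    # Novel word: answering "old" is a false alarm, "new" is a correct rejection.
    return "false_alarm" if judged_old else "correct_rejection"


# Hypothetical trials: (word was on the study list, participant answered "old")
trials = [(True, True), (True, False), (False, False), (False, True)]
counts = Counter(classify_trial(is_old, judged_old) for is_old, judged_old in trials)
```

Counting the labels per run in this way gives the hit, miss, CR, and FA totals that the behavioral data describe.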

EXPERIMENTAL LOCATION

The study was performed in the MRI Research Facility at the Brigham Young University campus in Provo, UT.

MISSING DATA

The following subjects may be missing data and/or are not included in analyses for the following reasons:
Sub-001: Ineligible; left-handedness
Sub-005: Failure to comply; completed only 10% of entries compared to other subjects
Sub-026: Excessive motion
Sub-034: Failure to comply; did not provide a response other than a “1” or none
Sub-050: Excessive motion
Sub-052: Excessive motion
Sub-056: Equipment error

NOTES

Sub-054 restarted their testing and completed the study protocol in full in the latter session.
The Science of BDSM Data, Phoenix, Arizona, 2014 - Version 1
search.gesis.org
Updated Aug 28, 2019
Inter-University Consortium for Political and Social Research (2019). The Science of BDSM Data, Phoenix, Arizona, 2014 - Version 1 [Dataset]. http://doi.org/10.3886/ICPSR37395.v1
Unique identifier
https://doi.org/10.3886/ICPSR37395.v1
Dataset updated
Aug 28, 2019
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
GESIS search
License
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de688050
Area covered
Phoenix, Arizona
Description
Abstract (en): The goals of this study were to test whether participants who engaged in an extreme ritual in a naturalistic setting would evidence signs of altered states of consciousness, to examine other physiological and affective effects of the ritual, and to determine whether these effects varied based on the role the individual performed within the ritual. A multi-method approach was used that utilized various psychological self-report measures, a measure of cognitive functioning, and a measure of physiological stress. The data collection took place at the 2014 "Dance of Souls," a ritual conducted on the last day of the annual Southwest Leather Conference in Phoenix, Arizona, in which participants received temporary piercings with hooks or weights attached to the piercings and danced to music provided by drummers. The associated publication, Altered States of Consciousness during an Extreme Ritual, was used to accompany the data in this collection. Users are encouraged to consult the publication for additional information. The data collection includes one de-identified dataset with 164 variables for 83 cases. Demographic variables include sex, gender, pierced vs. non-pierced, and the role the participant played in the ceremony.
A mixed-methods approach was utilized where participants completed repeated measures of positive and negative affect, salivary cortisol (a hormone associated with stress), self-reported stress, sexual arousal, and intimacy; Stroop test scores were also collected. Conference attendees could enroll in the study at any point until an hour prior to the beginning of the dance. Measures were taken before the dance (baseline), during the dance, and after the dance. Not all participants completed the materials in full during data collection and many were missing at least some data. To rectify this, three months after the conference, conference organizers sent an email to all the dance attendees with a link to an online version of the surveys. The goals were to (a) collect additional information from existing participants, (b) allow existing participants to complete any missing surveys, and (c) allow new participants to fill out the pre and post-dance surveys. If existing participants filled out a duplicate version of the pre- or post-dance survey, their responses were averaged in the dataset. It was not possible to collect Stroop and saliva samples in this manner. Variables in the data collection include:

Demographic: gender/sex, role in ritual, pierced/non-pierced
Participant experience/skill with BDSM
Inclusion of Other in the Self (IOS) scale variables
Positive and Negative Affect Schedule (PANAS) scale variables
Self-reported measures related to the ritual
Flow State Scale (FSS) variables
Psychological and physiological measures before, during, and after the ritual

ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: created variable labels and/or value labels; checked for undocumented or out-of-range codes.

Presence of Common Scales: The Flow State Scale (FSS); Positive and Negative Affect Schedule (PANAS); Inclusion of Other in Self (IOS) Scale; Stroop Effect tests.

Datasets: DS1: The Science of BDSM Data, Phoenix, Arizona, 2014

Participants of the Dance of Souls ritual, on the final day of the 2014 Southwest Leather Conference (SWLC) in Phoenix, Arizona.

Smallest Geographic Unit: None

The Dance of Souls took place in a large ballroom on the final day of the 2014 Southwest Leather Conference (SWLC), an annual four-day conference in Phoenix, Arizona. Approximately 180 people participated in the Dance of Souls. Of these, 83 enrolled in the ...
Base rates of food safety practices in European households: Summary data...
data.niaid.nih.gov
zenodo.org
Updated Nov 4, 2022
Scholderer, Joachim (2022). Base rates of food safety practices in European households: Summary data from the SafeConsume Household Survey [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7264924
Dataset updated
Nov 4, 2022
Dataset authored and provided by
Scholderer, Joachim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains estimates of the base rates of 550 food safety-relevant food handling practices in European households. The data are representative for the population of private households in the ten European countries in which the SafeConsume Household Survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK).

Sampling design

In each of the ten EU and EEA countries where the survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK), the population under study was defined as the private households in the country. Sampling was based on a stratified random design, with the NUTS2 statistical regions of Europe and the education level of the target respondent as stratum variables. The target sample size was 1000 households per country, with selection probability within each country proportional to stratum size.

Fieldwork

The fieldwork was conducted between December 2018 and April 2019 in ten EU and EEA countries (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, United Kingdom). The target respondent in each household was the person with main or shared responsibility for food shopping in the household. The fieldwork was sub-contracted to a professional research provider (Dynata, formerly Research Now SSI). Complete responses were obtained from altogether 9996 households.

Weights

In addition to the SafeConsume Household Survey data, population data from Eurostat (2019) were used to calculate weights. These were calculated with NUTS2 region as the stratification variable and assigned an influence to each observation in each stratum that was proportional to how many households in the population stratum a household in the sample stratum represented. The weights were used in the estimation of all base rates included in the data set.
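The weighting idea described above can be sketched as follows. This is a minimal illustration, assuming the weight of a stratum is the number of population households divided by the number of sampled households in that stratum; the function names and all figures are hypothetical and do not come from the SafeConsume data or from Eurostat.

```python
def stratum_weights(population: dict, sample: dict) -> dict:
    """Weight per stratum: population households represented by one sampled household."""
    return {region: population[region] / sample[region] for region in sample}


def weighted_base_rate(values, regions, weights) -> float:
    """Weighted mean of per-household values, each weighted by its stratum weight."""
    numerator = sum(weights[r] * v for v, r in zip(values, regions))
    denominator = sum(weights[r] for r in regions)
    return numerator / denominator


# Hypothetical NUTS2-like strata with made-up household counts.
population_households = {"A": 1000, "B": 3000}
sample_households = {"A": 2, "B": 2}
weights = stratum_weights(population_households, sample_households)

# Hypothetical binary practice indicator (1 = practice reported) per sampled household.
values = [1, 1, 0, 0]
regions = ["A", "A", "B", "B"]
rate = weighted_base_rate(values, regions, weights)
```

In this toy case the unweighted rate would be 0.5, while the weighted rate is lower because the households reporting the practice sit in the smaller stratum.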

Transformations

All survey variables were normalised to the [0,1] range before the analysis. Responses to food frequency questions were transformed into the proportion of all meals consumed during a year where the meal contained the respective food item. Responses to questions with 11-point Juster probability scales as the response format were transformed into numerical probabilities. Responses to questions with time (hours, days, weeks) or temperature (C) as response formats were discretised using supervised binning. The thresholds best separating between the bins were chosen on the basis of five-fold cross-validated decision trees. The binned versions of these variables, and all other input variables with multiple categorical response options (either with a check-all-that-apply or forced-choice response format) were transformed into sets of binary features, with a value 1 assigned if the respective response option had been checked, 0 otherwise.
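Two of the transformations above can be illustrated with a short sketch. It assumes min-max scaling as the way of normalising to the [0,1] range, which the description does not specify, and it shows the binary encoding of a check-all-that-apply question. The function names, option labels, and values are hypothetical.

```python
def min_max(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def to_binary_features(checked, options):
    """One binary feature per response option: 1 if checked, 0 otherwise."""
    return {option: 1 if option in checked else 0 for option in options}


# Hypothetical responses on a numeric scale.
scaled = min_max([1, 3, 5])

# Hypothetical check-all-that-apply question with two response options.
features = to_binary_features({"fridge"}, ["fridge", "freezer"])
```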

Treatment of missing values

In many cases, a missing value on a feature logically implies that the respective data point should have a value of zero. If, for example, a participant in the SafeConsume Household Survey had indicated that a particular food was not consumed in their household, the participant was not presented with any other questions related to that food, which automatically results in missing values on all features representing the responses to the skipped questions. However, zero consumption would also imply a zero probability that the respective food is consumed undercooked. In such cases, missing values were replaced with a value of 0.
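The skip-logic rule described above can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary in which a skipped question is stored as None; the feature names are hypothetical and are not taken from the data set.

```python
def fill_structural_zeros(record: dict, gate: str, dependents: list) -> dict:
    """Replace missing dependent features with 0 when the gate feature is 0.

    A gate value of 0 means the food is not consumed in the household, so
    follow-up questions were skipped and their features are logically zero.
    """
    filled = dict(record)  # copy so the input record is left unchanged
    if filled.get(gate) == 0:
        for name in dependents:
            if filled.get(name) is None:
                filled[name] = 0
    return filled


# Hypothetical household that does not consume the food: missing becomes 0.
non_consumer = fill_structural_zeros(
    {"eats_chicken": 0, "chicken_undercooked": None},
    gate="eats_chicken",
    dependents=["chicken_undercooked"],
)

# Hypothetical household that does consume it: the missing value stays missing.
consumer = fill_structural_zeros(
    {"eats_chicken": 1, "chicken_undercooked": None},
    gate="eats_chicken",
    dependents=["chicken_undercooked"],
)
```

Only missing values implied by the skip pattern are replaced; other missing values are left for whatever separate treatment applies to them.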
Hypertension (in persons of all ages): England
hub.arcgis.com
data.catchmentbasedapproach.org
Updated Apr 7, 2021
The Rivers Trust (2021). Hypertension (in persons of all ages): England [Dataset]. https://hub.arcgis.com/maps/theriverstrust::hypertension-in-persons-of-all-ages-england
Dataset updated
Apr 7, 2021
Dataset authored and provided by
The Rivers Trust
Description
SUMMARY

This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of hypertension (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

ANALYSIS METHODOLOGY

The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to hypertension (in persons of all ages).

This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.

The percentage of each MSOA’s population (all ages) with hypertension was estimated. This was achieved by calculating a weighted average based on:
The percentage of the MSOA area that was covered by each GP practice’s catchment area
Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness

The estimated percentage of each MSOA’s population with hypertension was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with hypertension, within the relevant age range.

Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have hypertension
B) the NUMBER of people within that MSOA who are estimated to have hypertension

An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have hypertension, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from hypertension, and where those people make up a large percentage of the population, indicating there is a real issue with hypertension within the population and the investment of resources to address that issue could have the greatest benefits.

LIMITATIONS

1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).

2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.

3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of hypertension, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of hypertension.

TO BE VIEWED IN COMBINATION WITH:

This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
Levels of obesity, inactivity and associated illnesses (England): Missing data

DOWNLOADING THIS DATA

To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

DATA SOURCES

This dataset was produced using:
Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.

COPYRIGHT NOTICE

The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.

CaBA HEALTH & WELLBEING EVIDENCE BASE

This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
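The weighted-average and scoring steps described in the analysis methodology can be sketched as follows. This is an illustration under stated assumptions, not the Ribble Rivers Trust's code: the weighted average is taken to use catchment coverage as the weight, and the "relative score between 1 and 0" is assumed to be a min-max rescaling, which the description does not confirm. All names and figures are hypothetical.

```python
def msoa_prevalence(practices):
    """Coverage-weighted mean prevalence for one MSOA.

    practices: list of (coverage_pct, prevalence_pct) pairs, one per GP practice
    whose catchment overlaps the MSOA.
    """
    total_coverage = sum(coverage for coverage, _ in practices)
    return sum(coverage * prevalence for coverage, prevalence in practices) / total_coverage


def rescale(values):
    """Min-max rescaling to [0, 1] (assumed form of the relative score)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(percentages, populations):
    """Average score A (percentage) and score B (number), then rescale again."""
    counts = [pct * pop / 100 for pct, pop in zip(percentages, populations)]
    score_a = rescale(percentages)
    score_b = rescale(counts)
    return rescale([(a + b) / 2 for a, b in zip(score_a, score_b)])


# Hypothetical MSOA covered half by each of two practices.
estimate = msoa_prevalence([(50, 10.0), (50, 20.0)])

# Two hypothetical MSOAs with equal populations.
scores = combined_scores([10.0, 20.0], [1000, 1000])
```

A score close to 1 marks an MSOA where both the estimated percentage and the estimated number of people with hypertension are high relative to the other MSOAs in the comparison.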
