Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Numbers and percentages of participants missing data contributing to the degree calculation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.
Missing data is a frequent occurrence in both small and large datasets. Among other things, missingness may be a result of coding or computer error, participant absences, or it may be intentional, as in a planned missing design. Whatever the cause, the problem of how to approach a dataset with holes is of much relevance in scientific research. First, missingness is approached as a theoretical construct, and its impacts on data analysis are encountered. I discuss missingness as it relates to structural equation modeling and model fit indices, specifically its interaction with the Root Mean Square Error of Approximation (RMSEA). Data simulation is used to show that RMSEA has a downward bias with missing data, yielding skewed fit indices. Two alternative formulas for RMSEA calculation are proposed: one correcting degrees of freedom and one using Kullback-Leibler divergence to result in an RMSEA calculation which is relatively independent of missingness. Simulations are conducted in Java, with results indicating that the Kullback-Leibler divergence provides a better correction for RMSEA calculation. Next, I approach missingness in an applied manner with an existing large dataset examining ideology measures. The researchers assessed ideology using a planned missingness design, resulting in high proportions of missing data. Factor analysis was performed to gauge uniqueness of ideology measures.
https://creativecommons.org/publicdomain/zero/1.0/
In 2013, students of the Statistics class at FSEV UK (https://fses.uniba.sk/en/) were asked to invite their friends to participate in this survey.
The data file (responses.csv) consists of 1010 rows and 150 columns (139 integer and 11 categorical).
See the columns.csv file if you want to match the data with the original names.
The variables can be split into the following groups:
Many different techniques can be used to answer many questions, e.g.
(in Slovak) Sleziak, P. - Sabo, M.: Gender differences in the prevalence of specific phobias. Forum Statisticum Slovacum. 2014, Vol. 10, No. 6. [Differences (gender + whether people lived in village/town) in the prevalence of phobias.]
Sabo, Miroslav. Multivariate Statistical Methods with Applications. Diss. Slovak University of Technology in Bratislava, 2014. [Clustering of variables (music preferences, movie preferences, phobias) + Clustering of people w.r.t. their interests.]
Introduction: Ensuring high-quality race and ethnicity data within the electronic health record (EHR) and across linked systems, such as patient registries, is necessary to achieving the goal of inclusion of racial and ethnic minorities in scientific research and detecting disparities associated with race and ethnicity. The project goal was to improve race and ethnicity data completion within the Pediatric Rheumatology Care Outcomes Improvement Network and assess impact of improved data completion on conclusions drawn from the registry.
Methods: This is a mixed-methods quality improvement study that consisted of five parts, as follows: (1) Identifying baseline missing race and ethnicity data, (2) Surveying current collection and entry, (3) Completing data through audit and feedback cycles, (4) Assessing the impact on outcome measures, and (5) Conducting participant interviews and thematic analysis.
Results: Across six participating centers, 29% of the patients were missing data on race and 31% were missing data on ethnicity. Of patients missing data, most patients were missing both race and ethnicity. Rates of missingness varied by data entry method (electronic vs. manual). Recovered data had a higher percentage of patients with Other race or Hispanic/Latino ethnicity compared with patients with non-missing race and ethnicity data at baseline. Black patients had a significantly higher odds ratio of having a clinical juvenile arthritis disease activity score (cJADAS10) of ≥5 at first follow-up compared with White patients. There was no significant change in odds ratio of cJADAS10 ≥5 for race and ethnicity after data completion. Patients missing race and ethnicity were more likely to be missing cJADAS values, which may affect the ability to detect changes in odds ratio of cJADAS ≥5 after completion.
Conclusions: About one-third of the patients in a pediatric rheumatology registry were missing race and ethnicity data. After three audit and feedback cycles, centers decreased missing data by 94%, primarily via data recovery from the EHR. In this sample, completion of missing data did not change the findings related to differential outcomes by race. Recovered data were not uniformly distributed compared with those with non-missing race and ethnicity data at baseline, suggesting that differences in outcomes after completing race and ethnicity data may be seen with larger sample sizes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Young People Survey’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/miroslavsabo/young-people-survey on 30 September 2021.
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).
The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings from 1 to 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage
This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
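A common first step with a dataset like this is to confirm the share of missing values in each column. The following is a minimal sketch using only the Python standard library; the function name `missing_percentages`, the use of `None` to mark a missing value, and the sample rows are all assumptions made for illustration and are not part of the dataset itself.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def missing_percentages(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Return the percentage of missing (None) entries for each column."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    result = {}
    for col in columns:
        # A value counts as missing if the key is absent or maps to None.
        n_missing = sum(1 for row in rows if row.get(col) is None)
        result[col] = 100.0 * n_missing / total
    return result


# Hypothetical rows using the column names listed above; the values are invented.
sample_rows = [
    {"Category": "A", "Price": 19.99, "Rating": None, "Stock": "In Stock", "Discount": 10.0},
    {"Category": None, "Price": None, "Rating": 4.0, "Stock": "Out of Stock", "Discount": None},
    {"Category": "C", "Price": 5.50, "Rating": None, "Stock": None, "Discount": 0.0},
    {"Category": "D", "Price": None, "Rating": 3.0, "Stock": "In Stock", "Discount": 5.0},
]
columns = ["Category", "Price", "Rating", "Stock", "Discount"]
report = missing_percentages(sample_rows, columns)
```

Running the same function over the real file, once loaded into a list of dictionaries, would let a reader compare the result against the percentages quoted in the description.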
As the UK went into the first lockdown of the COVID-19 pandemic, the team behind the biggest social survey in the UK, Understanding Society (UKHLS), developed a way to capture these experiences. From April 2020, participants from this Study were asked to take part in the Understanding Society COVID-19 survey, henceforth referred to as the COVID-19 survey or the COVID-19 study.
The COVID-19 survey regularly asked people about their situation and experiences. The resulting data gives a unique insight into the impact of the pandemic on individuals, families, and communities. The COVID-19 Teaching Dataset contains data from the main COVID-19 survey in a simplified form. It covers topics such as
The resource contains two data files:
Key features of the dataset
A full list of variables in both files can be found in the User Guide appendix.
Who is in the sample?
All adults (16 years old and over as of April 2020), in households who had participated in at least one of the last two waves of the main study Understanding Society, were invited to participate in this survey. From the September 2020 (Wave 5) survey onwards, only sample members who had completed at least one partial interview in any of the first four web surveys were invited to participate. From the November 2020 (Wave 6) survey onwards, those who had only completed the initial survey in April 2020 and none since were no longer invited to participate.
The User guide accompanying the data adds to the information here and includes a full variable list with details of measurement levels and links to the relevant questionnaire.
In 2024, there were 301,623 cases filed by the National Crime Information Center (NCIC) where the race of the reported missing person was white. In the same year, 17,097 people whose race was unknown were also reported missing in the United States.

What is the NCIC?
The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States, so criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files belong to stolen property and items, and 14 belong to persons, including the National Sex Offender Registry, Missing Person, and Identity Theft. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.

Missing people in the United States
A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of the NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males as compared to females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
Objectives: Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting: Data taken from employees at 3 different industrial sites in Australia. Participants: 7915 observations were included. Materials and methods: The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions: Researchers are encouraged to use CART and BRT models to explore and understand missing data.
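The core idea of using a tree to describe missingness can be illustrated with a single CART-style split: code each observation as missing (1) or observed (0), then search for the threshold on a predictor that best separates the two groups. The sketch below is written in plain Python and is not the ‘rpart’ or ‘gbm’ implementation used in the study; the function names, the Gini criterion as the split measure, and the example data are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(predictor: List[float], is_missing: List[int]) -> Tuple[Optional[float], float]:
    """Find the threshold on `predictor` that minimises the weighted Gini
    impurity of the 0/1 missingness indicator (one CART-style split)."""
    n = len(predictor)
    best_threshold, best_impurity = None, gini(is_missing)
    values = sorted(set(predictor))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [m for x, m in zip(predictor, is_missing) if x <= threshold]
        right = [m for x, m in zip(predictor, is_missing) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Invented example: a test result is missing whenever the visit count is low.
visits = [1, 1, 2, 2, 3, 4, 5, 6]
test_result = [None, None, None, None, 7.1, 6.8, 7.4, 6.9]
indicator = [1 if v is None else 0 for v in test_result]
threshold, impurity = best_split([float(v) for v in visits], indicator)
```

In this toy example the split falls between two and three visits and separates missing from observed results completely. A full tree would repeat the search recursively within each side of the split and across several predictors.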
SUMMARY
To be viewed in combination with the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

This dataset shows where there was no data* relating to one or more of the following factors:
- Obesity/inactivity-related illnesses (recorded at the GP practice catchment area level*)
- Adult obesity (recorded at the GP practice catchment area level*)
- Inactivity in children (recorded at the district level)
- Excess weight in children (recorded at the Middle Layer Super Output Area level)

* GPs do not have catchments that are mutually exclusive from each other: they overlap, with some geographic areas being covered by 30+ practices.

GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID-19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead.

This dataset identifies areas where data from 2019/20 was used, where one or more GPs did not submit data in either year (this could be because there are rural areas that aren’t officially covered by any GP practices), or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution.

Results of the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ analysis in these areas should be interpreted with caution, particularly if the levels of obesity, inactivity and associated illnesses appear to be significantly lower than in their immediate surrounding areas.

Really small areas with ‘missing’ data were deleted, where it was deemed that missing data will not have impacted the overall analysis (i.e. where GP data was missing from really small countryside areas where no people live).

See also Health and wellbeing statistics (GP-level, England): Missing data and potential outliers data

DATA SOURCES
This dataset was produced using:
- Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- National Child Measurement Programme: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- Active Lives Survey 2019: Sport and Physical Activity Levels amongst children and young people in school years 1-11 (aged 5-16). © Sport England 2020.
- Active Lives Survey 2019: Sport and Physical Activity Levels amongst adults aged 16+. © Sport England 2020.
- GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
- Administrative boundaries: Boundary-Line™: Contains Ordnance Survey data © Crown copyright and database right 2021. Contains public sector information licensed under the Open Government Licence v3.0.
- MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.

COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Sport England 2020; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains Ordnance Survey data © Crown copyright and database right 2021. Contains public sector information licensed under the Open Government Licence v3.0.

CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
This dataset originates from a series of experimental studies titled “Tough on People, Tolerant to AI? Differential Effects of Human vs. AI Unfairness on Trust.” The project investigates how individuals respond to unfair behavior (distributive, procedural, and interactional unfairness) enacted by artificial intelligence versus human agents, and how such behavior affects cognitive and affective trust.

1 Experiment 1a: The Impact of AI vs. Human Distributive Unfairness on Trust
Overview: This dataset comes from an experimental study aimed at examining how individuals respond in terms of cognitive and affective trust when distributive unfairness is enacted by either an artificial intelligence (AI) agent or a human decision-maker. Experiment 1a specifically focuses on the main effect of the “type of decision-maker” on trust.
Data Generation and Processing: The data were collected through Credamo, an online survey platform. Initially, 98 responses were gathered from students at a university in China. Additional student participants were recruited via Credamo to supplement the sample. Attention check items were embedded in the questionnaire, and participants who failed were automatically excluded in real-time. Data collection continued until 202 valid responses were obtained. SPSS software was used for data cleaning and analysis.
Data Structure and Format: The data file is named “Experiment1a.sav” and is in SPSS format. It contains 28 columns and 202 rows, where each row corresponds to one participant. Columns represent measured variables, including: grouping and randomization variables, one manipulation check item, four items measuring distributive fairness perception, six items on cognitive trust, five items on affective trust, three items for honesty checks, and four demographic variables (gender, age, education, and grade level). The final three columns contain computed means for distributive fairness, cognitive trust, and affective trust.
Additional Information: No missing data are present. All variable names are labeled in English abbreviations to facilitate further analysis. The dataset can be directly opened in SPSS or exported to other formats.

2 Experiment 1b: The Mediating Role of Perceived Ability and Benevolence (Distributive Unfairness)
Overview: This dataset originates from an experimental study designed to replicate the findings of Experiment 1a and further examine the potential mediating role of perceived ability and perceived benevolence.
Data Generation and Processing: Participants were recruited via the Credamo online platform. Attention check items were embedded in the survey to ensure data quality. Data were collected using a rolling recruitment method, with invalid responses removed in real time. A total of 228 valid responses were obtained.
Data Structure and Format: The dataset is stored in a file named Experiment1b.sav in SPSS format and can be directly opened in SPSS software. It consists of 228 rows and 40 columns. Each row represents one participant’s data record, and each column corresponds to a different measured variable. Specifically, the dataset includes: random assignment and grouping variables; one manipulation check item; four items measuring perceived distributive fairness; six items on perceived ability; five items on perceived benevolence; six items on cognitive trust; five items on affective trust; three items for attention check; and three demographic variables (gender, age, and education). The last five columns contain the computed mean scores for perceived distributive fairness, ability, benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be analyzed directly in SPSS or exported to other formats as needed.

3 Experiment 2a: Differential Effects of AI vs. Human Procedural Unfairness on Trust
Overview: This dataset originates from an experimental study aimed at examining whether individuals respond differently in terms of cognitive and affective trust when procedural unfairness is enacted by artificial intelligence versus human decision-makers. Experiment 2a focuses on the main effect of the decision agent on trust outcomes.
Data Generation and Processing: Participants were recruited via the Credamo online survey platform from two universities located in different regions of China. A total of 227 responses were collected. After excluding those who failed the attention check items, 204 valid responses were retained for analysis. Data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2a.sav in SPSS format and can be directly opened in SPSS software. It contains 204 rows and 30 columns. Each row represents one participant’s response record, while each column corresponds to a specific variable. Variables include: random assignment and grouping; one manipulation check item; seven items measuring perceived procedural fairness; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The final three columns contain computed average scores for procedural fairness, cognitive trust, and affective trust.
Additional Notes: The dataset contains no missing values. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be directly analyzed in SPSS or exported to other formats as needed.

4 Experiment 2b: Mediating Role of Perceived Ability and Benevolence (Procedural Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 2a and to further examine the potential mediating roles of perceived ability and perceived benevolence in shaping trust responses under procedural unfairness.
Data Generation and Processing: Participants were working adults recruited through the Credamo online platform. A rolling data collection strategy was used, where responses failing attention checks were excluded in real time. The final dataset includes 235 valid responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2b.sav, which is in SPSS format and can be directly opened using SPSS software. It contains 235 rows and 43 columns. Each row corresponds to a single participant, and each column represents a specific measured variable. These include: random assignment and group labels; one manipulation check item; seven items measuring procedural fairness; six items for perceived ability; five items for perceived benevolence; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final five columns contain the computed average scores for procedural fairness, perceived ability, perceived benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to support future reuse and secondary analysis. The dataset can be directly analyzed in SPSS and easily converted into other formats if needed.

5 Experiment 3a: Effects of AI vs. Human Interactional Unfairness on Trust
Overview: This dataset comes from an experimental study that investigates how interactional unfairness, when enacted by either artificial intelligence or human decision-makers, influences individuals’ cognitive and affective trust. Experiment 3a focuses on the main effect of the “decision-maker type” under interactional unfairness conditions.
Data Generation and Processing: Participants were college students recruited from two universities in different regions of China through the Credamo survey platform. After excluding responses that failed attention checks, a total of 203 valid cases were retained from an initial pool of 223 responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3a.sav, in SPSS format and compatible with SPSS software. It contains 203 rows and 27 columns. Each row represents a single participant, while each column corresponds to a specific measured variable. These include: random assignment and condition labels; one manipulation check item; four items measuring interactional fairness perception; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final three columns contain computed average scores for interactional fairness, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variable names are provided using standardized English abbreviations to facilitate secondary analysis. The data can be directly analyzed using SPSS and exported to other formats as needed.

6 Experiment 3b: The Mediating Role of Perceived Ability and Benevolence (Interactional Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 3a and further examine the potential mediating roles of perceived ability and perceived benevolence under conditions of interactional unfairness.
Data Generation and Processing: Participants were working adults recruited via the Credamo platform. Attention check questions were embedded in the survey, and responses that failed these checks were excluded in real time. Data collection proceeded in a rolling manner until a total of 227 valid responses were obtained. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3b.sav, in SPSS format and compatible with SPSS software. It includes 227 rows and
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta-analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials, and use multiple imputation to fill in missing measurements. We apply our method to 5 longitudinal adolescent depression trials where 4 studies used one depression measure and the fifth study used a different depression measure. None of the 5 studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigating whether external information is appropriately incorporated into the imputed values.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models because effective treatment is lacking. Predictive models based on machine learning algorithms demonstrate promising potential, but it remains controversial whether machine learning methods should be preferred over traditional statistical models.
Methods: We employed both logistic regression and six machine learning methods as binary predictive models for a dataset containing 733 women diagnosed with preeclampsia. Participants were grouped by four different pregnancy outcomes. After the imputation of missing values, statistical description and comparison were conducted preliminarily to explore the characteristics of the 73 documented variables. Subsequently, correlation analysis and feature selection were performed as preprocessing steps to filter contributing variables for developing models. The models were evaluated by multiple criteria.
Results: First, the influential variables screened by the preprocessing steps did not overlap with those determined by statistical differences. Second, the most accurate imputation method was K-Nearest Neighbor, and the imputation process did not substantially affect the performance of the developed models. Finally, the performance of the models was investigated. The random forest classifier, multi-layer perceptron, and support vector machine demonstrated better discriminative power, as evaluated by the area under the receiver operating characteristic curve, while the decision tree classifier, random forest, and logistic regression yielded better calibration, as verified by the calibration curve.
Conclusion: Machine learning algorithms can accomplish prediction modeling and demonstrate superior discrimination, while logistic regression can be calibrated well. Statistical analysis and machine learning are two scientific domains sharing similar themes. The predictive abilities of such developed models vary according to the characteristics of datasets, which still need larger sample sizes and more influential predictors to accumulate evidence.
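For readers unfamiliar with K-Nearest Neighbor imputation, the general technique can be sketched as follows: each missing entry is filled with the mean of that feature over the k most similar rows in which the feature is observed. This is an illustrative sketch in plain Python, not the study's implementation; the function names, the choice of k, the rescaled Euclidean distance over shared features, and the example values are all assumptions.

```python
import math
from typing import List, Optional

Vector = List[Optional[float]]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance over the features observed in both rows,
    rescaled by the share of features actually compared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Vector], k: int = 2) -> List[Vector]:
    """Fill each missing entry with the mean of that feature over the k
    nearest rows in which the feature is observed."""
    completed = [list(row) for row in rows]  # copy, so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that has feature j observed.
            donors = sorted(
                (distance(row, other), other[j])
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed


# Invented example: the first row lacks its third feature.
rows = [
    [1.0, 2.0, None],
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
]
filled = knn_impute(rows, k=2)
```

Here the two rows closest to the incomplete one supply the values 10.0 and 12.0, so the gap is filled with their mean, while the distant fourth row is ignored. In practice features are usually standardised first so that no single variable dominates the distance.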
Create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.
Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determining which factors/variables/features can best help predict the store sales.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The database for this study (Briganti et al. 2018; the same for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years), 57% were females and 43% were males. Even though the full dataset was composed of 1973 participants, only 1270 answered the full questionnaire: missing data are handled using pairwise complete observations in estimating a Gaussian Graphical Model, meaning that all available information from every subject is used.
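To make the idea of pairwise complete observations concrete: each correlation is computed from whichever subjects answered both items in that pair, so different pairs may draw on different subsets of subjects. The sketch below shows this for a Pearson correlation in plain Python. It is an illustration only and does not reproduce the estimation procedure of the original study; the function name and the example responses are invented.

```python
import math
from typing import List, Optional


def pairwise_complete_correlation(
    x: List[Optional[float]], y: List[Optional[float]]
) -> Optional[float]:
    """Pearson correlation using only subjects who answered both items."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough complete pairs to estimate a correlation
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant item
    return cov / math.sqrt(var_x * var_y)


# Invented responses for three items from six subjects; None marks a skipped item.
item_a = [0, 1, 2, None, 4, 3]
item_b = [1, 2, 3, 4, None, 4]
item_c = [4, None, 2, 1, 0, None]

r_ab = pairwise_complete_correlation(item_a, item_b)  # uses four subjects
r_ac = pairwise_complete_correlation(item_a, item_c)  # uses three subjects
```

The trade-off of this approach is that no subject is discarded outright, but the resulting correlation matrix is assembled from differing sample sizes.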
The feature set is composed of 28 items meant to assess the four following components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed; reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is calculated afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.
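Reverse-scoring on a 0 to 4 scale amounts to replacing a raw score x with 4 - x for the reversed items. A minimal sketch, using the reversed item numbers listed above (the function and constant names are illustrative, not part of the dataset):

```python
REVERSED_ITEMS = {3, 4, 7, 12, 13, 14, 15, 18, 19}  # item numbers listed above
MAX_SCORE = 4  # items are scored from 0 to 4

def reverse_score(item_number, raw):
    """Return the score after reverse-scoring; missing answers stay missing."""
    if raw is None:
        return None
    if not 0 <= raw <= MAX_SCORE:
        raise ValueError("score must be between 0 and 4")
    return MAX_SCORE - raw if item_number in REVERSED_ITEMS else raw
```

For example, a raw answer of 0 on item 3 becomes 4, while a raw answer of 3 on item 1 is left unchanged.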
Size: A dataset of size 1973*28
Number of features: 28
Ground truth: No
Type of Graph: Mixed graph
The following gives the description of the variables:
Feature | FeatureLabel | Domain | Item meaning from Davis 1980 |
---|---|---|---|
001 | 1FS | Green | I daydream and fantasize, with some regularity, about things that might happen to me. |
002 | 2EC | Purple | I often have tender, concerned feelings for people less fortunate than me. |
003 | 3PT_R | Yellow | I sometimes find it difficult to see things from the “other guy’s” point of view. (Reversed) |
004 | 4EC_R | Purple | Sometimes I don’t feel very sorry for other people when they are having problems. (Reversed) |
005 | 5FS | Green | I really get involved with the feelings of the characters in a novel. |
006 | 6PD | Red | In emergency situations, I feel apprehensive and ill-at-ease. |
007 | 7FS_R | Green | I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it. (Reversed) |
008 | 8PT | Yellow | I try to look at everybody’s side of a disagreement before I make a decision. |
009 | 9EC | Purple | When I see someone being taken advantage of, I feel kind of protective towards them. |
010 | 10PD | Red | I sometimes feel helpless when I am in the middle of a very emotional situation. |
011 | 11PT | Yellow | I sometimes try to understand my friends better by imagining how things look from their perspective. |
012 | 12FS_R | Green | Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed) |
013 | 13PD_R | Red | When I see someone get hurt, I tend to remain calm. (Reversed) |
014 | 14EC_R | Purple | Other people’s misfortunes do not usually disturb me a great deal. (Reversed) |
015 | 15PT_R | Yellow | If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed) |
016 | 16FS | Green | After seeing a play or movie, I have felt as though I were one of the characters. |
017 | 17PD | Red | Being in a tense emotional situation scares me. |
018 | 18EC_R | Purple | When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed) |
019 | 19PD_R | Red | I am usually pretty effective in dealing with emergencies. (Reversed) |
020 | 20FS | Green | I am often quite touched by things that I see happen. |
021 | 21PT | Yellow | I believe that there are two sides to every question and try to look at them both. |
022 | 22EC | Purple | I would describe myself as a pretty soft-hearted person. |
023 | 23FS | Green | When I watch a good movie, I can very easily put myself in the place of a leading character. |
024 | 24PD | Red | I tend to lose control during emergencies. |
025 | 25PT | Yellow | When I’m upset at someone, I usually try to “put myself in his shoes” for a while. |
026 | 26FS | Green | When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me. |
027 | 27PD | Red | When I see someone who badly needs help in an emergency, I go to pieces. |
028 | 28PT | Yellow | Before criticizing somebody, I try to imagine how I would feel if I were in their place. |
More information about the dataset is contained in the empathy_description.html file.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
C. Brock Kirwan 1001 KMBL, Brigham Young University, Provo, UT 84602 Email: kirwan@byu.edu Phone: 801-422-2532 Fax: 801-422-0602 ORCID ID: 0000-003-0768-1446
Limited Evidence for a Triple Dissociation in the Medial Temporal Lobe: an fMRI Recognition Memory Replication Study
2020-2021
The present experiment aims to replicate two previous papers (cited below) in which authors present two analysis paths for a dataset in which participants underwent fMRI while performing a recognition memory test for old and new words. Both studies found activation in the hippocampus, with the first (Daselaar, Fleck, & Cabeza, 2006) demonstrating a distinction in hippocampus activation corresponding to true and perceived oldness of stimuli and the second (Daselaar, Fleck, Prince, & Cabeza, 2006) demonstrating that hippocampus activation reflects the subjective experience of the participant.
We replicated behavioral and MRI acquisition parameters reported in these two target articles with N=53 participants and focused fMRI analyses on regions of interest reported in those articles looking at fMRI activation for differences corresponding with true and perceived oldness and those associated with subjective memory experiences of recollection, familiarity, and novelty.
References: (1) Daselaar, S. M., Fleck, M. S., & Cabeza, R. (2006). Triple dissociation in the medial temporal lobes: Recollection, familiarity, and novelty. J Neurophysiol, 96(4), 1902–1911. https://doi.org/10.1152/jn.01029.2005 (2) Daselaar, S. M., Fleck, M. S., Prince, S. E., & Cabeza, R. (2006). The medial temporal lobe distinguishes old from new independently of consciousness. J Neurosci, 26(21), 5835–5839. https://doi.org/10.1523/JNEUROSCI.0258-06.2006
This dataset includes raw data from all scanned participants acquired by the Siemens Trio 3T MRI scanner (12-channel head coil). Each participant's directory contains the following folders: /anat, /fmap, and /func. /anat includes structural imaging data in the form of .nii.gz and .json files. /fmap includes field mapping data in the form of .nii.gz and .json files. /func includes functional imaging data in the form of .nii.gz and .json files, along with event.tsv files for each run (total runs = 4). Data for a total of N=53 participants are included in the present dataset.
True vs Perceived Oldness: Mean activity (mean parameter estimates) for each individual trial in the anterior/posterior MTL regions was identified by true oldness and perceived novelty contrasts. The resulting values were entered into a logistic regression model with activations in the MTL regions set as independent variables. Subjective Confidence: Mean activity for each individual trial from different MTL regions was identified and entered into a multiple regression model with activations in different MTL regions (i.e., recollection-related activity, familiarity-related activity, and novelty-related activity) as independent variables.
True vs Perceived Oldness: A binary variable reflecting whether participants correctly recognized an old item as old (hit) or incorrectly classified an old item as new (miss) was set as the dependent variable. Subjective Confidence: A 6-point oldness scale was entered as the dependent variable.
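To make the True vs Perceived Oldness model concrete, the sketch below fits a logistic regression with a binary hit/miss outcome and per-trial activations as predictors. It is a toy pure-Python fit by gradient ascent, not the study's analysis scripts; the activation values, the two-region layout, and the learning settings are all assumptions chosen for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit intercept + weights by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))
            err = target - p
            grad[0] += err
            for j, xj in enumerate(row, start=1):
                grad[j] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical per-trial activations in two MTL regions (made-up numbers).
activations = [[0.2, 0.1], [0.4, 0.3], [0.5, -0.1], [0.9, 0.2],
               [1.1, 0.0], [1.3, 0.4], [0.3, 0.2], [1.0, -0.2]]
hit = [0, 0, 1, 1, 1, 1, 0, 0]  # 1 = hit, 0 = miss

weights = fit_logistic(activations, hit)
```

A positive fitted weight for a region would indicate that higher activation in that region goes with a higher probability of a hit in this toy data.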
Data were preprocessed with the fMRIPrep software, which automatically performed spatial motion correction and spatial normalization. Following fMRIPrep preprocessing, functional data were scaled to a mean of 100 and blurred with an 8 mm FWHM Gaussian kernel to account for inter-subject anatomical variation. Analysis scripts are available here: https://osf.io/ctvsw/. Data were acquired for N=60 participants, with data from n=7 participants excluded for reasons of ineligibility (left-handedness, n=1), failure to comply with study procedures (n=2), excessive motion (n=3), and equipment error (n=1).
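One common reading of scaling to a mean of 100 is to divide each voxel's time series by its own temporal mean and multiply by 100, so that values read as percent of the mean signal. The sketch below assumes that reading; the description above does not spell out the exact procedure, and the function name and numbers are illustrative.

```python
def scale_to_mean_100(timeseries):
    """Rescale one voxel's time series so that its temporal mean is 100."""
    mean = sum(timeseries) / len(timeseries)
    if mean == 0:
        raise ValueError("cannot scale a time series with zero mean")
    return [100.0 * value / mean for value in timeseries]

# Hypothetical raw signal values for a single voxel.
scaled = scale_to_mean_100([980.0, 1000.0, 1020.0])
```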
In our experimental task, participants completed a study phase in which they were presented with a randomized list of 120 real English words and 80 pseudo words at a rate of 2000 ms per item. For each item, participants indicated whether the stimulus presented was a word or a pseudo word. A fixation cross was presented between items for a random time interval varying between 0 and 5500 ms. Participants were not informed at this time that their memory for the words would be tested. After the completion of the study phase, researchers situated participants in the MRI scanner and obtained localizer, field map, and T1-weighted structural MRI scans before initiating the test phase of the experiment.
During the test phase, the task was presented as four experimental runs, each lasting between 435 and 442 seconds. Each run contained 60 words, with an equal number of target stimuli (words shown during the study phase) and foil stimuli (novel words). Target and foil stimuli were presented in a randomized order for 3.4 seconds each. While the stimulus was displayed, participants judged whether the word had been presented on the study list. Confidence ratings for those old/new judgments were then collected on a scale from 1 (lowest confidence) to 4 (highest confidence), with the prompt displayed for 1.7 seconds.
Recruitment: To determine sample size, an a priori power analysis was performed by extracting values from Figure 1 of Daselaar, Fleck, Prince, et al. (2006) for the right hippocampus using WebPlotDigitizer, given that this region showed smaller differences. We computed main effects by averaging hits and misses, and CRs and FAs, prior to SEM-to-SD conversion and averaging again. The resulting values were entered into G*Power to estimate an effect size of 0.46, indicating that a sample of N=54 would achieve a power of 0.95 with an error probability of 0.05 (t(1,53)=1.67). Participants were recruited from the campus community and met MRI compliance screening requirements. Exclusion: Non-native English speakers, history of drug use, previous psychiatric or neurologic diagnosis, or contra-indications for MRI (e.g., ferromagnetic implant). Compensation: Participants were compensated for participation with a choice of $20, course credit, or a 3D-printed 1/4-scale model of their brains.
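The SEM-to-SD conversion mentioned above follows from the definition SEM = SD / sqrt(n), so SD = SEM * sqrt(n). A minimal sketch with hypothetical numbers (the values digitized from the figure are not reproduced here):

```python
from math import sqrt

def sem_to_sd(sem, n):
    """Convert a standard error of the mean to a standard deviation."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sem * sqrt(n)

# Hypothetical example: an SEM of 0.5 from a sample of 16 participants.
sd = sem_to_sd(0.5, 16)
```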
Localizer, field map, and T1-weighted structural MRI scans were obtained once the participants were situated in the scanner. MRI data were collected using a Siemens Trio 3T MRI scanner (12-channel head coil) and behavioral responses were collected using a four-key fiber-optic response cylinder (Current Designs, n.d.). Structural scanning was done at the beginning of the scan session (256 x 215 matrix, TR 1900 ms, TE 2.26 ms, FOV 250 x 218 mm, 176 slices, 1 mm slice thickness, 0 mm spacing) and functional scanning was done during all experimental runs (64 x 64 image matrix, TR 1800 ms, TE 31 ms, FOV 240 mm, 34 slices, 3.8 mm slice thickness). An MR-compatible LCD monitor displayed stimuli from the head of the bore, which participants viewed through a mirror mounted on the head coil. MRI data are available at: https://openneuro.org/datasets/ds004086.
[See above under STUDY PHASE and TEST PHASE for procedures performed once the participant arrived.]
Behavioral and imaging data were collected for each participant over the course of four (4) experimental runs. Behavioral data were used to create event.tsv files for each participant per run, indicating the onset, duration, trial type, stimulus response, correct answer, and reaction time of each response. Each experimental run lasted between 435 and 442 seconds and contained 60 words, with an equal number of target stimuli (words shown in the study phase) and foil stimuli (novel words).
Stimuli were presented for 3.4 seconds, during which participants judged whether the word had been presented on the study list. Confidence ratings for those old/new judgments were then collected on a scale from 1 (lowest confidence) to 4 (highest confidence). The prompt for the confidence ratings was displayed for 1.7 seconds, with each trial separated by an inter-trial interval (ITI) consisting of a fixation cross with a randomly distributed duration of 0-5.4 seconds (mean ITI=2.7 seconds).
Behavioral data were identified as hits, misses, correct rejections (CRs), and false alarms (FAs). Hits indicated correct judgments of “old” for words that were actually old. Misses reflected incorrect judgments of “new” for words that were actually old. Correct rejections indicated correct judgments of “new” for new words, and false alarms represented incorrect judgments of “old” for new words.
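The four trial outcomes can be expressed as a small lookup on two facts: whether the word was actually old, and whether the participant judged it old. The sketch below follows the standard signal-detection definitions, in which a miss is an old word judged new (consistent with the dependent-variable description earlier in this record) and a false alarm is a new word judged old. The function and label names are illustrative.

```python
def classify_trial(is_old, responded_old):
    """Label one recognition trial as hit, miss, false alarm, or correct rejection."""
    if is_old:
        # Word was on the study list.
        return "hit" if responded_old else "miss"
    # Word was a novel foil.
    return "false_alarm" if responded_old else "correct_rejection"
```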
The study was performed in the MRI Research Facility at the Brigham Young University campus in Provo, UT.
The following subjects may be missing data and/or are not included in analyses for the following reasons:
Sub-001: Ineligible; left-handedness
Sub-005: Failure to comply; completed only 10% of entries compared to other subjects
Sub-026: Excessive motion
Sub-034: Failure to comply; did not provide a response other than a “1” or none
Sub-050: Excessive motion
Sub-052: Excessive motion
Sub-056: Equipment error
Sub-054 restarted their testing and completed the study protocol in full in the latter session.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de688050
Abstract (en): The goals of this study were to test whether participants who engaged in an extreme ritual in a naturalistic setting would evidence signs of altered states of consciousness, to examine other physiological and affective effects of the ritual, and to determine whether these effects varied based on the role the individual performed within the ritual. A multi-method approach was used, combining various psychological self-report measures, a measure of cognitive functioning, and a measure of physiological stress. The data collection took place at the 2014 "Dance of Souls," a ritual conducted on the last day of the annual Southwest Leather Conference in Phoenix, Arizona, in which participants received temporary piercings with hooks or weights attached to the piercings and danced to music provided by drummers. The associated publication, Altered States of Consciousness during an Extreme Ritual, accompanies the data in this collection. Users are encouraged to consult the publication for additional information. The data collection includes one de-identified dataset with 164 variables for 83 cases. Demographic variables include sex, gender, pierced vs. non-pierced, and the role the participant played in the ceremony.
A mixed-methods approach was utilized where participants completed repeated measures of positive and negative affect, salivary cortisol (a hormone associated with stress), self-reported stress, sexual arousal, and intimacy; Stroop test scores were also collected. Conference attendees could enroll in the study at any point until an hour prior to the beginning of the dance. Measures were taken before the dance (baseline), during the dance, and after the dance. Not all participants completed the materials in full during data collection and many were missing at least some data. To rectify this, three months after the conference, conference organizers sent an email to all the dance attendees with a link to an online version of the surveys. The goals were to (a) collect additional information from existing participants, (b) allow existing participants to complete any missing surveys, and (c) allow new participants to fill out the pre and post-dance surveys. If existing participants filled out a duplicate version of the pre- or post-dance survey, their responses were averaged in the dataset. It was not possible to collect Stroop and saliva samples in this manner. Variables in the data collection include:
Demographic: gender/sex, role in ritual, pierced/non-pierced
Participant experience/skill with BDSM
Inclusion of Other in the Self (IOS) scale variables
Positive and Negative Affect Schedule (PANAS) scale variables
Self-reported measures related to the ritual
Flow State Scale (FSS) variables
Psychological and physiological measures before, during, and after the ritual
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: created variable labels and/or value labels; checked for undocumented or out-of-range codes.
Presence of Common Scales: The Flow State Scale (FSS); Positive and Negative Affect Schedule (PANAS); Inclusion of Other in Self (IOS) Scale; Stroop Effect tests.
Datasets: DS1: The Science of BDSM Data, Phoenix, Arizona, 2014
Participants of the Dance of Souls ritual, on the final day of the 2014 Southwest Leather Conference (SWLC) in Phoenix, Arizona.
Smallest Geographic Unit: None
The Dance of Souls took place in a large ballroom on the final day of the 2014 Southwest Leather Conference (SWLC), an annual four-day conference in Phoenix, Arizona. Approximately 180 people participated in the Dance of Souls. Of these, 83 enrolled in the ...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains estimates of the base rates of 550 food safety-relevant food handling practices in European households. The data are representative for the population of private households in the ten European countries in which the SafeConsume Household Survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK).
Sampling design
In each of the ten EU and EEA countries where the survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK), the population under study was defined as the private households in the country. Sampling was based on a stratified random design, with the NUTS2 statistical regions of Europe and the education level of the target respondent as stratum variables. The target sample size was 1000 households per country, with selection probability within each country proportional to stratum size.
Fieldwork
The fieldwork was conducted between December 2018 and April 2019 in ten EU and EEA countries (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, United Kingdom). The target respondent in each household was the person with main or shared responsibility for food shopping in the household. The fieldwork was sub-contracted to a professional research provider (Dynata, formerly Research Now SSI). Complete responses were obtained from altogether 9996 households.
Weights
In addition to the SafeConsume Household Survey data, population data from Eurostat (2019) were used to calculate weights. These were calculated with NUTS2 region as the stratification variable and assigned an influence to each observation in each stratum that was proportional to how many households in the population stratum a household in the sample stratum represented. The weights were used in the estimation of all base rates included in the data set.
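The weighting idea can be sketched as follows: each sampled household in a region receives a weight equal to the number of population households it represents, and base rates are weighted means. The region names and counts below are hypothetical, not Eurostat figures, and the function names are illustrative.

```python
def stratum_weights(population_households, sample_households):
    """Weight per region = population households / sampled households."""
    weights = {}
    for region, n_sample in sample_households.items():
        if n_sample <= 0:
            raise ValueError("each stratum needs at least one sampled household")
        weights[region] = population_households[region] / n_sample
    return weights

def weighted_base_rate(observations, weights):
    """observations: list of (region, value) pairs with value in [0, 1]."""
    total_weight = sum(weights[region] for region, _ in observations)
    return sum(weights[region] * value for region, value in observations) / total_weight

# Hypothetical strata: R2 is under-sampled relative to its population.
w = stratum_weights({"R1": 1000, "R2": 3000}, {"R1": 2, "R2": 2})
rate = weighted_base_rate([("R1", 1), ("R1", 0), ("R2", 1), ("R2", 1)], w)
```

In this toy case the unweighted rate would be 0.75, while the weighted rate gives the larger stratum more influence.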
Transformations
All survey variables were normalised to the [0,1] range before the analysis. Responses to food frequency questions were transformed into the proportion of all meals consumed during a year where the meal contained the respective food item. Responses to questions with 11-point Juster probability scales as the response format were transformed into numerical probabilities. Responses to questions with time (hours, days, weeks) or temperature (C) as response formats were discretised using supervised binning. The thresholds best separating between the bins were chosen on the basis of five-fold cross-validated decision trees. The binned versions of these variables, and all other input variables with multiple categorical response options (either with a check-all-that-apply or forced-choice response format) were transformed into sets of binary features, with a value 1 assigned if the respective response option had been checked, 0 otherwise.
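Two of these transformations can be sketched briefly: rescaling a numeric variable to the [0,1] range and turning multiple-choice responses into binary features. Min-max rescaling is assumed here as the normalisation method, since the description does not name one; the option names are hypothetical.

```python
def min_max(values):
    """Rescale numeric values to [0, 1] (assumed min-max normalisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant variable carries no spread
    return [(v - lo) / (hi - lo) for v in values]

def to_binary_features(checked, options):
    """One binary feature per response option: 1 if checked, 0 otherwise."""
    return {option: 1 if option in checked else 0 for option in options}

normalised = min_max([2, 4, 6])
features = to_binary_features({"fridge"}, ["fridge", "counter"])
```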
Treatment of missing values
In many cases, a missing value on a feature logically implies that the respective data point should have a value of zero. If, for example, a participant in the SafeConsume Household Survey had indicated that a particular food was not consumed in their household, the participant was not presented with any other questions related to that food, which automatically results in missing values on all features representing the responses to the skipped questions. However, zero consumption would also imply a zero probability that the respective food is consumed undercooked. In such cases, missing values were replaced with a value of 0.
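A minimal sketch of this rule, assuming a record is a dictionary in which None marks a missing value and a gating feature of 0 means the food is not consumed. The feature names are hypothetical.

```python
def fill_structural_zeros(record, gate_feature, dependent_features):
    """Replace missing dependent values with 0 when the gating answer is 0."""
    filled = dict(record)  # leave the caller's record untouched
    if filled.get(gate_feature) == 0:
        for name in dependent_features:
            if filled.get(name) is None:
                filled[name] = 0
    return filled

# Hypothetical respondent who reported never eating chicken.
row = {"eats_chicken": 0, "chicken_undercooked": None}
fixed = fill_structural_zeros(row, "eats_chicken", ["chicken_undercooked"])
```

Missing values that are not implied by a skipped question are left as they are, so only logically determined zeros are filled in.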
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of hypertension (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.
ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to hypertension (in persons of all ages).
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
The percentage of each MSOA’s population (all ages) with hypertension was estimated. This was achieved by calculating a weighted average based on:
The percentage of the MSOA area that was covered by each GP practice’s catchment area
Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness
The estimated percentage of each MSOA’s population with hypertension was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with hypertension, within the relevant age range.
Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have hypertension
B) the NUMBER of people within that MSOA who are estimated to have hypertension
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have hypertension, compared to other MSOAs.
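The estimation and scoring steps described above can be sketched as follows. This is an illustration under assumptions, not the Trust's code: the relative scores are assumed to be min-max rescalings (the text does not state the method), and all coverage, prevalence, and population figures are made up.

```python
def msoa_prevalence(practices):
    """practices: list of (coverage_pct, prevalence_pct), one per GP practice.

    Returns the coverage-weighted average prevalence for one MSOA.
    """
    total_coverage = sum(cov for cov, _ in practices)
    if total_coverage == 0:
        return None  # no GP catchment covers this MSOA
    return sum(cov * prev for cov, prev in practices) / total_coverage

def rescale(values):
    """Map values to relative scores between 0 and 1 (assumed min-max)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def relative_scores(percentages, populations):
    """Average score A (percentage) and score B (number), then rescale."""
    counts = [pct / 100 * pop for pct, pop in zip(percentages, populations)]
    score_a = rescale(percentages)
    score_b = rescale(counts)
    return rescale([(a + b) / 2 for a, b in zip(score_a, score_b)])

# Hypothetical MSOA covered 60% by one practice and 40% by another.
pct = msoa_prevalence([(60, 10.0), (40, 20.0)])
scores = relative_scores([10.0, pct, 20.0], [5000, 8000, 6000])
```

An MSOA only reaches a score near 1 when both its estimated percentage and its estimated number of people with the condition are high relative to the other MSOAs.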
In other words, those are areas where it’s estimated a large number of people suffer from hypertension, and where those people make up a large percentage of the population, indicating there is a real issue with hypertension within the population and the investment of resources to address that issue could have the greatest benefits.
LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness.
By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of hypertension, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of hypertension.
TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
Levels of obesity, inactivity and associated illnesses (England): Missing data
DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.
DATA SOURCES
This dataset was produced using:
Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
Data was cleaned by Ribble Rivers Trust before use.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Numbers and percentages of participants missing data contributing to the degree calculation.