Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, one- and two-factorial between-subject designs, selected one- and two-factorial within-subject designs, and selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. A variety of example applications of the syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool for moving towards more transparency in data visualization.
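The core idea behind such "transparent" graphs can be sketched outside SPSS as well. A minimal Python/matplotlib example (not the authors' syntax; data and group labels are invented) that overlays raw data points with the mean and an approximate 95% CI:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
groups = {"A": rng.normal(10, 2, 30), "B": rng.normal(12, 3, 30)}  # invented data

fig, ax = plt.subplots()
for i, (name, y) in enumerate(groups.items()):
    x = np.full(y.size, float(i)) + rng.uniform(-0.08, 0.08, y.size)  # jittered x
    ax.plot(x, y, "o", alpha=0.4)                              # raw data points
    m = y.mean()
    ci = 1.96 * y.std(ddof=1) / np.sqrt(y.size)                # approx. 95% CI half-width
    ax.errorbar(i + 0.25, m, yerr=ci, fmt="s", capsize=4)      # mean with 95% CI
ax.set_xticks(range(len(groups)))
ax.set_xticklabels(groups.keys())
plt.show()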
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (the pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and to allow other researchers to use these data in their own work.
The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets, with the aim of gathering information on how the topic of high-value datasets (HVD) and their determination has been reflected in the literature over the years and what these studies have found to date, including the indicators used in them, the stakeholders involved, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.
Methodology
To understand how HVD determination has been reflected in the literature over the years and what these studies have found to date, all relevant literature covering this topic was studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).
These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the results to papers in which these concepts were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and further checked for relevance. As a result, a total of 9 articles were examined in depth. Each study was independently examined by at least two authors.
To attain the objective of our study, we developed a protocol in which the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.
Test procedure: Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was completed for each study. The structure of the protocol is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.
Description of the data in this data set
Protocol_HVD_SLR provides the structure of the protocol.
Spreadsheet #1 provides the filled protocol for relevant studies.
Spreadsheet #2 provides the list of results of the search over the three indexing databases, i.e., before filtering out irrelevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper {journal article / conference paper / book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of the article in Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information
10) Objective / RQ - the research objective / aim and the established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR, etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed-methods approach
14) Availability of the underlying research data - whether there is a reference to publicly available underlying research data, e.g., transcriptions of interviews or collected data, or an explanation of why these data are not shared
15) Period under investigation - the period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is it used in the study?
Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused on HVD determination; secondary - HVD are mentioned but not studied (e.g., as part of the discussion or future work))
HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
Format of the file: .xls, .csv (for the first spreadsheet only), .odt, .docx
Licenses or restrictions: CC-BY
For more info, see README.txt
This dataset has an access level of Restricted, which means it is not available via direct download but must be requested. The accessing-research-data guidance outlines the reasons access may be limited and describes the request process. To request access to this data, please complete the data request form.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The data relates to the paper that analyses the determinants or factors that best explain student research skills and success in the honours research report module during the COVID-19 pandemic in 2021. The data were gathered through an online survey created in the Qualtrics software package. The research questions were developed from demographic factors and subject knowledge (including assignments) through to supervisor influence and other factors, such as experience or belonging, that played a role (see the anonymous link at https://unisa.qualtrics.com/jfe/form/SV_86OZZOdyA5sBurY). An SMS was sent to all students of the 2021 module group to make them aware of the survey. They were under no obligation to complete it, and all information was regarded as anonymous. We received 39 responses. The raw data from the survey were processed with the SPSS statistical software package. The data file contains the demographics, frequencies, descriptives, and open questions processed.
The study reported in this paper employed a mixed-methods approach comprising a quantitative and a qualitative analysis. The quantitative, econometric analysis modelled the dependent variable, namely the final mark for the research report, and the independent variables that explain it. The results show that the assignment marks and existing knowledge, in terms of the bachelor's average mark, are significant. We extended the analysis with a qualitative and quantitative survey, which indicated that the mean feedback was above average (strongly agree/agree) except for library use by the student. Students therefore need more guidance in terms of library use, and the open questions showed a need for a research methods course in the future. Furthermore, supervision tends to be a significant determinant in all cases. It is also here where supervisors can use social media instruments such as WhatsApp and Facebook to inform students further. This study is the first to investigate the preparation and research skills of students for master's and doctoral studies during the COVID-19 pandemic in an online environment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The datasets and code for Big Meaning: Qualitative Analysis on Large Bodies of Data Using AI
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Meaning in Life of Chinese College Students: A Mixed-Methods Study (quantitative data)
https://qdr.syr.edu/policies/qdr-standard-access-conditions
This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the Publisher's Website. Data Generation: The research project engages a story about perceptions of fairness in criminal justice decisions. The specific focus involves a debate between ProPublica, a news organization, and Northpointe, the owner of a popular risk tool called COMPAS. ProPublica wrote that COMPAS was racist against blacks, while Northpointe posted online a reply rejecting such a finding. These two documents were the obvious foci of the qualitative analysis because of the further media attention they attracted, the confusion their competing conclusions caused readers, and the power both companies wield in public circles. There were no barriers to retrieval, as both documents have been publicly available on their corporate websites. This public access was one of the motivators for choosing them, as it meant that they were also easily attainable by the general public, thus extending the documents' reach and impact. Additional materials from ProPublica relating to the main debate were also freely downloadable from its website and a third-party, open source platform. Access to secondary source materials comprising additional writings from Northpointe representatives that could assist in understanding Northpointe's main document, though, was more limited. Because of a claim of trade secrets on its tool and the underlying algorithm, it was more difficult to reach Northpointe's other reports. Nonetheless, largely because its clients are governmental bodies with transparency and accountability obligations, some Northpointe-associated reports were retrievable from third parties who had obtained them, largely through Freedom of Information Act queries. Together, the primary and (retrievable) secondary sources allowed for a triangulation of themes, arguments, and conclusions. The quantitative component uses a dataset of over 7,000 individuals with information that was collected and compiled by ProPublica and made available to the public on GitHub. ProPublica's gathering of the data directly from criminal justice officials via Freedom of Information Act requests rendered the dataset in the public domain, and thus no confidentiality issues are present. The dataset was loaded into SPSS v. 25 for data analysis. Data Analysis: The qualitative enquiry used critical discourse analysis, which investigates ways in which parties in their communications attempt to create, legitimate, rationalize, and control mutual understandings of important issues. Each of the two main discourse documents was parsed on its own merit. Yet the project was also intertextual in studying how the discourses correspond with each other and with other relevant writings by the same authors.
Several more specific types of discursive strategies were of interest in attracting further critical examination:
- Testing claims and rationalizations that appear to serve the speaker's self-interest
- Examining conclusions and determining whether sufficient evidence supported them
- Revealing contradictions and/or inconsistencies within the same text and intertextually
- Assessing strategies underlying justifications and rationalizations used to promote a party's assertions and arguments
- Noticing strategic deployment of lexical phrasings, syntax, and rhetoric
- Judging sincerity of voice and the objective consideration of alternative perspectives
Of equal importance in a critical discourse analysis is consideration of what is not addressed, that is, to uncover facts and/or topics missing from the communication. For this project, this included parsing issues that were either briefly mentioned and then neglected, asserted yet with the significance left unstated, or not suggested at all. This task required understanding common practices in the algorithmic data science literature. The paper could have been completed with just the critical discourse analysis. However, because one of the salient findings from it highlighted that the discourses overlooked numerous definitions of algorithmic fairness, the call to fill this gap seemed obvious. Then, the availability of the same dataset used by the parties in conflict made this opportunity more appealing. Calculating additional algorithmic equity equations would not thereby be troubled by irregularities arising from diverse sample sets. New variables were created as relevant to calculate algorithmic fairness equations. In addition to using various SPSS Analyze functions (e.g., regression, crosstabs, means), online statistical calculators were useful to compute z-test comparisons of proportions and t-test comparisons of means. Logic of Annotation: Annotations were employed to fulfil a variety of functions, including supplementing the main text with context, observations, counter-points, analysis, and source attributions. These fall under a few categories. Space considerations. Critical discourse analysis offers a rich method...
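For readers who want to reproduce such comparisons without online calculators, a minimal Python sketch of a z-test of proportions and a t-test of means (all counts and scores are invented):

from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# z-test: do two groups differ in the proportion receiving a given label?
count = [805, 349]    # hypothetical counts labelled high-risk in each group
nobs = [3175, 2103]   # hypothetical group sizes
z, p = proportions_ztest(count, nobs)
print(f"z = {z:.2f}, p = {p:.4f}")

# t-test of means (Welch's version, not assuming equal variances)
a = [4.1, 5.3, 3.8, 6.0, 4.7]
b = [3.2, 4.0, 3.9, 2.8, 3.5]
t, p = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")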
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Background: Limited research has addressed links between nutritional status, dietary habits, and cognitive functions in young children. This study assessed the status of cognitive functions and their association with the nutritional status and dietary habits of school-age children in Bangladesh. Methods: This cross-sectional multi-centre study was conducted on 776 participants in 11 conveniently selected educational institutions. A printed questionnaire with three sections (section 1: background information, section 2: PedsQL™ Cognitive Functioning Scale, and section 3: semi-quantitative food-frequency questionnaire) was utilized for data collection. Sections 1 and 3 were self-reported by parents, and trained volunteers completed section 2 in person along with the anthropometric measurements. Statistical analyses were done in Stata (v.16). Means with standard deviations and frequencies with percentages were used to summarize quantitative and qualitative variables, respectively. Pearson's chi-square test and Spearman's rank correlation coefficient were used to explore bivariate relationships. Results: The mean age of the participants was 12.02±1.88 years, and the majority (67%) were females. The prevalence of poor cognitive function was 46.52%, and among them, 66.02% were females. In terms of body mass index (BMI), 22.44% possessed normal weight, 17.51% were overweight, and 5.19% were obese. This study found a statistically significant relationship between BMI and cognitive functions. Furthermore, different dietary components (e.g., protein, carbohydrate, fat, fiber, iron, magnesium) showed a significant (p
DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments. As part of the development of the DEEPEN 3D play fairway analysis (PFA) methodology for magmatic plays (conventional hydrothermal, superhot EGS, and supercritical), weights needed to be developed for use in the weighted sum of the different favorability index models produced from geoscientific exploration datasets. This was done using two different approaches: one based on expert opinions, and one based on statistical learning. This GDR submission includes the datasets used to produce the statistical learning-based weights. While expert opinions allow us to include more nuanced information in the weights, expert opinions are subject to human bias. Data-centric or statistical approaches help to overcome these potential human biases by focusing on and drawing conclusions from the data alone. The drawback is that, to apply these types of approaches, a dataset is needed. Therefore, we attempted to build comprehensive standardized datasets mapping anomalies in each exploration dataset to each component of each play. These data were gathered through a literature review focused on magmatic hydrothermal plays along with well-characterized areas where superhot or supercritical conditions are thought to exist. Datasets were assembled for all three play types, but the hydrothermal dataset is the least complete due to its relatively low priority. For each known or assumed resource, the dataset states which anomaly in each exploration dataset is associated with each component of the system. The data are only semi-quantitative: values are either high, medium, or low, relative to background levels. In addition, the dataset has significant gaps, as not every possible exploration dataset has been collected and analyzed at every known or suspected geothermal resource area, in the context of all possible play types. The following training sites were used to assemble this dataset:
- Conventional magmatic hydrothermal: Akutan (from AK PFA), Oregon Cascades PFA, Glass Buttes OR, Mauna Kea (from HI PFA), Lanai (from HI PFA), Mt St Helens Shear Zone (from WA PFA), Wind River Valley (from WA PFA), Mount Baker (from WA PFA).
- Superhot EGS: Newberry (EGS demonstration project), Coso (EGS demonstration project), Geysers (EGS demonstration project), Eastern Snake River Plain (EGS demonstration project), Utah FORGE, Larderello, Kakkonda, Taupo Volcanic Zone, Acoculco, Krafla.
- Supercritical: Coso, Geysers, Salton Sea, Larderello, Los Humeros, Taupo Volcanic Zone, Krafla, Reykjanes, Hengill.
Disclaimer: Treat the supercritical fluid anomalies with skepticism. They are based on assumptions due to the general lack of confirmed supercritical fluid encounters and samples at the sites included in this dataset at the time of assembling it. The main assumption was that the supercritical fluid in a given geothermal system has shared properties with the hydrothermal fluid, which may not be the case in reality. Once the datasets were assembled, principal component analysis (PCA) was applied to each. PCA is an unsupervised statistical learning technique, meaning that labels are not required on the data, that summarizes the directions of variance in the data. This approach was chosen because our labels are not certain, i.e., we do not know with 100% confidence that superhot resources exist at all the assumed positive areas.
We also do not have data for any known non-geothermal areas, meaning that it would be challenging to apply a supervised learning technique. In order to generate weights from the PCA, an analysis of the PCA loading values was conducted. PCA loading values represent how much a feature contributes to each principal component, and therefore to the overall variance in the data.
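As a rough illustration of this weighting idea (not the project's actual code; the feature names and data below are invented), loadings can be weighted by each component's explained variance ratio and summed per feature:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                         # stand-in for the anomaly table
features = ["resistivity", "seismicity", "heat_flow", "gravity", "gas_chemistry"]

pca = PCA().fit(StandardScaler().fit_transform(X))
# components_ has shape (n_components, n_features); weight each absolute loading
# by the variance ratio its component explains, then sum the contributions per feature
contrib = np.abs(pca.components_) * pca.explained_variance_ratio_[:, None]
weights = contrib.sum(axis=0)
weights /= weights.sum()                             # normalize weights to sum to 1
for name, w in zip(features, weights):
    print(f"{name}: {w:.3f}")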
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
These datasets are part of research on open data in the municipality of Delft. The research is focused on the motivation perspectives of citizens for engaging in democratic processes by using open data. With this knowledge, the municipality can adapt its open data policy to the characteristics of (potential) users and the wishes of the citizens. To identify these motivation perspectives, Q-methodology was used. A survey was used that asks participants to rank a number of statements. The survey was spread among citizens of Delft; in total, 22 people participated. The gathered data were used to conduct a factor analysis and identify motivation perspectives among citizens of Delft. These datasets contain the gathered Q-sorts and the conducted analyses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Project summary: Survey and interview data were collected from relevant stakeholders to investigate the effectiveness of physical activity referral pathways. The research questions explored the views of the participants on key determinants of physical activity (PA) and physical activity referral schemes (PARS) promotion. The factors explored included participants' knowledge, beliefs, behaviours, perceptions and recommendations about PA and PARS. The research was conducted in three stages: The first stage involved two systematic reviews that investigated the global views of patients and HCPs regarding the promotion of PA and PARS. The findings from this stage informed the need for the second (mixed methods studies) and third (qualitative study) stages of the research, which involved in-depth investigations of the perspectives of PARS stakeholders on their experiences of the functionality of PARS within an Australian context. For these two stages of the research, participants included Australian GPs, EPs and patients with chronic disease(s), aged 18 years and above. A sequential explanatory mixed methods research design that included quantitative online surveys and qualitative telephone interviews was adopted for the two mixed methods studies conducted in stage two. The first mixed methods study explored patients' views on the efficacy of PARS programmes. The second mixed methods study investigated the perspectives of HCPs (GPs and EPs) on the coordination of care for PARS users. Descriptive statistics including frequencies, percentages, means and standard deviations were used to analyse the demographic characteristics of participants. The Shapiro-Wilk test, an inspection of histograms, and Q-Q plots were used to test for normality. Non-parametric statistical tests including the Mann-Whitney U and Kruskal-Wallis tests were used to compare the relationships between variables. The data were presented as frequencies and means ± SD, with an alpha value of 0.05. Framework analysis was employed for the synthesis of the stage two qualitative data. To increase the credibility and validity of the findings in stage two, the results from both strands of each of the two mixed methods studies were triangulated. In stage three, a qualitative pluralistic evaluation approach was utilised to explore and synthesise the recommendations of all stakeholders (GPs, EPs and patients) on how to enhance the effectiveness of the PARS programme.
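A minimal Python sketch of the named tests (the study itself used SPSS, per the software note below; all scores here are invented):

from scipy import stats

gp = [3.2, 4.1, 2.8, 3.9, 4.4, 3.0]   # hypothetical GP ratings
ep = [4.5, 4.9, 4.2, 5.0, 4.7, 4.4]   # hypothetical EP ratings
pt = [3.8, 4.0, 3.5, 4.2, 3.9, 4.1]   # hypothetical patient ratings

print(stats.shapiro(gp))              # Shapiro-Wilk normality check
print(stats.mannwhitneyu(gp, ep))     # Mann-Whitney U: two independent groups
print(stats.kruskal(gp, ep, pt))      # Kruskal-Wallis: three or more groups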
This dataset consists of the survey data for general practitioners (GPs) and exercise physiologists (EPs)
Software/equipment used to create/collect the data: Survey data was analysed using SPSS version 27.0 (IBM Inc, Chicago IL).
Variable labels and data coding are explained in the variable view of the attached SPSS file and in the Codebook (PDF) provided.
The full methodology is available in the Open Access publication (PLoS) from the Related publications link below.
The systematic reviews and other publications relating to the patient surveys are also available from the links provided.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The objective of this study is to obtain estimates of the economically recoverable gas in the Appalachian Basin. The estimates were obtained in terms of a probability distribution, which quantifies the inherent uncertainty associated with estimates where geologic and production uncertainties prevail. It is established that well productivity on a county and regional basis is lognormally distributed, and the total recoverable gas is Normally distributed. The expected (mean), total economically recoverable gas is 20.2 trillion cubic feet (TCF) with a standard deviation of 1.6 TCF, conditional on the use of shooting technology on 160-acre well-spacing. From properties of the Normal distribution, it is seen that a 95 percent probability exists for the total recoverable gas to lie between 17.06 and 23.34 TCF. The estimates are sensitive to well spacings and the technology applied to a particular geologic environment. It is observed that with smaller well spacings - for example, at 80 acres - the estimate is substantially increased, and that advanced technology, such as foam fracturing, has the potential of significantly increasing gas recovery. However, the threshold and optimum conditions governing advanced exploitation technology, based on well spacing and other parameters, were not analyzed in this study. Their technological impact on gas recovery is mentioned in the text where relevant; and on the basis of a rough projection an additional 10 TCF could be expected with the use of foam fracturing on wells with initial open flows lower than 300 MCFD. From the exploration point of view, the lognormal distribution of well productivity suggests that even in smaller areas, such as a county basis, intense exploration might be appropriate. This is evident from the small tail probabilities of the lognormal distribution, which represent the small number of wells with relatively very high productivity.
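The quoted 95 percent range is just the two-sided band of the fitted Normal distribution with $\mu = 20.2$ TCF and $\sigma = 1.6$ TCF:

$$\mu \pm 1.96\,\sigma = 20.2 \pm 1.96 \times 1.6 = 20.2 \pm 3.14 \;\Rightarrow\; [17.06,\ 23.34]\ \text{TCF}.$$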
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from US Geological Survey (USGS) and USFS data. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types).
This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices.
Some background information for these four wilderness areas: Neota (area 2) probably has the highest mean elevational value of the 4 wilderness areas. Rawah (area 1) and Comanche Peak (area 3) would have a lower mean elevational value, while Cache la Poudre (area 4) would have the lowest mean elevational value.
As for primary major tree species in these areas, Neota would have spruce/fir (type 1), while Rawah and Comanche Peak would probably have lodgepole pine (type 2) as their primary species, followed by spruce/fir and aspen (type 5). Cache la Poudre would tend to have Ponderosa pine (type 3), Douglas-fir (type 6), and cottonwood/willow (type 4).
The Rawah and Comanche Peak areas would tend to be more typical of the overall dataset than either the Neota or Cache la Poudre, due to their assortment of tree species and range of predictive variable values (elevation, etc.). Cache la Poudre would probably be more unique than the others, due to its relatively low elevation range and species composition.
Given are the attribute name, attribute type, measurement unit, and a brief description. The forest cover type is the classification problem. The order of this listing corresponds to the order of numerals along the rows of the database.
Name / Data Type / Measurement / Description
Elevation / quantitative / meters / Elevation in meters
Aspect / quantitative / azimuth / Aspect in degrees azimuth
Slope / quantitative / degrees / Slope in degrees
Horizontal_Distance_To_Hydrology / quantitative / meters / Horz Dist to nearest surface water features
Vertical_Distance_To_Hydrology / quantitative / meters / Vert Dist to nearest surface water features
Horizontal_Distance_To_Roadways / quantitative / meters / Horz Dist to nearest roadway
Hillshade_9am / quantitative / 0 to 255 index / Hillshade index at 9am, summer solstice
Hillshade_Noon / quantitative / 0 to 255 index / Hillshade index at noon, summer solstice
Hillshade_3pm / quantitative / 0 to 255 index / Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points / quantitative / meters / Horz Dist to nearest wildfire ignition points
Wilderness_Area (4 binary columns) / qualitative / 0 (absence) or 1 (presence) / Wilderness area designation
Soil_Type (40 binary columns) / qualitative / 0 (absence) or 1 (presence) / Soil Type designation
Cover_Type (7 types) / integer / 1 to 7 / Forest Cover Type designation
Class Labels
Spruce/Fir, Lodgepole Pine, Ponderosa Pine, Cottonwood/Willow, Aspen, Douglas-fir, Krummholz
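Because the wilderness and soil variables are one-hot encoded, a common first step is to fold them back into single categorical columns. A sketch in Python (the file name and exact column names are assumptions; the class-label mapping follows the listing above):

import pandas as pd

df = pd.read_csv("covtype.csv")   # hypothetical file name for the raw table

wilderness = [c for c in df.columns if c.startswith("Wilderness_Area")]
soil = [c for c in df.columns if c.startswith("Soil_Type")]

# idxmax over the 0/1 indicator columns recovers the single active category per row
df["Wilderness"] = df[wilderness].idxmax(axis=1)
df["Soil"] = df[soil].idxmax(axis=1)

cover_names = {1: "Spruce/Fir", 2: "Lodgepole Pine", 3: "Ponderosa Pine",
               4: "Cottonwood/Willow", 5: "Aspen", 6: "Douglas-fir", 7: "Krummholz"}
df["Cover"] = df["Cover_Type"].map(cover_names)
print(df[["Wilderness", "Soil", "Cover"]].head())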
This dataset is an ATLAS.ti copy bundle that contains the analysis of 86 articles that appeared between March 2011 and March 2013 in the Dutch quality newspaper NRC Handelsblad in the weekly article series 'the last word' [Dutch: 'het laatste woord'], written by NRC editor Gijsbert van Es. Newspaper texts have been retrieved from LexisNexis (http://academic.lexisnexis.nl/). These articles describe the experience of the last phase of life of people who were confronted with approaching death due to cancer or other life-threatening diseases, or due to old age and age-related health losses. The analysis focuses on the meanings concerning death and dying that were expressed by these people in their last phase of life. The dataset was analysed with ATLAS.ti and contains a codebook. In the memo manager, a memo is included that provides information concerning the analysed data. Culturally embedded meanings concerning death and dying have been interpreted as 'death-related cultural affordances': possibilities for perception and action in the face of death that are offered by the cultural environment. These have been grouped into three different 'cultural niches' (sets of mutually supporting cultural affordances) that are grounded in different mechanisms for determining meaning: a canonical niche (grounding meaning in established (religious) authority and tradition), a utilitarian niche (grounding meaning in rationality and utilitarian function), and an expressive niche (grounding meaning in authentic (and often aesthetic) self-expression). Interviews are in Dutch; codes, analysis, and metadata are in English.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
A retrospective case-control study was used in this research. The hospitalization information was abstracted from the electronic medical records of multiparous women who underwent vaginal delivery at the obstetric department from 1st January 2022 to 31st December 2023. The medical records of multiparous women who labored in hospital were collected using the clinical information system; the dataset contains 23 entries (row headers), comprising: (1) Demographic features, including age (years), height (cm), antepartum weight (kg), pre-pregnancy BMI (kg/m2), and maternal education. (2) Obstetrics-related features, including the number of pregnancies, labors, and abortions, the degree of gestational weight gain, gestational age (weeks), interpregnancy interval (years), hypertensive disorders of pregnancy, and premature rupture of membranes. (3) Use of drugs: induction of labor, including spontaneous onset, oxytocin, and other pharmacological induction (cervical balloon, dinoprostone, misoprostol, and combined labor induction), and the use of antispasmodic drugs after regular contractions. (4) Neonatal characteristics, including the weight of the newborn (kg). (5) Medical history characteristics, including history of precipitate delivery, prematurity, macrosomia, an abnormal second stage, the mode of previous labor, and the outcome variable (precipitate delivery). Each column header corresponds to a single patient.
SPSS 26.0 was employed for statistical analysis. Quantitative data that conformed to or approximated a normal distribution were described as mean ± standard deviation, and the independent samples t-test was used for comparison between two groups; otherwise, data were described as M (P25, P75) and the Mann-Whitney U test was used for comparison between two groups. Qualitative data were described as cases (%), and group comparisons were performed using the χ2 or Fisher test. The Boruta package of R 4.3.3 was employed to identify pertinent variables, and these were combined with variables exhibiting a p value of less than 0.05 in the univariate analysis to serve as the independent variables in the multicollinearity analysis. The VIF was utilized to assess potential multicollinearity, with a VIF exceeding 5 indicating the presence of covariance among the variables. Subsequently, logistic regression analysis was conducted to identify the influencing factors, with p < 0.05 considered statistically significant. In this dataset, the NCA package of R 4.3.3 was initially employed to analyze the necessity of the factors affecting the occurrence of precipitate delivery in multiparous women. Subsequently, fsQCA 4.1 was applied to explore the key conditional configurations of precipitate delivery in multiparous women using crisp-set qualitative comparative analysis (csQCA).
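The VIF-then-regression step can be sketched as follows (a Python stand-in for the SPSS/R workflow described above; the file and variable names are hypothetical):

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("precipitate_delivery.csv")   # hypothetical file
X = sm.add_constant(df[["age", "bmi", "gestational_age", "newborn_weight"]])

# VIF > 5 taken as evidence of collinearity, as in the text
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))

# logistic regression on the (hypothetical) binary outcome column
model = sm.Logit(df["precipitate_delivery"], X).fit()
print(model.summary())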
Citation search - contains the search strategy performed in the LexisNexis Academic database (Citation_search.txt).
Citation_data - contains the citation search results.
R_script - contains the script used to generate the correspondence analysis plot.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset contains data collected during a study "Transparency of open data ecosystems in smart cities: Definition and assessment of the maturity of transparency in 22 smart cities" (Sustainable Cities and Society (SCS), vol.82, 103906) conducted by Martin Lnenicka (University of Pardubice), Anastasija Nikiforova (University of Tartu), Mariusz Luterek (University of Warsaw), Otmane Azeroual (German Centre for Higher Education Research and Science Studies), Dandison Ukpabi (University of Jyväskylä), Visvaldis Valtenbergs (University of Latvia), Renata Machova (University of Pardubice).
This study inspects smart cities’ data portals and assesses their compliance with transparency requirements for open (government) data by means of the expert assessment of 34 portals representing 22 smart cities, with 36 features.
It is being made public both to act as supplementary data for the paper and to allow other researchers to use these data in their own work, potentially contributing to the improvement of current data ecosystems and building sustainable, transparent, citizen-centered, and socially resilient open data-driven smart cities.
Purpose of the expert assessment: The data in this dataset were collected by applying the benchmarking framework for assessing the compliance of open (government) data portals with the principles of transparency-by-design proposed by Lněnička and Nikiforova (2021)* to 34 portals that can be considered part of open data ecosystems in smart cities. The portals were assessed by experts against 36 features, which allows them to be ranked and their maturity levels discussed and, based on the results of the assessment, the components and unique models that form the open data ecosystem in the smart city context to be defined.
Methodology. Sample selection: the capitals of the Member States of the European Union and countries of the European Economic Area were selected to ensure a more coherent political and legal framework. They were mapped/cross-referenced with their rank in five smart city rankings: IESE Cities in Motion Index, Top 50 smart city governments (SCG), IMD smart city index (SCI), global cities index (GCI), and sustainable cities index (SCI). A purposive sampling method and a systematic search for portals were then carried out to identify relevant websites for each city using two complementary techniques: browsing and searching. To evaluate the transparency maturity of data ecosystems in smart cities, we used the transparency-by-design framework (Lněnička & Nikiforova, 2021)*. The benchmarking supposes the collection of quantitative data, which makes this an acceptability task. A six-point Likert scale was applied for evaluating the portals. Each sub-dimension was supplied with a description to ensure common understanding, a drop-down list to select the level with which the respondent (dis)agrees, and an optional comment field. This formed a protocol to be completed for every portal. Each sub-dimension/feature was assessed on the six-point Likert scale, where strong agreement is assessed with 6 points and strong disagreement with 1 point. Each website (portal) was evaluated by experts, where a person is considered an expert if they work with open (government) data and data portals daily, i.e., it is a key part of their job; this can include public officials, researchers, and independent organizations. In other words, compliance with the expert profile according to the International Certification of Digital Literacy (ICDL) and its derivation proposed in Lněnička et al. (2021)* is expected to be met. When all individual protocols were collected, mean values and standard deviations (SD) were calculated, and if statistical contradictions/inconsistencies were found, reassessment took place to ensure individual consistency and interrater reliability among experts' answers. *Lnenicka, M., & Nikiforova, A. (2021). Transparency-by-design: What is the role of open data portals? Telematics and Informatics, 61, 101605. *Lněnička, M., Machova, R., Volejníková, J., Linhartová, V., Knezackova, R., & Hub, M. (2021). Enhancing transparency through open government data: the case of data portals and their features and capabilities. Online Information Review.
Test procedure: (1) perform an assessment of each dimension using its sub-dimensions, mapping out the achievement of each indicator; (2) all sub-dimensions in one dimension are aggregated and the average value is calculated based on the number of sub-dimensions - the resulting average stands for the dimension value, giving eight values per portal; (3) the average value over all dimensions is calculated and then mapped to the maturity level - this value of each portal is also used to rank the portals.
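A minimal sketch of this aggregation in Python (dimension names, sub-dimension counts, and scores are invented):

import numpy as np

# expert scores per portal: dimension -> sub-dimension scores on the 1-6 scale
portal = {
    "accessibility": [6, 5, 4],
    "openness": [5, 5, 6, 4],
    "usability": [3, 4],
}

dim_values = {d: np.mean(s) for d, s in portal.items()}  # step (2): one value per dimension
overall = np.mean(list(dim_values.values()))             # step (3): portal value used for ranking
print(dim_values, f"overall = {overall:.2f}")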
Description of the data in this data set
Sheet #1 "comparison_overall" provides results by portal
Sheet #2 "comparison_category" provides results by portal and category
Sheet #3 "category_subcategory" provides the list of categories and their elements
Format of the file: .xls
Licenses or restrictions: CC-BY
For more info, see README.txt
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data set from: What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study
Abstract of the study: The treatment of cancer can have a significant impact on quality of life in older patients, and this needs to be taken into account in decision making. However, quality of life can consist of many different components with varying importance between individuals. We set out to assess how older patients with cancer define quality of life and the components that are most significant to them. This was a single-centre, qualitative interview study. Patients aged 70 years or older with cancer were asked to answer open-ended questions: What makes life worthwhile? What does quality of life mean to you? What could affect your quality of life? Subsequently, they were asked to choose the five most important determinants of quality of life from a predefined list: cognition, contact with family or with community, independence, staying in your own home, helping others, having enough energy, emotional well-being, life satisfaction, religion, and leisure activities. Afterwards, answers to the open-ended questions were independently categorized by two authors. The proportion of patients mentioning each category in the open-ended questions was compared to the predefined questions. Overall, 63 patients (median age 76 years) were included. When asked, "What makes life worthwhile?", patients identified social functioning (86%) most frequently. Moreover, to define quality of life, patients most frequently mentioned categories in the domains of physical functioning (70%) and physical health (48%). Maintaining cognition was mentioned in 17% of the open-ended questions, and it was the most commonly chosen option from the list of determinants (72% of respondents). In conclusion, physical functioning, social functioning, physical health and cognition are important components of quality of life. When discussing treatment options, the impact of treatment on these aspects should be taken into consideration.
Reference of research paper: Seghers PAL, Kregting JA, van Huis-Tanja LH, Soubeyran P, O'Hanlon S, Rostoft S, Hamaker ME, Portielje JEA. What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study. Cancers. 2022; 14(5):1123. https://doi.org/10.3390/cancers14051123
Content of the data set: the first tab describes what questions were asked, the second tab shows all individual anonymised answers to the open questions, and the fourth shows the definitions that were used to classify all answers. Q1-Q4 show how the answers were categorised.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The dataset contains data collected from patients suffering from cancer-related pain. The features extracted from clinical data (including typical cancer phenomena such as breakthrough pain) and the biosignal acquisitions contributed to the definition of a multidimensional dataset. This unique database can be useful for the characterization of the patient's pain experience from a qualitative and quantitative perspective. We implemented measurable biosignals-related indicators of the individual's pain response and of the overall Autonomic Nervous System (ANS) functioning. The most peculiar features extracted from EDA and ECG signals can be adopted to investigate the status and complex functioning of the ANS through the study of sympatho-vagal activations. Specifically, while EDA is mainly related to sympathetic activation, the Heart Rate Variability (HRV), which can be derived from ECG recordings, is strictly related to the interplay between sympathetic and parasympathetic functioning.
As for the EDA signal, two types of analyses have been performed: (i) the Trough-To-Peak (TTP) analysis, or min-max analysis, aimed at measuring the difference between the Skin Conductance (SC) at the peak of a response and its previous minimum within pre-established time windows; (ii) the Continuous Decomposition Analysis (CDA), aimed at decomposing the SC data into continuous signals of tonic (basic level of conductance) and phasic (short-duration changes in the SC) activity. Before applying the TTP analysis or the CDA, the signal was filtered by means of a fifth-order Butterworth low-pass filter with a cutoff frequency of 1 Hz and downsampled to 10 Hz to reduce the computational burden of the analysis. The application of TTP and CDA allowed the detection and measurement of SC Responses (SCRs), and the following parameters have been calculated for both the TTP and CDA methodologies:
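A minimal sketch of the preprocessing and TTP measurement just described (the raw sampling rate and window length are assumptions, and the data are synthetic):

import numpy as np
from scipy import signal

fs_raw = 100                                   # assumed raw sampling rate, Hz
t = np.arange(0, 120, 1 / fs_raw)
sc = 2 + 0.3 * np.sin(2 * np.pi * 0.03 * t) + 0.02 * np.random.randn(t.size)  # synthetic SC

b, a = signal.butter(5, 1.0, btype="low", fs=fs_raw)  # fifth-order Butterworth, 1 Hz cutoff
sc = signal.filtfilt(b, a, sc)                 # zero-phase low-pass filtering
sc = sc[:: fs_raw // 10]                       # downsample 100 Hz -> 10 Hz
fs = 10

win = 5 * fs                                   # assumed 5 s analysis window
for start in range(0, sc.size - win, win):
    w = sc[start : start + win]
    trough, peak = w.argmin(), w.argmax()
    if peak > trough:                          # a rise following a minimum = a response
        amplitude = w[peak] - w[trough]        # TTP (min-max) SCR amplitude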
Concerning the ECG, the RR series of interbeat intervals (i.e., the time between successive R waves of the QRS complex on the ECG waveform) has been computed to extract time-domain parameters of the HRV. The R peak detection was carried out by adopting the Pan–Tompkins algorithm for QRS detection and R peak identification. The corresponding RR series of interbeat intervals were derived as the difference between successive R peaks.
The ECG-derived RR time series was then filtered by means of a recursive procedure to remove the intervals differing most from the mean of the surrounding RR intervals. Then, both the Time-Domain Analysis (TDA) and Frequency-Domain Analysis (FDA) of the HRV have been carried out to extract the main features characterizing the variability of the heart rhythm. Time-domain parameters are obtained from statistical analysis of the intervals between heart beats and are used to describe how much variability in the heartbeats is present at various time scales.
The parameters computed through the TDA include the following:
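As an illustrative sketch, two commonly reported time-domain measures, SDNN and RMSSD, can be computed from the RR series (assumed examples, not necessarily the authors' full parameter set; R-peak times are invented and Pan-Tompkins detection is taken as already done):

import numpy as np

r_peaks = np.array([0.00, 0.82, 1.63, 2.47, 3.28, 4.12, 4.95])  # invented R-peak times, s
rr = np.diff(r_peaks) * 1000                  # RR intervals, ms

sdnn = rr.std(ddof=1)                         # SDNN: overall variability of RR intervals
rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))    # RMSSD: beat-to-beat variability
print(f"SDNN = {sdnn:.1f} ms, RMSSD = {rmssd:.1f} ms")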
Frequency-domain parameters reflect the distribution of spectral power across different frequency bands and are used to assess specific components of HRV (e.g., the thermoregulation control loop, baroreflex control loop, and respiration control loop, which are regulated by both sympathetic and vagal nerves of the ANS).
The FDA parameters have been computed by adopting Welch's periodogram method, which is based on the Discrete Fourier Transform (DFT) and allows the expression of the RR series in the discrete frequency domain. Welch's method also deals with the non-stationarity of the RR series: it divides the signal into segments of constant length, applies the Fast Fourier Transform (FFT) to each segment individually, and averages the resulting spectra. The periodogram is basically a way of estimating the power spectral density of a time series.
The FDA parameters include the following:
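As an illustrative sketch, the conventional LF (0.04-0.15 Hz) and HF (0.15-0.40 Hz) band powers can be computed with Welch's method; the band limits are the conventional HRV ones, assumed here, and the unevenly spaced RR series is first resampled to an even grid (beat times invented):

import numpy as np
from scipy import signal, interpolate
from scipy.integrate import trapezoid

rr_times = np.cumsum(np.random.normal(0.85, 0.05, 300))  # invented beat times, s
rr = np.diff(rr_times)                                    # RR intervals, s

fs = 4.0                                                  # resampling rate, Hz
t_even = np.arange(rr_times[1], rr_times[-1], 1 / fs)
rr_even = interpolate.interp1d(rr_times[1:], rr)(t_even)  # evenly spaced RR series

f, pxx = signal.welch(rr_even, fs=fs, nperseg=256)        # Welch periodogram

def band_power(lo, hi):
    m = (f >= lo) & (f < hi)
    return trapezoid(pxx[m], f[m])

lf, hf = band_power(0.04, 0.15), band_power(0.15, 0.40)
print(f"LF = {lf:.2e}, HF = {hf:.2e}, LF/HF = {lf / hf:.2f}")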
In France, farmers commission about 250,000 soil-testing analyses per year to assist them in managing soil fertility. The number and diversity of origin of the samples make these analyses an interesting and original information source regarding cultivated topsoil variability. Moreover, these analyses relate to several parameters strongly influenced by human activity (macronutrient contents, pH, ...), for which existing cartographic information is not very relevant. Compiling the results of these analyses into a database makes it possible to re-use these data within both a national and a temporal framework. A database compilation relating to data collected over the period 1990-2009 has recently been achieved. So far, commercial soil-testing laboratories approved by the Ministry of Agriculture have provided analytical results from more than 2,000,000 samples. After the initial quality control stage, analytical results from more than 1,900,000 samples were available in the database. The anonymity of the landholders seeking soil analyses is perfectly preserved, as the only identifying information stored is the location of the nearest administrative city to the sample site. We present in this dataset a set of statistical parameters of the spatial distributions of several agronomic soil properties. These statistical parameters are calculated for 4 different nested spatial entities (administrative areas: e.g., regions, departments, counties, and agricultural areas) and for 4 time periods (1990-1994, 1995-1999, 2000-2004, 2005-2009). Two kinds of agronomic soil properties are available: the first corresponds to quantitative variables like the organic carbon content, and the second corresponds to qualitative variables like the texture class. For each spatial unit and temporal period, we calculated the following statistics sets: the first set is calculated for the quantitative variables and corresponds to the number of samples, the mean, the standard deviation and the 2-, 4-, and 10-quantiles (median, quartiles, and deciles); the second set is calculated for the qualitative variables and corresponds to the number of samples, the value of the dominant class, the number of samples of the dominant class, the second dominant class, and the number of samples of the second dominant class.
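A minimal sketch of computing both statistics sets with pandas (file and column names are hypothetical):

import pandas as pd

df = pd.read_csv("soil_tests.csv")   # hypothetical file with one row per sample

# quantitative variable: n, mean, SD, and selected quantiles per spatial unit and period
quant = df.groupby(["region", "period"])["organic_carbon"].agg(
    n="count", mean="mean", sd="std",
    q1=lambda s: s.quantile(0.25),
    median=lambda s: s.quantile(0.5),
    q3=lambda s: s.quantile(0.75),
)

# qualitative variable: n, dominant and second dominant class with their counts
def dominant_classes(s):
    counts = s.value_counts()
    return pd.Series({
        "n": s.size,
        "class1": counts.index[0], "n1": counts.iloc[0],
        "class2": counts.index[1] if counts.size > 1 else None,
        "n2": counts.iloc[1] if counts.size > 1 else 0,
    })

qual = df.groupby(["region", "period"])["texture_class"].apply(dominant_classes)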