CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset abstract: This dataset contains the results from 40 language and speech researchers who completed a survey. In the first part of the survey, respondents were asked to complete a demographic (e.g., age, gender, first language) and professional background questionnaire (e.g., current academic position, research interests). In addition, they were asked several open-ended questions about their familiarity with and understanding of the term ‘ecological validity’ (e.g., which words come to mind when you hear this term, how the ecological validity of a study can be measured, and how ecological validity applies to your area of research). In the second part of the survey, respondents were presented with 24 short speech excerpts, representing 12 different stimulus types. They were asked to rate each speech excerpt on its degree of casualness (i.e., spontaneity) and naturalness, and on how likely they would be to encounter each excerpt in everyday listening situations.

Article abstract: This paper explores how researchers in the field of language and speech sciences understand and apply the concept of ecological validity. It also assesses the ecological validity of various stimulus materials, ranging from isolated word productions to sentences taken from authentic interviews. Forty researchers participated in a survey, which contained (i) a demographic and professional background questionnaire with open-ended questions about the definition, feasibility and desirability of ecological validity, and (ii) a speech rating task. In the rating task, respondents evaluated 24 speech excerpts, representing 12 types of stimulus materials, on their casualness, naturalness, and likelihood of occurrence in real-life contexts. The results showed that while most researchers acknowledge the importance of ecological validity, defining the necessary and sufficient criteria for evaluating or achieving it remains challenging. Regarding stimulus types, unscripted sentences from interviews and Map Task dialogues were rated as the most casual and natural. In contrast, carefully read sentences and digitally modified stimuli were viewed as the least casual and natural, although individual differences in rating were noticeable. Similarly, ratings for the likelihood of occurrence in everyday listening situations were highest for various types of extemporaneous speech. The survey responses not only enhance our theoretical understanding of ecological validity, but also raise awareness of the implications of methodological choices, such as the selection of tasks and stimulus materials, for the ecological validity of a study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a synthetic smart card data set that can be used to test pattern detection methods for the extraction of temporal and spatial data. The data set is tab-separated and based on a stylized travel pattern description for the city of Utrecht in the Netherlands; it was developed and used in Chapter 6 of the PhD thesis of Paul Bouman.
This dataset contains the following files:
journeys.tsv: the actual data set of synthetic smart card data
utrecht.xml: the activity pattern definition that was used to randomly generate the synthetic smart card data
validate.ref: a file derived from the activity pattern definition that can be used for validation purposes. It specifies which activity types occur at each location in the smart card data set; a minimal cross-check of the two files is sketched below.
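As an illustration, a minimal validation pass might cross-check the activity types observed per location in journeys.tsv against the expectations in validate.ref. The column names and the validate.ref layout below are assumptions for illustration, not the actual file schemas.

```python
import csv
from collections import defaultdict

# Hypothetical schemas: journeys.tsv is assumed to carry 'location' and
# 'activity_type' columns; validate.ref is assumed to map each location to a
# comma-separated list of allowed activity types. Adjust to the real layouts.

def load_expected(path="validate.ref"):
    expected = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            location, types = line.rstrip("\n").split("\t")
            expected[location] = set(types.split(","))
    return expected

def observed_activities(path="journeys.tsv"):
    observed = defaultdict(set)
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            observed[row["location"]].add(row["activity_type"])
    return observed

if __name__ == "__main__":
    expected = load_expected()
    for location, activities in observed_activities().items():
        unexpected = activities - expected.get(location, set())
        if unexpected:
            print(f"{location}: unexpected activity types {sorted(unexpected)}")
```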
The GPM Ground Validation KTYX NEXRAD GCPEx dataset was collected from January 9, 2012 to March 12, 2012 for the GPM Cold-season Precipitation Experiment (GCPEx). GCPEx addressed shortcomings in the GPM snowfall retrieval algorithm by collecting microphysical properties, associated remote sensing observations, and coordinated model simulations of precipitating snow. These data were collected toward the overarching goal of GCPEx, which was to characterize the ability of multi-frequency active and passive microwave sensors to detect and estimate falling snow. The Next Generation Weather Radar system (NEXRAD) comprises 160 Weather Surveillance Radar-1988 Doppler (WSR-88D) sites throughout the United States and at select overseas locations. The GPM Ground Validation NEXRAD GCPEx data files are available as level 2 binary files and level 3 compressed binary files.
Coastal marshes are highly dynamic and ecologically important ecosystems that are subject to pervasive and often harmful disturbances, including shoreline erosion. Shoreline erosion can result in an overall loss of coastal marsh, particularly in estuaries with moderate or high wave energy. Not only can waves be important physical drivers of shoreline change, they can also influence shore-proximal vertical accretion through sediment delivery. For these reasons, estimates of wave energy can provide a quantitative measure of wave effects on marsh shorelines. Since wave energy is difficult to measure at all locations, scientists and managers often rely on hydrodynamic models to estimate wave properties at different locations. The Wave Exposure Model (WEMo) is a simple tool that uses linear wave theory to estimate wave energy characteristics for enclosed and semi-enclosed estuaries (Malhotra and Fonseca, 2007). The interpretation of hydrodynamic models is improved if model results can be validated against measured data. The data presented in this publication are input and validation data for modeled and observed mean wave height for two temporary oceanographic stations established by the U.S. Geological Survey (USGS) in the Grand Bay National Estuarine Research Reserve, Mississippi.
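For orientation, linear wave theory gives the wave energy per unit surface area as E = (1/8) ρ g H², with ρ the water density, g gravitational acceleration, and H the wave height. The snippet below evaluates this textbook relation for an observed mean wave height; it is provided for context and is not a reimplementation of WEMo.

```python
RHO = 1025.0  # seawater density (kg/m^3)
G = 9.81      # gravitational acceleration (m/s^2)

def wave_energy_density(wave_height_m: float) -> float:
    """Wave energy per unit surface area (J/m^2) from linear wave theory:
    E = (1/8) * rho * g * H^2."""
    return 0.125 * RHO * G * wave_height_m ** 2

# Example: a 0.3 m mean wave height, plausible for a fetch-limited estuary,
# corresponds to roughly 113 J/m^2.
print(wave_energy_density(0.3))
```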
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tackles comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve, because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy, with F-scores of 0.71 for passes and 0.65 for shots, if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers. We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.
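As background, evaluating event detection of this kind typically matches each detected event to at most one ground-truth event within a temporal tolerance and derives precision, recall, and the F-score from the resulting counts. The sketch below illustrates that logic on assumed lists of event timestamps; it is a generic illustration, not the paper's validation routine.

```python
def f_score(detected, ground_truth, tolerance_s=2.0):
    """Greedy one-to-one matching of detected event times to ground-truth
    event times within a tolerance, followed by precision/recall/F1."""
    unmatched = sorted(ground_truth)
    true_positives = 0
    for t in sorted(detected):
        match = next((g for g in unmatched if abs(g - t) <= tolerance_s), None)
        if match is not None:
            unmatched.remove(match)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Event times in seconds: two of three detections match, one event is missed.
print(f_score(detected=[10.1, 35.0, 80.2], ground_truth=[10.0, 34.0, 60.0]))
```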
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This fileset provides supporting data and corpora for the empirical study described in: Rafael S. Gonçalves and Mark A. Musen. The variable quality of metadata about biological samples used in biomedical experiments. Scientific Data, in press (2019).

Description of files

Analysis spreadsheet files:
- ncbi-biosample-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the NCBI BioSample.
- ebi-biosamples-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the EBI BioSamples.

Validation data files:
- ncbi-biosample-validation-data.tar.gz is an archive containing the validation data for the analysis of the entire NCBI BioSample dataset.
- ncbi-biosample-packaged-validation-data.tar.gz is an archive containing the validation data for the analysis of the subset of metadata records in the NCBI BioSample that use a BioSample package definition.
- ebi-ncbi-shared-records-validation-data.tar.gz is an archive containing the validation data for the analysis of the set of metadata records that exist both in EBI BioSamples and NCBI BioSample.

Corpus files:
- ebi-biosamples-corpus.xml.gz corresponds to the EBI BioSamples corpus.
- ncbi-biosample-corpus.xml.gz corresponds to the NCBI BioSample corpus.
- ncbi-biosample-packaged-records-corpus.tar.gz corresponds to the NCBI BioSample metadata records that declare a package definition.
- ebi-ncbi-shared-records-corpus.tar.gz corresponds to the corpus of metadata records that exist both in NCBI BioSample and EBI BioSamples.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This section presents a discussion of the research data. The data were received as secondary data; however, they were originally collected using time study techniques. Data validation is a crucial step in the data analysis process to ensure that the data are accurate, complete, and reliable. Descriptive statistics were used to validate the data. The mean, mode, standard deviation, variance, and range provide a summary of the data distribution and assist in identifying outliers or unusual patterns. The dataset reports the measures of central tendency: the mean, the median, and the mode. The mean is the average value of each of the factors presented in the tables; it is the balance point of the dataset and describes its typical value and behaviour. The median is the middle value of the dataset for each factor: half of the values lie below it and half lie above it, which is important for skewed distributions. The mode is the most common value in the dataset and was used to describe the most typical observation. Together, these values describe the central value around which the data are distributed. Because the mean, median, and mode are neither similar nor close to one another, they indicate a skewed distribution.

The dataset also presents the results and a discussion of them. This section focuses on the customisation of the DMAIC (Define, Measure, Analyse, Improve, Control) framework to address the specific concerns outlined in the problem statement. To gain a comprehensive understanding of the current process, value stream mapping was employed, further enhanced by measuring the factors that contribute to inefficiencies. These factors were then analysed and ranked based on their impact, using factor analysis. To mitigate the impact of the most influential factor on project inefficiencies, a solution is proposed using the EOQ (Economic Order Quantity) model. The implementation of the 'CiteOps' software facilitates improved scheduling, monitoring, and task delegation in the construction project through digitalisation; project progress and efficiency are also monitored remotely and in real time. In summary, the DMAIC framework was tailored to the requirements of the specific project, incorporating techniques from inventory management, project management, and statistics to effectively minimise inefficiencies within the construction project.
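As an illustration of this validation step, the snippet below computes the summary statistics discussed above and flags a skewed distribution using Pearson's second skewness coefficient, 3(mean − median)/σ; the 0.5 cutoff and the example values are illustrative assumptions. The EOQ step mentioned above similarly reduces to the classical formula Q* = sqrt(2DS/H), with D the demand rate, S the ordering cost, and H the holding cost per unit.

```python
import statistics

def validate_distribution(values, skew_cutoff=0.5):
    """Summarize a sample and flag skew via Pearson's second skewness
    coefficient, 3 * (mean - median) / stdev (0.5 is a common rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)
    skew = 3 * (mean - median) / stdev
    return {
        "mean": mean,
        "median": median,
        "mode": statistics.mode(values),
        "stdev": stdev,
        "variance": statistics.variance(values),
        "range": max(values) - min(values),
        "skewness": skew,
        "skewed": abs(skew) > skew_cutoff,
    }

# Example: task durations (minutes) from a time study; the outlier 45 pulls
# the mean away from the median, so the sample is flagged as skewed.
print(validate_distribution([12, 14, 14, 15, 16, 18, 45]))
```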
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic data of Dataset 1 (test–retest variability dataset for simulated VF series).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic data of Dataset 2 (real VF series from clinics).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Main content and developer: The Chinese Air Quality Reanalysis dataset was produced by the Institute of Atmospheric Physics, Chinese Academy of Sciences (IAP/CAS), in collaboration with the China National Environmental Monitoring Centre (CNEMC) and other research institutes. It provides surface gridded fields of six conventional air pollutants (PM2.5, PM10, SO2, NO2, CO, and O3) and WRF-simulated surface fields of wind speed (u, v), pressure (psfc), relative humidity (RH), and temperature (temp). The spatial and temporal resolutions are 15 km and 1 hour, respectively. The dataset currently covers 2013 to 2019 and will be updated irregularly.

Data assimilation method: The dataset was produced by the chemical data assimilation system (ChemDAS) developed by IAP/CAS, which assimilates observations from over 1,000 surface air quality monitoring sites operated by CNEMC, based on the ensemble Kalman filter (EnKF) and the Nested Air Quality Prediction Modeling System (NAQPMS). This method addresses the problems of instability, insufficient adjustment, and negative assimilation effects in atmospheric chemistry data assimilation, and implements collaborative assimilation of multiple air pollutants, including automatic quality control of monitoring data, adaptive model error estimation, and other advanced algorithms. It has been published in Earth System Science Data, where detailed descriptions and validation of this dataset are available (https://doi.org/10.5194/essd-13-529-2021).

Data accuracy: The dataset was evaluated by cross-validation and independent data validation. For 2013-2018, the root mean square error (RMSE) at assimilation (validation) sites for hourly concentrations was estimated to be 15.2 (21.3) μg/m3 for PM2.5, 28.0 (39.3) μg/m3 for PM10, 16.9 (24.9) μg/m3 for SO2, 12.7 (16.4) μg/m3 for NO2, 0.38 (0.54) mg/m3 for CO, and 17.5 (21.9) μg/m3 for O3. For 2019, the RMSE at assimilation (validation) sites for hourly concentrations was estimated to be 10.2 (13.3) μg/m3 for PM2.5, 19.1 (24.5) μg/m3 for PM10, 6.1 (7.7) μg/m3 for SO2, 10.0 (12.4) μg/m3 for NO2, 0.24 (0.30) mg/m3 for CO, and 14.0 (17.2) μg/m3 for O3.

Dataset versions: The first version of this dataset (V1) covers 2013 to 2018 and consists of 72 zip files, each containing one month of reanalysis data. The second version (V2) covers 2013 to 2018, split by day into 2,191 zip files in total. The third version (V3) was extended to 2019 using the same algorithm and validation as V1 and V2 and contains seven folders, one per year; each folder contains the daily reanalysis data compression files. A description of the content of each data file is available in README.txt.
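As an illustration of the evaluation metric, the RMSE at a set of sites is the square root of the mean squared difference between reanalysis and observed concentrations. The sketch below shows the computation; the array names and values are placeholders.

```python
import numpy as np

def rmse(reanalysis: np.ndarray, observed: np.ndarray) -> float:
    """Root mean square error between reanalysis values sampled at monitoring
    sites and the observed hourly concentrations at those sites."""
    return float(np.sqrt(np.mean((reanalysis - observed) ** 2)))

# Example: hourly PM2.5 (ug/m^3) at validation sites withheld from assimilation.
observed = np.array([35.0, 60.0, 42.0, 80.0])
reanalysis = np.array([30.0, 66.0, 45.0, 70.0])
print(rmse(reanalysis, observed))  # ~6.5 ug/m^3
```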
The GPM Ground Validation NOAA S-Band Profiler Minute Data MC3E dataset was gathered during the Midlatitude Continental Convective Clouds Experiment (MC3E) in Oklahoma from April to June 2011. The overarching goal was to provide the most complete characterization of convective cloud systems, precipitation, and the environment ever obtained, providing constraints for model cumulus parameterizations and space-based rainfall retrieval algorithms over land that had never before been available. The S-band 2.8 GHz profiler measured the backscattered power from raindrops and ice particles as precipitating cloud systems passed overhead. After calibration, the instrument provided an unattenuated reflectivity estimate through the precipitation. Spectra and moment files are included in netCDF format.
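The netCDF moment files can be inspected with any netCDF library. A minimal sketch follows; the file and variable names are assumptions, since the actual names are defined inside the files themselves.

```python
from netCDF4 import Dataset  # pip install netCDF4

# Hypothetical file name; list ds.variables to discover the real contents.
with Dataset("sband_profiler_moments.nc") as ds:
    print(list(ds.variables))  # e.g., time, height, reflectivity moments
    # reflectivity = ds.variables["reflectivity"][:]  # assumed variable name
```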
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identifying the cause of death is important for the study of end-of-life patients using claims data in Japan. However, the validity of how cause of death is identified using claims data remains unknown. Therefore, this study aimed to verify the validity of the method used to identify the cause of death based on Japanese claims data. Our study population included patients who died at two institutions between January 1, 2018 and December 31, 2019. Claims data consisted of medical data and Diagnosis Procedure Combination (DPC) data, and five definitions developed from disease classification in each dataset were compared with death certificates. Nine causes of death, including cancer, were included in the study. The definition with the highest positive predictive values (PPVs) and sensitivities in this study was the combination of “main disease” in both medical and DPC data. For cancer, these definitions had PPVs and sensitivities of > 90%. For heart disease, these definitions had PPVs of > 50% and sensitivities of > 70%. For cerebrovascular disease, these definitions had PPVs of > 80% and sensitivities of > 70%. For other causes of death, PPVs and sensitivities were < 50% for most definitions. Based on these results, we recommend definitions with a combination of “main disease” in both medical and DPC data for cancer and cerebrovascular disease. However, a clear argument cannot be made for other causes of death because of the small sample size. Therefore, the results of this study can be used with confidence for cancer and cerebrovascular disease but should be used with caution for other causes of death.
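For reference, PPV and sensitivity follow directly from the confusion counts of a claims-based definition scored against the death certificate: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). A minimal sketch with illustrative counts:

```python
def ppv_and_sensitivity(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """PPV: of the deaths the claims definition attributes to a cause, the
    share confirmed by the death certificate. Sensitivity: of the
    certificate-confirmed deaths, the share the claims definition finds."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sensitivity

# Illustrative counts: 90 true positives, 8 false positives, 10 false negatives.
print(ppv_and_sensitivity(90, 8, 10))  # ~(0.92, 0.90)
```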
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additionally, the effect size (mean divided by standard deviation) is reported. GCD_discov denotes the largest GCD on the discovery data and GCD_valid the GCD resulting from the corresponding method combination on the validation data. The quantities ASW_discov and ASW_valid (average silhouette width) are defined analogously.
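As context, the effect size here is simply the mean divided by the standard deviation, and the average silhouette width (ASW) can be computed with scikit-learn. The data and labels below are placeholders.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def effect_size(values: np.ndarray) -> float:
    """Effect size as defined above: mean divided by standard deviation."""
    return float(np.mean(values) / np.std(values, ddof=1))

# ASW of a clustering on placeholder data with a toy two-cluster assignment.
X = np.random.default_rng(0).normal(size=(40, 2))
labels = (X[:, 0] > 0).astype(int)
print(effect_size(X[:, 0]), silhouette_score(X, labels))
```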
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: To investigate the clinical validity of the Guided Progression Analysis definition (GPAD) and cluster-based definition (CBD) with the Humphrey Field Analyzer (HFA) 10–2 test in retinitis pigmentosa (RP).

Methods: Ten non-progressive RP visual fields (VFs) (HFA 10–2 test) were simulated for each of 10 VFs of 111 eyes (10 simulations × 10 VF sequences × 111 eyes = 111,000 VFs; Dataset 1). Using these simulated VFs, the specificity of GPAD for the detection of progression was determined. Using this dataset, similar analyses were conducted for the CBD, in which the HFA 10–2 test was divided into four quadrants. Subsequently, the Hybrid Definition was designed by combining the GPAD and CBD; various conditions of the GPAD and CBD were altered to approach a specificity of 95.0%. Actual HFA 10–2 tests of 116 RP eyes (10 VFs each) were then collected (Dataset 2), and the true positive rate, true negative rate, false positive rate, and time required to detect VF progression were evaluated and compared across the GPAD, CBD, and Hybrid Definition.

Results: Specificity values were 95.4% and 98.5% for GPAD and CBD, respectively. There were no significant differences in true positive rate, true negative rate, or false positive rate between the GPAD, CBD, and Hybrid Definition. The GPAD and Hybrid Definition detected progression significantly earlier than the CBD (at 4.5, 5.0, and 4.5 years, respectively).

Conclusions: The GPAD and the optimized Hybrid Definition exhibited similar ability for the detection of progression, with specificity reaching 95.4%.
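For context, specificity in this design is the fraction of simulated non-progressing VF series that a definition correctly leaves unflagged. A minimal sketch with the progression detector left abstract; the toy detector and its threshold are illustrative, not GPAD or CBD:

```python
from typing import Callable, Sequence

def specificity(stable_series: Sequence, flags_progression: Callable) -> float:
    """Share of known-stable (simulated, non-progressing) VF series that the
    definition does NOT flag as progressing: TN / (TN + FP)."""
    false_positives = sum(1 for s in stable_series if flags_progression(s))
    return (len(stable_series) - false_positives) / len(stable_series)

# Toy detector: flag a series if any sensitivity slope is steeper than
# -1 dB/year (purely illustrative threshold).
def toy_detector(slopes_db_per_year):
    return min(slopes_db_per_year) < -1.0

print(specificity([[-0.2, 0.1], [-1.5, -0.3], [0.0, 0.2]], toy_detector))  # ~0.67
```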
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SPOG 2015 FN Definition study (NCT02324231) recruited patients from April 2016 to August 2018 in six of nine pediatric oncology centers in Switzerland. 269 patients were observed, and 360 episodes of fever in neutropenia (FN) were diagnosed in 158 of them. Data on these 360 FN episodes are published here. The data are fully anonymized; in order not to compromise anonymization, information on dates and times is not given. A key file explains all variables, and a data file contains the data of the 360 FN episodes.
The GPM Ground Validation KICT NEXRAD MC3E dataset was collected from April 22, 2011 to June 6, 2011 for the Midlatitude Continental Convective Clouds Experiment (MC3E), which took place in central Oklahoma. The overarching goal of MC3E was to provide the most complete characterization of convective cloud systems, precipitation, and the environment ever obtained, providing constraints for model cumulus parameterizations and space-based rainfall retrieval algorithms over land that had never before been available. The Next Generation Weather Radar system (NEXRAD) comprises 160 Weather Surveillance Radar-1988 Doppler (WSR-88D) sites throughout the United States and at select overseas locations. The GPM Ground Validation NEXRAD MC3E data files are available as compressed binary files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Item statistics including mean score, standard deviation, factor loadings, and corrected item-total correlation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The averaged values of the validity indices for all clustering methods across all simulated data experiments.