26 datasets found
  1. Replication Data for: On the role of ecological validity in language and...

    • dataverse.no
    • search.dataone.org
    Updated Jan 9, 2025
    Cite
    Gil Verbeke (2025). Replication Data for: On the role of ecological validity in language and speech research [Dataset]. http://doi.org/10.18710/R5JLFR
    Explore at:
    txt(877), text/comma-separated-values(38323), text/comma-separated-values(6010), txt(20911), pdf(359047), pdf(160325), text/comma-separated-values(39673), pdf(45937), text/x-r-notebook(10298)
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    DataverseNO
    Authors
    Gil Verbeke
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Apr 1, 2024 - Jun 21, 2024
    Dataset funded by
    Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO)
    Description

    Dataset abstract: This dataset contains the results from 40 language and speech researchers who completed a survey. In the first part of the survey, respondents were asked to complete a demographic (e.g., age, gender, first language) and professional background questionnaire (e.g., current academic position, research interests). In addition, they were asked several open-ended questions about their familiarity with and understanding of the term ‘ecological validity’ (e.g., which words come to mind when you hear this term, how to measure the ecological validity of a study, how does ecological validity apply to your area of research). In the second part of the survey, respondents were presented with 24 short speech excerpts, representing 12 different stimulus types. They were asked to rate each speech excerpt on its degree of casualness (i.e. spontaneity) and naturalness, and how likely they are to encounter each excerpt in everyday listening situations.

    Article abstract: This paper explores how researchers in the field of language and speech sciences understand and apply the concept of ecological validity. It also assesses the ecological validity of various stimulus materials, ranging from isolated word productions to sentences taken from authentic interviews. Forty researchers participated in a survey, which contained (i) a demographic and professional background questionnaire with open-ended questions about the definition, feasibility and desirability of ecological validity, and (ii) a speech rating task. In the rating task, respondents evaluated 24 speech excerpts, representing 12 types of stimulus materials, on their casualness, naturalness, and likelihood of occurrence in real-life contexts. The results showed that while most researchers acknowledge the importance of ecological validity, defining the necessary and sufficient criteria for evaluating or achieving it remains challenging. Regarding stimulus types, unscripted sentences from interviews and Map Task dialogues were rated as the most casual and natural. In contrast, carefully read sentences and digitally modified stimuli were viewed as the least casual and natural, although individual differences in rating were noticeable. Similarly, ratings for the likelihood of occurrence in everyday listening situations were highest for various types of extemporaneous speech. The survey responses not only enhance our theoretical understanding of ecological validity, but also raise awareness about the implications of methodological choices, such as the selection of tasks and stimulus materials, on the ecological validity of a study.

  2. Data from: Synthetic Smart Card Data for the Analysis of Temporal and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Paul Bouman (2020). Synthetic Smart Card Data for the Analysis of Temporal and Spatial Patterns [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_776718
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Paul Bouman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a synthetic smart card data set that can be used to test pattern detection methods for the extraction of temporal and spatial data. The data set is tab-separated and based on a stylized travel pattern description for the city of Utrecht in the Netherlands. It was developed and used in Chapter 6 of the PhD thesis of Paul Bouman.

    This dataset contains the following files:

    journeys.tsv : the actual data set of synthetic smart card data

    utrecht.xml : the activity pattern definition that was used to randomly generate the synthetic smart card data

    validate.ref : a file derived from the activity pattern definition that can be used for validation purposes. It specifies which activity types occur at each location in the smart card data set.
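    A minimal sketch of loading a tab-separated file such as journeys.tsv with Python's standard library. The column names below are hypothetical illustrations, not the dataset's actual schema; the real file's own header defines the columns.

```python
import csv
import io

# Hypothetical sample in the tab-separated layout described above; the
# real journeys.tsv defines its own columns -- inspect its header first.
sample = (
    "card_id\tcheck_in_stop\tcheck_out_stop\n"
    "1001\tUtrecht Centraal\tOvervecht\n"
    "1002\tVaartsche Rijn\tUtrecht Centraal\n"
)

def read_journeys(fileobj):
    """Parse a tab-separated smart card file into a list of row dicts."""
    return list(csv.DictReader(fileobj, delimiter="\t"))

journeys = read_journeys(io.StringIO(sample))
print(len(journeys))                 # -> 2
print(journeys[0]["check_in_stop"])  # -> Utrecht Centraal
```

    The same reader applies unchanged to the real file via `open("journeys.tsv", newline="")`.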

  3. GPM GROUND VALIDATION KTYX NEXRAD GCPEX V1

    • earthdata.nasa.gov
    • gimi9.com
    Updated Jun 9, 2015
    Cite
    GHRC_DAAC (2015). GPM GROUND VALIDATION KTYX NEXRAD GCPEX V1 [Dataset]. http://doi.org/10.5067/GCPEX/NEXRAD/DATA207
    Explore at:
    Dataset updated
    Jun 9, 2015
    Dataset authored and provided by
    GHRC_DAAC
    Description

    The GPM Ground Validation KTYX NEXRAD GCPEx dataset was collected from January 9, 2012 to March 12, 2012 for the GPM Cold-season Precipitation Experiment (GCPEx). GCPEx addressed shortcomings in the GPM snowfall retrieval algorithm by collecting microphysical properties, associated remote sensing observations, and coordinated model simulations of precipitating snow. These data sets were collected toward the overarching goal of GCPEx: to characterize the ability of multi-frequency active and passive microwave sensors to detect and estimate falling snow. The Next Generation Weather Radar system (NEXRAD) comprises 160 Weather Surveillance Radar-1988 Doppler (WSR-88D) sites throughout the United States and select overseas locations. The GPM Ground Validation NEXRAD GCPEx data files are available as level 2 binary files and level 3 compressed binary files.

  4. Modeled and Observed Weekly Mean Wave Height for Validation of a Wave...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Modeled and Observed Weekly Mean Wave Height for Validation of a Wave Exposure Model of Grand Bay, Mississippi [Dataset]. https://catalog.data.gov/dataset/modeled-and-observed-weekly-mean-wave-height-for-validation-of-a-wave-exposure-model-of-gr
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Coastal marshes are highly dynamic and ecologically important ecosystems that are subject to pervasive and often harmful disturbances, including shoreline erosion. Shoreline erosion can result in an overall loss of coastal marsh, particularly in estuaries with moderate- or high-wave energy. Not only can waves be important physical drivers of shoreline change, but they can also influence shore-proximal vertical accretion through sediment delivery. For these reasons, estimates of wave energy can provide a quantitative measure of wave effects on marsh shorelines. Since wave energy is difficult to measure at all locations, scientists and managers often rely on hydrodynamic models to estimate wave properties at different locations. The Wave Exposure Model (WEMo) is a simple tool that uses linear wave theory to estimate wave energy characteristics for enclosed and semi-enclosed estuaries (Malhotra and Fonseca, 2007). The interpretation of hydrodynamic models is improved if model results can be validated against measured data. The data presented in this publication are input and validation data for modeled and observed mean wave height at two temporary oceanographic stations established by the U.S. Geological Survey (USGS) in the Grand Bay National Estuarine Research Reserve, Mississippi.

  5. Evaluation results for play detection.

    • plos.figshare.com
    Updated Apr 18, 2024
    Cite
    Jonas Bischofberger; Arnold Baca; Erich Schikuta (2024). Evaluation results for play detection. [Dataset]. http://doi.org/10.1371/journal.pone.0298107.t003
    Explore at:
    xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jonas Bischofberger; Arnold Baca; Erich Schikuta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tacklings comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve, because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy, with F-scores of 0.71 for passes and 0.65 for shots, if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers.
We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.
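    The F-scores quoted above are the harmonic mean of precision and recall; a minimal sketch of the F1 computation (the event counts are made-up illustrations, not figures from the paper):

```python
def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall, as used to compare
    detected passes/shots against a reference event log."""
    precision = tp / (tp + fp)  # share of detections that are correct
    recall = tp / (tp + fn)     # share of true events that are detected
    return 2 * precision * recall / (precision + recall)

# Made-up counts: 92 true detections, 8 false detections, 8 missed events.
print(round(f1_score(92, 8, 8), 2))  # -> 0.92
```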

  6. Data from "The variable quality of metadata about biological samples used in...

    • figshare.com
    Updated Jan 26, 2019
    Cite
    Rafael Gonçalves (2019). Data from "The variable quality of metadata about biological samples used in biomedical experiments" [Dataset]. http://doi.org/10.6084/m9.figshare.6890603.v3
    Explore at:
    xlsx
    Dataset updated
    Jan 26, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rafael Gonçalves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset provides supporting data and corpora for the empirical study described in: Rafael S. Gonçalves and Mark A. Musen. The variable quality of metadata about biological samples used in biomedical experiments. Scientific Data, in press (2019).

    Description of files

    Analysis spreadsheet files:
    - ncbi-biosample-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the NCBI BioSample.
    - ebi-biosamples-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the EBI BioSamples.

    Validation data files:
    - ncbi-biosample-validation-data.tar.gz is an archive containing the validation data for the analysis of the entire NCBI BioSample dataset.
    - ncbi-biosample-packaged-validation-data.tar.gz is an archive containing the validation data for the analysis of the subset of metadata records in the NCBI BioSample that use a BioSample package definition.
    - ebi-ncbi-shared-records-validation-data.tar.gz is an archive containing the validation data for the analysis of the set of metadata records that exist both in EBI BioSamples and NCBI BioSample.

    Corpus files:
    - ebi-biosamples-corpus.xml.gz corresponds to the EBI BioSamples corpus.
    - ncbi-biosample-corpus.xml.gz corresponds to the NCBI BioSample corpus.
    - ncbi-biosample-packaged-records-corpus.tar.gz corresponds to the NCBI BioSample metadata records that declare a package definition.
    - ebi-ncbi-shared-records-corpus.tar.gz corresponds to the corpus of metadata records that exist both in NCBI BioSample and EBI BioSamples.

  7. Results and analysis using the Lean Six-Sigma define, measure, analyze,...

    • researchdata.up.ac.za
    Updated Mar 12, 2024
    Cite
    Modiehi Mophethe (2024). Results and analysis using the Lean Six-Sigma define, measure, analyze, improve, and control (DMAIC) Framework [Dataset]. http://doi.org/10.25403/UPresearchdata.25370374.v1
    Explore at:
    docx
    Dataset updated
    Mar 12, 2024
    Dataset provided by
    University of Pretoria
    Authors
    Modiehi Mophethe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This section presents a discussion of the research data. The data were received as secondary data; however, they were originally collected using time study techniques. Data validation is a crucial step in the data analysis process to ensure that the data are accurate, complete, and reliable. Descriptive statistics were used to validate the data. The mean, mode, standard deviation, variance and range provide a summary of the data distribution and assist in identifying outliers or unusual patterns. The dataset presents the measures of central tendency: the mean, the median and the mode. The mean signifies the average value of each of the factors presented in the tables; it is the balance point of the dataset and describes its typical value and behaviour. The median is the middle value of the dataset for each factor: half of the values lie below it and half lie above it, which is important for skewed distributions. The mode is the most common value in the dataset and was used to describe the most typical observation. These values are important as they describe the central value around which the data are distributed. The mean, mode and median indicate a skewed distribution, as they are neither similar nor close to one another.

    The dataset also presents the results and a discussion of them. This section focuses on the customisation of the DMAIC (Define, Measure, Analyse, Improve, Control) framework to address the specific concerns outlined in the problem statement. To gain a comprehensive understanding of the current process, value stream mapping was employed, which is further enhanced by measuring the factors that contribute to inefficiencies. These factors are then analysed and ranked based on their impact, utilising factor analysis. To mitigate the impact of the most influential factor on project inefficiencies, a solution is proposed using the EOQ (Economic Order Quantity) model. The implementation of the 'CiteOps' software facilitates improved scheduling, monitoring, and task delegation in the construction project through digitalisation. Furthermore, project progress and efficiency are monitored remotely and in real time. In summary, the DMAIC framework was tailored to the requirements of the specific project, incorporating techniques from inventory management, project management, and statistics to effectively minimise inefficiencies within the construction project.
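    The summary measures described above can be sketched with Python's statistics module; the observations below are made-up illustrations, not the study's time-study values:

```python
import statistics

# Made-up cycle-time observations; the actual time-study data are in the dataset.
times = [12.0, 14.5, 13.0, 12.0, 18.5, 25.0, 12.0]

mean = statistics.mean(times)          # balance point of the data
median = statistics.median(times)      # middle value; robust to skew
mode = statistics.mode(times)          # most common observation
stdev = statistics.stdev(times)        # spread around the mean
variance = statistics.variance(times)  # stdev squared
value_range = max(times) - min(times)  # overall spread

# Mean noticeably above median and mode hints at a right-skewed
# distribution, the pattern the dataset description reports.
print(mean > median > mode)  # -> True
```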

  8. Overview of training and test sets.

    • plos.figshare.com
    Updated Apr 18, 2024
    Cite
    Jonas Bischofberger; Arnold Baca; Erich Schikuta (2024). Overview of training and test sets. [Dataset]. http://doi.org/10.1371/journal.pone.0298107.t002
    Explore at:
    xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jonas Bischofberger; Arnold Baca; Erich Schikuta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tacklings comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve, because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy, with F-scores of 0.71 for passes and 0.65 for shots, if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers.
We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.

  9. Demographic data of Dataset 1 (test–retest variability dataset for simulated...

    • plos.figshare.com
    Updated Sep 8, 2023
    Cite
    Shotaro Asano; Ryo Asaoka; Akio Oishi; Yuri Fujino; Hiroshi Murata; Keiko Azuma; Manabu Miyata; Ryo Obata; Tatsuya Inoue (2023). Demographic data of Dataset 1 (test–retest variability dataset for simulated VF series). [Dataset]. http://doi.org/10.1371/journal.pone.0291208.t002
    Explore at:
    xls
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Shotaro Asano; Ryo Asaoka; Akio Oishi; Yuri Fujino; Hiroshi Murata; Keiko Azuma; Manabu Miyata; Ryo Obata; Tatsuya Inoue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Demographic data of Dataset 1 (test–retest variability dataset for simulated VF series).

  10. Demographic data of Dataset 2 (real VF series from clinics).

    • plos.figshare.com
    Updated Sep 8, 2023
    Cite
    Shotaro Asano; Ryo Asaoka; Akio Oishi; Yuri Fujino; Hiroshi Murata; Keiko Azuma; Manabu Miyata; Ryo Obata; Tatsuya Inoue (2023). Demographic data of Dataset 2 (real VF series from clinics). [Dataset]. http://doi.org/10.1371/journal.pone.0291208.t003
    Explore at:
    xls
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Shotaro Asano; Ryo Asaoka; Akio Oishi; Yuri Fujino; Hiroshi Murata; Keiko Azuma; Manabu Miyata; Ryo Obata; Tatsuya Inoue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Demographic data of Dataset 2 (real VF series from clinics).

  11. A High-resolution Air Quality Reanalysis Dataset over China (CAQRA)

    • scidb.cn
    Updated Apr 21, 2020
    Cite
    Xiao Tang; Lei Kong; Jiang Zhu; Zifa Wang; Jianjun Li; Huangjian Wu; Qizhong Wu; Huansheng Chen; Lili Zhu; Wei Wang; Bing Liu; Qian Wang; Duohong Chen; Yuepeng Pan; Tao Song; Fei Li; Haitao Zheng; Guanglin Jia; Miaomiao Lu; Lin Wu; Gregory R. Carmichael (2020). A High-resolution Air Quality Reanalysis Dataset over China (CAQRA) [Dataset]. http://doi.org/10.11922/sciencedb.00053
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 21, 2020
    Dataset provided by
    Science Data Bank
    Authors
    Xiao Tang; Lei Kong; Jiang Zhu; Zifa Wang; Jianjun Li; Huangjian Wu; Qizhong Wu; Huansheng Chen; Lili Zhu; Wei Wang; Bing Liu; Qian Wang; Duohong Chen; Yuepeng Pan; Tao Song; Fei Li; Haitao Zheng; Guanglin Jia; Miaomiao Lu; Lin Wu; Gregory R. Carmichael
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Main content and developer: The Chinese Air Quality Reanalysis dataset was produced by the Institute of Atmospheric Physics, Chinese Academy of Sciences (IAP/CAS), in collaboration with the China National Environmental Monitoring Centre (CNEMC) and other research institutes. It provides surface gridded fields of six conventional air pollutants (i.e. PM2.5, PM10, SO2, NO2, CO and O3) and the simulated surface fields of wind speed (u, v), pressure (psfc), relative humidity (RH) and temperature (temp) from the WRF model. The spatial and temporal resolutions are 15 km and 1 hour, respectively. Currently, the dataset covers 2013 to 2019 and will be updated irregularly.

    Data assimilation method: The dataset was produced by the chemical data assimilation system (ChemDAS) developed by IAP/CAS, which assimilates over 1000 surface air quality monitoring sites from CNEMC based on the ensemble Kalman filter (EnKF) and the Nested Air Quality Prediction Modeling system (NAQPMS). This method addresses the problems of instability, insufficient adjustment and negative assimilation effects in atmospheric chemistry data assimilation, and develops multi-pollutant collaborative assimilation, including automatic quality control of monitoring data, adaptive model error estimation and other advanced algorithms. It has been published in Earth System Science Data, where detailed descriptions and validation of this dataset are available (https://doi.org/10.5194/essd-13-529-2021).

    Data accuracy: The dataset was evaluated by cross-validation and independent data validation. For 2013-2018, the root mean square error (RMSE) at assimilation (validation) sites for hourly concentrations was estimated to be 15.2 (21.3) μg/m3 for PM2.5, 28.0 (39.3) μg/m3 for PM10, 16.9 (24.9) μg/m3 for SO2, 12.7 (16.4) μg/m3 for NO2, 0.38 (0.54) mg/m3 for CO and 17.5 (21.9) μg/m3 for O3. For 2019, the RMSE was estimated to be 10.2 (13.3) μg/m3 for PM2.5, 19.1 (24.5) μg/m3 for PM10, 6.1 (7.7) μg/m3 for SO2, 10.0 (12.4) μg/m3 for NO2, 0.24 (0.30) mg/m3 for CO and 14.0 (17.2) μg/m3 for O3.

    Dataset versions: The first version (V1) covers 2013 to 2018 and comprises 72 zip files, each containing one month of reanalysis data. The second version (V2) covers 2013 to 2018, split by days into 2191 zip files. The third version (V3) was extended to 2019 using the same algorithm and validation as V1 and V2, with seven folders in a year; each folder contains the daily reanalysis data compression files. A description of the content of each data file is available in README.txt.
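    The RMSE figures above measure agreement between reanalysis values and site observations; a minimal sketch of the computation (the concentrations are made-up illustrations, not CAQRA values):

```python
import math

def rmse(observed, predicted):
    """Root mean square error between site observations and reanalysis values."""
    assert len(observed) == len(predicted)
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

# Made-up hourly PM2.5 concentrations (ug/m3) at one validation site.
obs = [35.0, 50.0, 42.0, 61.0]
est = [30.0, 55.0, 40.0, 58.0]
print(round(rmse(obs, est), 2))  # -> 3.97
```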

  12. Data from: GPM GROUND VALIDATION NOAA S-BAND PROFILER MINUTE DATA MC3E

    • cmr.earthdata.nasa.gov
    • s.cnmilf.com
    Updated Aug 8, 2024
    Cite
    (2024). GPM GROUND VALIDATION NOAA S-BAND PROFILER MINUTE DATA MC3E [Dataset]. http://doi.org/10.5067/GPMGV/MC3E/SBAND/DATA201
    Explore at:
    Dataset updated
    Aug 8, 2024
    Time period covered
    Apr 16, 2011 - Jun 7, 2011
    Description

    The GPM Ground Validation NOAA S-Band Profiler Minute Data MC3E dataset was gathered during the Midlatitude Continental Convective Clouds Experiment (MC3E) in Oklahoma from April-June 2011. The overarching goal was to provide the most complete characterization of convective cloud systems, precipitation, and the environment that has ever been obtained, providing constraints for model cumulus parameterizations and space-based rainfall retrieval algorithms over land that had never before been available. The S-band 2.8 GHz profiler measured the backscattered power from raindrops and ice particles as precipitating cloud systems pass overhead. After calibration, the instrument provided an unattenuated reflectivity estimate through the precipitation. Spectra and moment files are included in netCDF format.

  13. Definitions of causes of death.

    • plos.figshare.com
    Updated Jun 7, 2023
    Cite
    Fumiya Ito; Shintaro Togashi; Yuri Sato; Kento Masukawa; Kazuki Sato; Masaharu Nakayama; Kenji Fujimori; Mitsunori Miyashita (2023). Definitions of causes of death. [Dataset]. http://doi.org/10.1371/journal.pone.0283209.t001
    Explore at:
    xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Fumiya Ito; Shintaro Togashi; Yuri Sato; Kento Masukawa; Kazuki Sato; Masaharu Nakayama; Kenji Fujimori; Mitsunori Miyashita
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identifying the cause of death is important for the study of end-of-life patients using claims data in Japan. However, the validity of how cause of death is identified using claims data remains unknown. Therefore, this study aimed to verify the validity of the method used to identify the cause of death based on Japanese claims data. Our study population included patients who died at two institutions between January 1, 2018 and December 31, 2019. Claims data consisted of medical data and Diagnosis Procedure Combination (DPC) data, and five definitions developed from disease classification in each dataset were compared with death certificates. Nine causes of death, including cancer, were included in the study. The definition with the highest positive predictive values (PPVs) and sensitivities in this study was the combination of “main disease” in both medical and DPC data. For cancer, these definitions had PPVs and sensitivities of > 90%. For heart disease, these definitions had PPVs of > 50% and sensitivities of > 70%. For cerebrovascular disease, these definitions had PPVs of > 80% and sensitivities of > 70%. For other causes of death, PPVs and sensitivities were < 50% for most definitions. Based on these results, we recommend definitions with a combination of “main disease” in both medical and DPC data for cancer and cerebrovascular disease. However, a clear argument cannot be made for other causes of death because of the small sample size. Therefore, the results of this study can be used with confidence for cancer and cerebrovascular disease but should be used with caution for other causes of death.
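    The PPV and sensitivity figures above come from standard confusion-matrix counts; a minimal sketch with made-up counts (not figures from the study):

```python
def ppv_and_sensitivity(tp, fp, fn):
    """Positive predictive value and sensitivity of a claims-based
    definition judged against death certificates."""
    ppv = tp / (tp + fp)          # of deaths the definition flags, share that are correct
    sensitivity = tp / (tp + fn)  # of true deaths, share the definition finds
    return ppv, sensitivity

# Made-up counts: 90 cancer deaths correctly identified by the claims
# definition, 5 false positives, 10 cancer deaths missed.
ppv, sens = ppv_and_sensitivity(90, 5, 10)
print(round(ppv, 3), round(sens, 3))  # -> 0.947 0.9
```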

  14. For research tasks 3 and 4: Mean, median, and standard deviation (over 50...

    • plos.figshare.com
    Updated May 31, 2023
    Cite
    Theresa Ullmann; Stefanie Peschel; Philipp Finger; Christian L. Müller; Anne-Laure Boulesteix (2023). For research tasks 3 and 4: Mean, median, and standard deviation (over 50 samplings of discovery/validation data) of the difference (both unscaled and scaled) between the value of the evaluation criterion on the validation data and the corresponding value on the discovery data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010820.t002
    Explore at:
    xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Theresa Ullmann; Stefanie Peschel; Philipp Finger; Christian L. Müller; Anne-Laure Boulesteix
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additionally, the effect size (mean divided by standard deviation) is reported. GCDdiscov denotes the largest GCD on the discovery data and GCDvalid the GCD resulting from the corresponding method combination on the validation data. The quantities ASWdiscov, ASWvalid (average silhouette width) are defined analogously.
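    The effect size reported here is the mean divided by the standard deviation of the discovery/validation differences; a minimal sketch with made-up differences (not values from the table):

```python
import statistics

def effect_size(diffs):
    """Mean of the differences divided by their sample standard deviation."""
    return statistics.mean(diffs) / statistics.stdev(diffs)

# Made-up differences between validation and discovery values of an
# evaluation criterion over repeated samplings.
diffs = [0.10, 0.12, 0.08, 0.11, 0.09]
print(round(effect_size(diffs), 2))  # -> 6.32
```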

  15. Data from: Combinations of parameters.

    • plos.figshare.com
    Updated Sep 8, 2023
    + more versions
    Shotaro Asano; Ryo Asaoka; Akio Oishi; Yuri Fujino; Hiroshi Murata; Keiko Azuma; Manabu Miyata; Ryo Obata; Tatsuya Inoue (2023). Combinations of parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0291208.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Shotaro Asano; Ryo Asaoka; Akio Oishi; Yuri Fujino; Hiroshi Murata; Keiko Azuma; Manabu Miyata; Ryo Obata; Tatsuya Inoue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: To investigate the clinical validity of the Guided Progression Analysis definition (GPAD) and cluster-based definition (CBD) with the Humphrey Field Analyzer (HFA) 10–2 test in retinitis pigmentosa (RP). Methods: Ten non-progressive RP visual fields (VFs) (HFA 10–2 test) were simulated for each of 10 VFs of 111 eyes (10 simulations × 10 VF sequences × 111 eyes = 11,100 VFs; Dataset 1). Using these simulated VFs, the specificity of GPAD for the detection of progression was determined. Using this dataset, similar analyses were conducted for the CBD, in which the HFA 10–2 test was divided into four quadrants. Subsequently, the Hybrid Definition was designed by combining the GPAD and CBD; various conditions of the GPAD and CBD were altered to approach a specificity of 95.0%. Actual HFA 10–2 tests of 116 RP eyes (10 VFs each) were then collected (Dataset 2), and the true positive rate, true negative rate, false positive rate, and time required to detect VF progression were evaluated and compared across the GPAD, CBD, and Hybrid Definition. Results: Specificity values were 95.4% and 98.5% for the GPAD and CBD, respectively. There were no significant differences in true positive rate, true negative rate, or false positive rate between the GPAD, CBD, and Hybrid Definition. The GPAD and Hybrid Definition detected progression significantly earlier than the CBD (at 4.5, 5.0, and 4.5 years, respectively). Conclusions: The GPAD and the optimized Hybrid Definition exhibited similar ability for the detection of progression, with the specificity reaching 95.4%.
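The specificity figures above come from applying each definition to series that are, by construction, non-progressive: any flagged progression is a false positive. A minimal sketch of that computation (toy counts chosen to echo the 95.4% figure; the function name is ours):

```python
def specificity(progression_flags):
    """Specificity over simulated non-progressive series: the share of
    truly stable series that the definition does NOT flag as progressing."""
    n = len(progression_flags)
    false_positives = sum(progression_flags)
    return (n - false_positives) / n

# 1000 simulated stable series, of which 46 are falsely flagged.
flags = [True] * 46 + [False] * 954
spec = specificity(flags)  # 0.954, i.e. 95.4%
```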

  16. Features used for the pass/shot classification.

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    Jonas Bischofberger; Arnold Baca; Erich Schikuta (2024). Features used for the pass/shot classification. [Dataset]. http://doi.org/10.1371/journal.pone.0298107.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jonas Bischofberger; Arnold Baca; Erich Schikuta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tacklings comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy, with F-scores of 0.71 for passes and 0.65 for shots, if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is currently overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers. We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.
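The F-scores quoted above are the harmonic mean of precision and recall. A minimal sketch of the computation for a binary pass/shot decision (toy labels, not the paper's data):

```python
def f_score(true_labels, pred_labels, positive="shot"):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

truth = ["shot", "pass", "pass", "shot", "pass", "pass"]
pred  = ["shot", "pass", "shot", "shot", "pass", "pass"]
f1 = f_score(truth, pred)  # precision 2/3, recall 1.0 -> F = 0.8
```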

  17. SPOG 2015 FN Definition study (NCT02324231) - FN episodes

    • figshare.com
    xlsx
    Updated Jun 7, 2023
    Christa Koenig; Roland A. Ammann; Marina Santschi (2023). SPOG 2015 FN Definition study (NCT02324231) - FN episodes [Dataset]. http://doi.org/10.6084/m9.figshare.22337248.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    figshare
    Authors
    Christa Koenig; Roland A. Ammann; Marina Santschi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SPOG 2015 FN Definition study (NCT02324231) recruited patients from April 2016 to August 2018 in six of nine pediatric oncology centers in Switzerland. A total of 269 patients were observed, and 360 episodes of fever in neutropenia (FN) were diagnosed in 158 of them. Data on these 360 FN episodes are published here. The data are fully anonymized; in order not to compromise anonymization, dates and times are not given. A key file explains all variables, and a data file contains the data of the 360 FN episodes.

  18. GPM GROUND VALIDATION KICT NEXRAD MC3E V1

    • cmr.earthdata.nasa.gov
    • s.cnmilf.com
    • +6 more
    Updated Aug 8, 2024
    + more versions
    (2024). GPM GROUND VALIDATION KICT NEXRAD MC3E V1 [Dataset]. http://doi.org/10.5067/MC3E/NEXRAD/DATA201
    Explore at:
    Dataset updated
    Aug 8, 2024
    Time period covered
    Apr 22, 2011 - Jun 6, 2011
    Area covered
    Description

    The GPM Ground Validation KICT NEXRAD MC3E dataset was collected from April 22, 2011 to June 6, 2011 for the Midlatitude Continental Convective Clouds Experiment (MC3E), which took place in central Oklahoma. The overarching goal of MC3E was to provide the most complete characterization of convective cloud systems, precipitation, and the environment ever obtained, providing constraints for model cumulus parameterizations and space-based rainfall retrieval algorithms over land that had never before been available. The Next Generation Weather Radar system (NEXRAD) comprises 160 Weather Surveillance Radar-1988 Doppler (WSR-88D) sites throughout the United States and select overseas locations. The GPM Ground Validation NEXRAD MC3E data files are available as compressed binary files.

  19. Item statistics including mean score, standard deviation, factor loadings, and corrected item-total-correlation.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 3, 2023
    Manfred E. Beutel; Elmar Brähler; Jörg Wiltink; Matthias Michal; Eva M. Klein; Claus Jünger; Philipp S. Wild; Thomas Münzel; Maria Blettner; Karl Lackner; Stefan Nickels; Ana N. Tibubos (2023). Item statistics including mean score, standard deviation, factor loadings, and corrected item-total-correlation. [Dataset]. http://doi.org/10.1371/journal.pone.0186516.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Manfred E. Beutel; Elmar Brähler; Jörg Wiltink; Matthias Michal; Eva M. Klein; Claus Jünger; Philipp S. Wild; Thomas Münzel; Maria Blettner; Karl Lackner; Stefan Nickels; Ana N. Tibubos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Item statistics including mean score, standard deviation, factor loadings, and corrected item-total-correlation.

  20. The averaged values of the validity indices for all clustering methods across all simulated data experiments.

    • plos.figshare.com
    xls
    Updated Jun 7, 2024
    Polina Bombina; Dwayne Tally; Zachary B. Abrams; Kevin R. Coombes (2024). The averaged values of the validity indices for all clustering methods across all simulated data experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0300358.t003
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Polina Bombina; Dwayne Tally; Zachary B. Abrams; Kevin R. Coombes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The averaged values of the validity indices for all clustering methods across all simulated data experiments.

Replication Data for: On the role of ecological validity in language and speech research

Regarding stimulus types, unscripted sentences from interviews and Map Task dialogues were rated as the most casual and natural. In contrast, carefully read sentences and digitally modified stimuli were viewed as the least casual and natural, although individual differences in rating were noticeable. Similarly, ratings for the likelihood of occurrence in everyday listening situations were highest for various types of extemporaneous speech. The survey responses not only enhance our theoretical understanding of ecological validity, but also raise awareness about the implications of methodological choices, such as the selection of tasks and stimulus materials, on the ecological validity of a study.
