17 datasets found
  1. Replication data for: An Analysis of Data Availability Statements in...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Oct 29, 2025
    Cite
    Karcher, Sebastian; Robey, Derek; Kirilova, Dessislava; Weber, Nic (2025). Replication data for: An Analysis of Data Availability Statements in Qualitative Research Journal Articles [Dataset]. http://doi.org/10.7910/DVN/THG8MN
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Karcher, Sebastian; Robey, Derek; Kirilova, Dessislava; Weber, Nic
    Description

    Summary

    Over the past decade, many scholarly journals have adopted policies on data sharing, with an increasing number of journals requiring that authors share the data underlying their published work. Frequently, qualitative data are excluded from those policies explicitly or implicitly. A few journals, however, intentionally do not make such a distinction. This project focuses on articles published in eight of the open-access journals maintained by the Public Library of Science (PLOS). All PLOS journals introduced strict data sharing guidelines in 2014, applying to all empirical data on the basis of which articles are published. We collected a database of more than 2,300 articles containing a qualitative data component published between January 1, 2015 and August 23, 2023 and analyzed the data availability statements (DAS) researchers made regarding the availability, or lack thereof, of their data. We describe the degree to which and manner in which data are reportedly available (for example, in repositories, via institutional gate-keepers, or on request from the author) versus declared to be unavailable. We also outline several dimensions of patterned variation in the data availability statements, including temporal patterns and variation by data type. Based on the results, we also provide recommendations both to researchers, on how to make their data availability statements clearer, more transparent and more informative, and to journal editors and reviewers, on how to interpret and evaluate statements to ensure they accurately reflect a given data availability scenario. Finally, we suggest a workflow which can link interactions with repositories most productively as part of a typical editorial process.

    Data Overview

    This data deposit includes data and code to assemble the dataset, generate all figures and values used in the paper and appendix, and generate the codebook. It also includes the codebook and the figures. The analysis.R script and the data in data/analysis are sufficient to reproduce all findings in the paper. The additional scripts and the data files in data/raw are included for full transparency and to facilitate the detection of any errors in the data processing pipeline. Their structure is due to the development of the project over time.

  2. Analysis of CBCS publications for Open Access, data availability statements...

    • figshare.scilifelab.se
    • researchdata.se
    • +2 more
    txt
    Updated Jan 15, 2025
    Cite
    Theresa Kieselbach (2025). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data [Dataset]. http://doi.org/10.17044/scilifelab.23641749.v1
    Available download formats: txt
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Umeå University
    Authors
    Theresa Kieselbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description

    This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.

    1. Is the research article an Open Access publication?
    2. Does the research article have a Creative Commons license or a similar license?
    3. Does the research article contain a data availability statement?
    4. Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?
    5. Does the research article contain supplementary data?
    6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?

    Variables

    The data were compiled in a Microsoft Excel 365 document that includes the following variables.

    1. DOI URL of research article
    2. Year of publication
    3. Research article published with Open Access
    4. License for research article
    5. Data availability statement in article
    6. Supplementary data added to article
    7. Persistent identifier for supplementary data
    8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC

    Visualization

    Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications per year, the number of publications published with Open Access and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publications per year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).

    File formats and software

    The file formats used in this dataset are:

    .csv (Text file)
    .docx (Microsoft Word 365 file)
    .jpg (JPEG image file)
    .pdf/A (Portable Document Format for archiving)
    .png (Portable Network Graphics image file)
    .pptx (Microsoft PowerPoint 365 file)
    .txt (Text file)
    .xlsx (Microsoft Excel 365 file)

    All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.

    MD5 checksums

    Here is a list of all files of this dataset and of their MD5 checksums.

    1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)
    2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)
    3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2)
    4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b)
    5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a)
    6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c)
    7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b)
    8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5)
    9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b)
    10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793)
    11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e)
    12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e)
    13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe)
    14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7)
    15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698)
    16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a)
    17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72)
    18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d)
    19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
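
    The MD5 checksums listed above can be used to confirm that a downloaded copy of the deposit is intact. The following is a minimal sketch and not part of the dataset: it assumes the files were downloaded into a local folder (the folder name cbcs_dataset is hypothetical) and uses only Python's standard hashlib module. Only two entries from the list are included in the dictionary.

```python
import hashlib
from pathlib import Path

# Expected checksums copied from the file list (two entries shown).
EXPECTED_MD5 = {
    "Readme.txt": "795f171be340c13d78ba8608dafb3e76",
    "Manifest.txt": "46787888019a87bb9d897effdf719b71",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_folder(folder, expected):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(folder) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "cbcs_dataset" is a hypothetical local download folder.
    for name, status in verify_folder("cbcs_dataset", EXPECTED_MD5).items():
        print(f"{name}: {status}")
```

    Reading in chunks keeps memory use constant regardless of file size. MD5 is suitable here for detecting accidental corruption, not for protecting against deliberate tampering.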

  3. Data from: Data sharing in PLOS ONE: An analysis of Data Availability...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated Feb 9, 2018
    Cite
    Lisa Federer (2018). Data sharing in PLOS ONE: An analysis of Data Availability Statements [Dataset]. http://doi.org/10.6084/m9.figshare.5690878.v1
    Available download formats: txt
    Dataset updated
    Feb 9, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lisa Federer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains Data Availability Statements from 47,593 papers published in PLOS ONE between March 2014 (when the policy went into effect) and May 2016, analyzed for type of statement.

  4. Data Availability Statements in the 2020 and 2021 scientific publications of...

    • zenodo.org
    • nde-dev.biothings.io
    • +2 more
    csv, pdf
    Updated Jul 12, 2024
    Cite
    Kaisa Kylmälä; Tomi Toikko (2024). Data Availability Statements in the 2020 and 2021 scientific publications of Tampere University [Dataset]. http://doi.org/10.5281/zenodo.7564441
    Available download formats: pdf, csv
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kaisa Kylmälä; Tomi Toikko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Tampere
    Description

    For this dataset, scientific peer-reviewed articles by Tampere University researchers from the years 2020 and 2021 were extracted from TUNICRIS. A random sample of 40 percent was drawn from the 4,922 listed publications according to faculties and years. In total, 2,085 articles were analyzed, i.e. more than 42 percent of the total number.
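
    The reported share can be checked with a line of arithmetic, and sampling "according to faculties and years" can be illustrated with a small stratified draw. The sketch below is not the authors' procedure: the rounding rule inside each stratum, the fixed seed, and the toy records are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict

# Figures reported in the description.
total_publications = 4922
analyzed_articles = 2085
share = analyzed_articles / total_publications
print(f"Analyzed share: {share:.1%}")  # prints "Analyzed share: 42.4%"


def stratified_sample(records, key, fraction, seed=0):
    """Draw `fraction` of the records from each stratum defined by `key`.

    Rounding up inside each stratum is an assumption of this sketch; it makes
    the overall share equal to or slightly larger than `fraction`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        size = math.ceil(fraction * len(members))
        sample.extend(rng.sample(members, size))
    return sample


# Toy records: (faculty, year, article id). All values are hypothetical.
toy = [(f, y, i) for f in "ABC" for y in (2020, 2021) for i in range(7)]
picked = stratified_sample(toy, key=lambda r: (r[0], r[1]), fraction=0.4)
print(len(picked), "of", len(toy))
```

    With seven toy records in each of the six strata, rounding up selects three per stratum, so 18 of 42 records are drawn.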

    To find Data Availability Statements, articles were opened one by one and searched for mentions of research data and its availability. For each article, it was recorded whether a DAS existed and where in the article it was located. From the contents of each DAS, information about data availability, location, openness and possible restrictions on use was recorded.

    The dataset also includes information about the journals and publications taken from TUNICRIS.

    The prevalence of DAS and data openness were examined in relation to different variables. Tampere University faculty information has been removed from the dataset.

    Related slides: https://doi.org/10.5281/zenodo.7655892

    Related article (in Finnish): Toikko, T., & Kylmälä, K. (2023). Tutkimusdatan saatavuustiedot tieteellisissä artikkeleissa: Raportti Data Availability Statementien käytöstä Tampereen yliopistossa. Informaatiotutkimus, 42(1-2), 31–50. https://doi.org/10.23978/inf.126098

  5. Analysis of publications of the Swedish Metabolomics Centre for Open Access...

    • researchdata.se
    • figshare.scilifelab.se
    • +1 more
    Updated Sep 19, 2025
    Cite
    Theresa Kieselbach (2025). Analysis of publications of the Swedish Metabolomics Centre for Open Access licenses, data availability statements and access to data [Dataset]. http://doi.org/10.17044/SCILIFELAB.29392007
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Umeå University
    Authors
    Theresa Kieselbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Content and data source

    This dataset contains the results of a manual analysis of Open Science markers in the publications of the Swedish Metabolomics Centre (SMC) between 2016 and 2024. It contains variables similar to those in the dataset "Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data" (Kieselbach, 2023).

    The sample of these publications was fetched from SciLifeLab on 5 May 2025 at the URL: https://publications.scilifelab.se/label/Swedish Metabolomics Centre (SMC)

    It contains 285 articles that are the source data for the work to create this dataset. Every publication was manually visited at its DOI URL and checked for 23 variables.

    Questions studied

    Some of the questions that were addressed in the collection of the data are:

    1. Does the article have an open license and what kind of license does it have?

    2. Does the article contain research data that may have restricted access such as personal data and health data?

    3. Does the article contain a data availability statement?

    4. Does the article contain supplementary material that the authors added to it?

    5. Does the supplementary material contain research data?

    6. Does the supplementary material contain metabolomics data such as, for instance, summaries and visualizations?

    7. Did the authors submit metabolomics data to MetaboLights at the EBI or to other repositories?

    8. Did the authors submit other data to other repositories?

    9. Is data available on request from the authors?

    Visualization of data

    The data was compiled and visualized using Microsoft Excel 365. The visualization includes one table that gives a general overview of the dataset, and four figures that show some results of the analysis.

    Figure 1. Percentage of publications between 2016 and 2024 with an Open Access License and with a data availability statement.

    Figure 2. Submissions to repositories between 2016 and 2024.

    Figure 3. Percentage of publications that contained supplementary material and if this supplementary material contained research data and metabolomics data.

    Figure 4. Repositories used by the authors between 2016 and 2024.

    List of variables

    1. Year of Publication (answer: year)

    2. Date of Publication (answer: date)

    3. DOI (answer: DOI)

    4. DOI URL (answer: DOI URL)

    5. Research article (answer: Yes or No)

    6. Access to article without paywall (answer: Yes or No)

    7. License for research article (answer: Name of the license or No)

    8. Data with restricted access (answer: Yes or No)

    9. Data availability statement in article (answer: Yes or No)

    10. Supplementary material added to article (answer: Yes or No)

    11. Access to supplementary material without paywall (answer: Yes or No)

    12. Supplementary material contains research data (answer: Yes or No)

    13. Supplementary data contains metabolomics data (answer: Yes or No)

    14. Persistent identifier for supplementary data (answer: Yes or No)

    15. Source data added to the article (answer: Yes or No)

    16. Source data contain metabolomics data (answer: Yes or No)

    17. Authors submitted metabolomics data to MetaboLights (answer: Yes or No)

    18. Authors submitted metabolomics data to another repository (answer: name of the repository or No)

    19. Authors submitted other data to a repository (answer: name of the repository or No)

    20. Authors submitted other data to a second repository (answer: name of the repository or No)

    21. Authors submitted other data to a third repository (answer: name of the repository or No)

    22. Authors submitted code to a repository (answer: name of the repository or No)

    23. Data available on request from the authors (answer: Yes or No)
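
    The Yes/No variables in this list lend themselves to simple per-year summaries of the kind shown in Figure 1. The sketch below is an illustration under assumptions: the column headers are taken from the variable list, but the exact headers in the CSV file are not confirmed here, and the inline sample rows are made up.

```python
import csv
import io
from collections import Counter

YEAR_COL = "Year of Publication"                     # assumed CSV header
DAS_COL = "Data availability statement in article"   # assumed CSV header


def das_share_by_year(rows):
    """Return {year: percentage of rows whose DAS column is 'Yes'}."""
    totals, with_das = Counter(), Counter()
    for row in rows:
        year = row[YEAR_COL].strip()
        totals[year] += 1
        if row[DAS_COL].strip().lower() == "yes":
            with_das[year] += 1
    return {year: 100.0 * with_das[year] / totals[year] for year in sorted(totals)}


# Made-up rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "Year of Publication,Data availability statement in article\n"
    "2016,No\n"
    "2016,Yes\n"
    "2024,Yes\n"
    "2024,Yes\n"
)
shares = das_share_by_year(csv.DictReader(sample_csv))
for year, percent in shares.items():
    print(year, f"{percent:.0f}%")
```

    To run this against the deposited file, the StringIO object would be replaced by an open file handle, after checking the actual header names in the CSV.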

    Variables that are available in the source data

    1. Title of article

    2. Authors

    3. Journal

    4. Year

    5. (Date) Published

    6. (Date) E-published

    7. Volume

    8. Issue

    9. Pages

    10. DOI

    11. PMID

    12. Labels

    13. Qualifiers

    14. IUID

    15. URL

    16. DOI URL of research article

    17. PubMed URL of research article

    File formats and software

    The file formats used in this dataset are:

    .csv (Text file)

    .jpg (JPEG image file)

    .pdf/A (Portable Document Format for archiving)

    .txt (Text file)

    .xlsx (Microsoft Excel 365 file)

    All files can be opened with Microsoft Office 365.

    Reference

    Kieselbach, Theresa (2023). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data. Umeå University. Dataset. https://doi.org/10.17044/scilifelab.23641749.v1

    Abbreviations

    CC BY 4.0: Creative Commons Attribution 4.0 International Public License

    CC BY-NC 4.0: Creative Commons Attribution-NonCommercial 4.0 International Public License

    CC BY-NC 3.0: Creative Commons Attribution-NonCommercial 3.0 International Public License

    CC BY-NC-ND 4.0: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License

    DOI: Digital Object Identifier

    EBI: European Bioinformatics Institute

    EBI-ArrayExpress: The ArrayExpress collection of functional genomics data at the EBI

    EBI-ENA: European Nucleotide Archive at the EBI

    EBI-Pride: Proteomics Identification Database at the EBI

    e!DAL: electronic Data Archive Library at the Leibniz Institute for Plant Genetics and Crop Plant Research

    IUID: Item Unique identification

    LUDC: Lund University Diabetes Centre

    LUDC repository: data repository at the Lund University Diabetes Centre

    NCBI: National Center for Biotechnology Information

    NCBI-GEO: The Gene Expression Omnibus database repository at the NCBI

    NCBI-SRA: The Sequence Read Archive at the NCBI

    PMID: PubMed Identifier

    URL: Uniform Resource Locator

    MD5 checksums of the files

    Manifest.txt (2 KB): 89f32a728fb74ebecef0aef4633130b0

    README.txt (6 KB): 34ea4ad9cb9bdea54755fa87f2d0b913

    Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.csv (46 KB): 9719df26381901bc6aabfd34fdbfab81

    Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.xlsx (49 KB): 1ec95dc29262645240e7d8714967bcfc

    Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.csv (391 Bytes): 1fd723dc6f52f18251d41c0d343a4f0f

    Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.xlsx (9 KB): 38622a9681c6f1057a6e1a4be56b0285

    Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.csv (468 Bytes): 9f9156f8d52603ccdec968f626bc002a

    Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.jpg (119 KB): dc9a4d7de4c789e8aea46ce66e007301

    Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.xlsx (15 KB): 6527d1ebd0069ef3757bd1b049f0fc74

    Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.csv (300 Bytes): 5abc4a0fcf776f8dc4745f41deddacbc

    Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.jpg (126 KB): e03e5bf4ba2d942c3b022aebb0a59033

    Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.xlsx (15 KB): a80f977c051d4798db221b07733c694b

    Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.csv (670 Bytes): a694a3defa98aa52fcdec8ff9e9e3316

    Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.jpg (153 KB): 3928bdc1f046ca9b6f66bdbcdf936ca8

    Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.xlsx (15 KB): 46dfda56b116b571b4bf8e3674b44512

    Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.csv (498 Bytes): 8963a412cc9e458ced2e80883bb93e1a

    Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.jpg (137 KB): c9ba447225e99431f24732128a754b7e

    Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.xlsx (16 KB): 1e2813d3ccb0ee14991b276947c21b8a

    Materials_and_methods_SMC_publications_2016_2024.docx (19 KB): 71776ffc1e530e1b40255763403b2f40

    Materials_and_methods_SMC_publications_2016_2024.txt (4 KB): 26c4b91b958b9e33d93d13dc52b25da9

    Materials_and_methods_SMC_publications_2026_2024.pdf (172 KB): eee564f452ef4f3cf57bb81a6874fcd4

    SMC_publications_2016_2024_status_2025_05_05.csv (143 KB): 5e61d09244ca90b1e5b057a7afdfe5e7

    SMC_publications_2016_2024_status_2025_05_05.xlsx (106 KB): 6977fbcac21ff5a12763e40de90c0a91

  6. Data Availability Statement.

    • figshare.com
    docx
    Updated Feb 12, 2021
    Cite
    Olga Alcântara Barros (2021). Data Availability Statement. [Dataset]. http://doi.org/10.6084/m9.figshare.13951607.v1
    Available download formats: docx
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Olga Alcântara Barros
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We analyzed all the samples using a stereomicroscope, an Olympus C011 trinocular microscope coupled with a CCD camera. All the samples were measured and photographed with the Infinity Capture software. The drawing was refined on a drawing tablet (Parblo A610 graphic tablet) using the program ImageJ (public domain). The map of the geographical location of the Araripe Basin was produced using the software QGIS Geographic Information System (version 3.12, QGIS.org, public domain), considering the coordinate system datum SIRGAS 2000 from Instituto Brasileiro de Geografia e Estatística (IBGE, Brazil) and Companhia de Pesquisa de Recursos Minerais (CPRM, Brazil). The stratigraphy of the Santana Group was drawn with ImageJ (public domain) according to the stratigraphy in Neumann & Cabreira, 1999 and Valença et al., 2003.

  7. Description of coding categories and example statements.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 30, 2023
    Cite
    Lisa M. Federer; Christopher W. Belter; Douglas J. Joubert; Alicia Livinski; Ya-Ling Lu; Lissa N. Snyders; Holly Thompson (2023). Description of coding categories and example statements. [Dataset]. http://doi.org/10.1371/journal.pone.0194768.t001
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lisa M. Federer; Christopher W. Belter; Douglas J. Joubert; Alicia Livinski; Ya-Ling Lu; Lissa N. Snyders; Holly Thompson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of coding categories and example statements.

  8. Dataset #1: Cross-sectional survey data

    • figshare.com
    txt
    Updated Jul 19, 2023
    Cite
    Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1
    Available download formats: txt
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Adam Baimel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    N.B. This is not real data. Only here for an example for project templates.

    Project Title: Add title here

    Project Team: Add contact information for research project team members

    Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

    Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

    Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

    Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

    Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

    Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

    List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

    Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

    Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14

  9. Table1_Data Availability of Open T-Cell Receptor Repertoire Data, a...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 5, 2023
    Cite
    Yu-Ning Huang; Naresh Amrat Patel; Jay Himanshu Mehta; Srishti Ginjala; Petter Brodin; Clive M. Gray; Yesha M. Patel; Lindsay G. Cowell; Amanda M. Burkhardt; Serghei Mangul (2023). Table1_Data Availability of Open T-Cell Receptor Repertoire Data, a Systematic Assessment.DOCX [Dataset]. http://doi.org/10.3389/fsysb.2022.918792.s001
    Available download formats: docx
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Yu-Ning Huang; Naresh Amrat Patel; Jay Himanshu Mehta; Srishti Ginjala; Petter Brodin; Clive M. Gray; Yesha M. Patel; Lindsay G. Cowell; Amanda M. Burkhardt; Serghei Mangul
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern data-driven research has the power to promote novel biomedical discoveries through secondary analyses of raw data. Therefore, it is important to ensure data-driven research with great reproducibility and robustness for promoting a precise and accurate secondary analysis of the immunogenomics data. In scientific research, rigorous conduct in designing and conducting experiments is needed, specifically in scientific writing and reporting results. It is also crucial to make raw data available, discoverable, and well described or annotated in order to promote future re-analysis of the data. In order to assess the data availability of published T cell receptor (TCR) repertoire data, we examined 11,918 TCR-Seq samples corresponding to 134 TCR-Seq studies ranging from 2006 to 2022. Among the 134 studies, only 38.1% had publicly available raw TCR-Seq data shared in public repositories. We also found a statistically significant association between the presence of data availability statements and the increase in raw data availability (p = 0.014). Yet, 46.8% of studies with data availability statements failed to share the raw TCR-Seq data. There is a pressing need for the biomedical community to increase awareness of the importance of promoting raw data availability in scientific research and take immediate action to improve its raw data availability enabling cost-effective secondary analysis of existing immunogenomics data by the larger scientific community.

  10. Referenzierung von Forschungsdatenpublikationen in RADAR

    • radar-service.eu
    tar
    Updated Mar 21, 2025
    Cite
    Dorothea Strecker (2025). Referenzierung von Forschungsdatenpublikationen in RADAR [Dataset]. http://doi.org/10.22000/fbhfgzy8d43r3tjw
    Available download formats: tar (78336 bytes)
    Dataset updated
    Mar 21, 2025
    Dataset provided by
    Humboldt-Universität zu Berlin
    Authors
    Dorothea Strecker
    Description

    This dataset describes how datasets published in the research data repository RADAR are referenced, combining references extracted from Google Scholar, DataCite Event Data and the Data Citation Corpus.

    DOIs assigned to RADAR datasets were retrieved from the RADAR API 2025-01-27. References in the three data sources were then identified using these DOIs. Each research output referencing a RADAR dataset was accessed to determine where the reference occurred in the full text. Author names and publication dates for datasets and referencing objects were added from OpenAlex and DataCite on 2025-02-10. Author names of datasets and referencing objects were compared to determine if data reuse occurred.

    Columns

    • from: DOI of the referencing object
    • to: DOI of the RADAR dataset
    • from_date: publication date of the referencing object
    • to_date: publication date of the RADAR dataset
    • source_gs: boolean indicating if the reference was found in Google Scholar
    • source_dcc: boolean indicating if the reference was found in the Data Citation Corpus
    • source_ded: boolean indicating if the reference was found in DataCite Event Data
    • method_rl: boolean indicating if the dataset was referenced in the reference list
    • method_das: boolean indicating if the dataset was referenced in the data availability statement
    • method_fn: boolean indicating if the dataset was referenced in a footnote
    • method_ft: boolean indicating if the dataset was referenced in other parts of the full text, for example in the methods section
    • reuse_author: variable indicating whether the reference represents data use (overlap in the author names of dataset and referencing object) or data reuse (no overlap)
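
    The boolean method_* columns make it straightforward to tally where in the full text references occur. The following sketch assumes the data are available as a CSV file with the column names listed above and that true values are written as TRUE, 1 or yes; neither the file layout nor the serialization is confirmed by the description, and the sample rows and DOIs are invented placeholders.

```python
import csv
import io

METHOD_COLUMNS = ["method_rl", "method_das", "method_fn", "method_ft"]
TRUE_VALUES = {"true", "1", "yes"}  # assumed spellings of a true value


def count_methods(rows):
    """Count, per method_* column, how many references are flagged true."""
    counts = {column: 0 for column in METHOD_COLUMNS}
    for row in rows:
        for column in METHOD_COLUMNS:
            if row[column].strip().lower() in TRUE_VALUES:
                counts[column] += 1
    return counts


# Invented rows; the DOIs are placeholders, not real identifiers.
sample = io.StringIO(
    "from,to,method_rl,method_das,method_fn,method_ft\n"
    "10.0000/paper-a,10.0000/dataset-x,TRUE,FALSE,FALSE,FALSE\n"
    "10.0000/paper-b,10.0000/dataset-x,TRUE,TRUE,FALSE,FALSE\n"
    "10.0000/paper-c,10.0000/dataset-y,FALSE,TRUE,FALSE,TRUE\n"
)
counts = count_methods(csv.DictReader(sample))
print(counts)
```

    A single reference can be counted under several methods, since one referencing object may cite a dataset both in its reference list and in its data availability statement.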

  11. Global suicide mortality rates (2000-2019) and bibliographic data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 22, 2024
    Cite
    Pranckeviciene, Erinija (2024). Global suicide mortality rates (2000-2019) and bibliographic data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12267301
    Dataset updated
    Jun 22, 2024
    Dataset provided by
    Vytautas Magnus University
    Authors
    Pranckeviciene, Erinija
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains World Bank Suicide mortality rate WDI (world development indicator) (2000-2019) world-wide data in original and processed form. In addition to the statistical data this dataset also contains bibliographic records of articles published on the topic of suicide in relation to individual countries during (2000-2019) in original and processed form.

    The data consists of six archives:

    1. World development indicator suicide mortality rate SH.STA.SUIC.P5. This archive contains the suicide mortality rates of 159 countries for 2000-2019 per 100,000 population, including males and females, as of November 2023.

    2. Web of science records country and suicide. This archive contains bibliographic records, organized by country, of publications on suicide related to that country published during 2000-2019, as of November 2023.

    3. Suicide mortality rate statistics and keywords. This archive contains processed data from archives 1 and 2 in three files. The 'Countries suicide rates and WOS records' file contains temporal suicide mortality rate data for each country and each year for males and females, including counts of articles on suicide related to that country. The 'words and countries matrix' file records how many times author and paper keywords from suicide-related publications appeared in articles associated with each country. The data are organized as a matrix in which rows are keywords, columns are countries, and cells are keyword counts. The 'words and countries pairs' file contains the same information organized as keyword-country pairs.

    4. Suicide mortality rate clusters countries keywords titles. This archive contains bibliographic data organized by country clusters. These clusters group countries with similar suicide mortality rate dynamics in males and females, shown in two included figures. Each cluster folder contains a section with bibliographic records; a section with keywords associated with each country; and a section in which each publication associated with the country has a separate file containing its title and keywords.

    5. Suicide keywords embedding data. This archive contains word embedding vectors and metadata learned by a recurrent neural network trained to classify countries from suicide-related keywords of articles associated with those countries. Folder 'trained with keywords' contains embeddings learned in classifying countries where the training samples are keyword strings of publications. Folder 'trained with titles' contains embeddings learned in classifying countries where the training samples are strings containing publication titles plus keywords.

    6. Suicide keywords association rule mining. This archive contains files of subsets of keywords frequently mentioned together in suicide-related publications. Folder 'Mining in clusters' has frequent keyword itemsets in country clusters. Folder 'Mining in individual countries' has frequent keyword itemsets in individual countries. Examples of keyword networks connecting clusters, and of networks connecting countries within individual clusters, are included; these help to identify keywords that are specific to or shared by country clusters and by countries within individual clusters.

    These datasets support data availability statements for upcoming articles.
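    The description above mentions the same keyword counts stored both as a 'words and countries matrix' and as 'words and countries pairs'. A minimal Python sketch of how one form could be derived from the other follows. The function name `matrix_to_pairs`, the sample keywords, countries, and counts, and the choice to skip zero counts are all assumptions for illustration; the archive's actual file format is not specified here.

```python
def matrix_to_pairs(keywords, countries, counts):
    """Flatten a keyword-by-country count matrix into (keyword, country, count) triples.

    Rows of `counts` correspond to keywords and columns to countries.
    Zero cells are skipped (an assumption, not confirmed by the dataset).
    """
    pairs = []
    for i, keyword in enumerate(keywords):
        for j, country in enumerate(countries):
            if counts[i][j] > 0:
                pairs.append((keyword, country, counts[i][j]))
    return pairs


# Hypothetical sample data, not taken from the archive.
keywords = ["depression", "prevention"]
countries = ["Lithuania", "Japan"]
counts = [
    [3, 0],  # "depression" counts per country
    [1, 2],  # "prevention" counts per country
]
pairs = matrix_to_pairs(keywords, countries, counts)
print(pairs)
```

    The pair form is usually more compact when most keywords occur in only a few countries, while the matrix form is convenient for column-wise comparison between countries.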

  12. Scoring Criteria used for the assessment.

    • plos.figshare.com
    xls
    Updated Jul 23, 2025
    Cite
    Haya Deeb; Suzanna Creasey; Diego Lucini de Ugarte; George Strevens; Trisha Usman; Hwee Yun Wong; Megan A. M. Kutzer; Emma Wilson; Tomasz Zieliński; Andrew J. Millar (2025). Scoring Criteria used for the assessment. [Dataset]. http://doi.org/10.1371/journal.pone.0328065.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Haya Deeb; Suzanna Creasey; Diego Lucini de Ugarte; George Strevens; Trisha Usman; Hwee Yun Wong; Megan A. M. Kutzer; Emma Wilson; Tomasz Zieliński; Andrew J. Millar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Open science promotes the accessibility of scientific research and data, emphasising transparency, reproducibility, and collaboration. This study assesses the Openness and FAIR (Findable, Accessible, Interoperable, and Reusable) aspects of data-sharing practices within the biosciences at the University of Edinburgh from 2014 to 2023. We analysed 555 research papers across biotechnology, regenerative medicine, infectious diseases, and non-communicable diseases. Our scoring system evaluated data completeness, reusability, accessibility, and licensing, finding a progressive shift towards better data-sharing practices. The fraction of publications that share all relevant data increased significantly, from 7% in 2014 to 45% in 2023. Data involving genomic sequences were shared more frequently than image data or data on human subjects or samples. The presence of data availability statement (DAS) or preprint sharing correlated with more and better data sharing, particularly in terms of completeness. We discuss local and systemic factors underlying the current and future Open data sharing. Evaluating the automated ODDPub (Open Data Detection in Publications) tool on this manually-scored dataset demonstrated high specificity in identifying cases where no data was shared. ODDPub sensitivity improved with better documentation in the DAS. This positive trend highlights improvements in data-sharing, advocating for continued advances and addressing challenges with data types and documentation.

  13. Open Science Indicators for a corpus of 8,131 research articles published by...

    • figshare.com
    xlsx
    Updated Oct 22, 2025
    Cite
    Rebecca Taylor-Grant; Eilise Norris (2025). Open Science Indicators for a corpus of 8,131 research articles published by Taylor & Francis journals [Dataset]. http://doi.org/10.6084/m9.figshare.30316342.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rebecca Taylor-Grant; Eilise Norris
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset represents a set of Open Science Indicators generated by the AI solution provider DataSeer using a corpus of 8,131 research articles published in Taylor & Francis journals in 2023.

    The corpus was selected using purposive sampling of journal titles to ensure inclusion of Open Access and Open Select (hybrid) journals; journals with a variety of data sharing policies; and journals representing a range of disciplines including life sciences, medicine and health, earth sciences, social sciences and psychology. From each journal in the corpus a random representative sample of between approximately 10-50 articles was selected and the full 2023 published output of any single journal is not included.

    The DataSeer analysis identified Open Science indicators including:
    • Presence of data availability statements;
    • Evidence of data sharing (via supplementary files or data repositories);
    • Evidence of code sharing;
    • Evidence of preprinting;
    • Pre-registration of studies;
    • Use of persistent identifiers (ORCIDs and RRIDs).

    As the dataset was generated by an AI tool, some errors or inaccuracies may be present. Before sharing the dataset publicly, the project team at Taylor & Francis undertook data cleansing to ensure that the dataset is comprehensible to an external audience and to enhance its reusability. Notes on data cleansing are included in the README file in the dataset spreadsheet, along with explanations of column headers where needed.

  14. This is the Data Availability Statement.

    • figshare.com
    xlsx
    Updated Jun 3, 2025
    Cite
    Chilot Kassa Mekonnen; Hailemichael Kindie Abate; Abere Woretaw Azagew; Muluken Chanie Agimas (2025). This is the Data Availability Statement. [Dataset]. http://doi.org/10.1371/journal.pone.0324363.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Chilot Kassa Mekonnen; Hailemichael Kindie Abate; Abere Woretaw Azagew; Muluken Chanie Agimas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Epilepsy is a common non-communicable neurological disorder associated with recurrent seeding of cerebral neurons or brain cells and episodes of unprovoked seizures with or without loss of consciousness. Although there are studies on the health-related quality of life of epilepsy patients in Ethiopia, there are remarkable variations in the estimates of health-related quality of life.

    Objectives: This systematic review and meta-analysis aimed to determine the pooled effect size of the health-related quality of life of adult epilepsy patients in Ethiopia.

    Methods: Original articles about the health-related quality of life among epilepsy patients in Ethiopia were searched through known and international databases (PubMed, Scopus, and Web of Science) and search engines (Google and Google Scholar). Data were extracted using a standard data extraction checklist developed according to Joanna Briggs Institute (JBI). The I² statistics were used to identify heterogeneity across studies. Funnel plot asymmetry and Egger’s tests were used to check for publication bias. The STATA version 11 software was employed for statistical analysis to pool the mean scores of health-related quality-of-life.

    Result: A total of 16 cross-sectional studies with a sample size of 5294 took part. The pooled overall mean score of health-related quality of life among epilepsy patients in Ethiopia was 52.82 ± 13.24 [95%CI (46.41, 59.21)], I² = 100%, p-value

  15. Journals code.

    • plos.figshare.com
    xlsx
    Updated Sep 2, 2025
    Cite
    Pinge Zhao; Xin Zhang; Liandi Dai; Baoguo Ma; Yuting Duan; Yan Xu; Hongmei Wei; Shengwei Wu; Linghui Xiong (2025). Journals code. [Dataset]. http://doi.org/10.1371/journal.pone.0331697.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Sep 2, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pinge Zhao; Xin Zhang; Liandi Dai; Baoguo Ma; Yuting Duan; Yan Xu; Hongmei Wei; Shengwei Wu; Linghui Xiong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Responsible data sharing in clinical research can enhance the transparency and reproducibility of research evidence, thereby increasing the overall value of research. Since 2024, more than 5,000 journals have adhered to the International Committee of Medical Journal Editors (ICMJE) Data Sharing Statement (DSS) to promote data sharing. However, due to the significant effort required for data sharing and the scarcity of academic rewards, data availability in clinical research remains suboptimal. This study aims to explore the impact of biomedical journal policies and available supporting information on the implementation of data availability in clinical research publications. This cross-sectional study will select 303 journals and their latest publications as samples from the biomedical journals listed in the Web of Science Journal Citation Reports, using stratified random sampling according to the 2023 Journal Impact Factor (JIF). Two researchers will independently extract journal data-sharing policies from the submission guidelines of eligible journals and data-sharing details from publications using a pre-designed form from Apr 2025 to Dec 2025. The data-sharing levels of publications will be based on the openness of the data-sharing mechanism. Binomial logistic regression analyses will be used to identify potential journal factors that affect publication data-sharing levels. This protocol has been registered in Open Science Framework (OSF) Registries: https://doi.org/10.17605/OSF.IO/EX6DV.

  16. Datasets and code relating to: "Practical considerations for trace DNA...

    • figshare.com
    txt
    Updated Jun 18, 2025
    Cite
    Nathan Deliveyne (2025). Datasets and code relating to: "Practical considerations for trace DNA recovery and detection of an invasive reptile across different deposition scenarios". [Dataset]. http://doi.org/10.6084/m9.figshare.26693584.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nathan Deliveyne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record includes data and code relating to the manuscript titled "Practical considerations for trace DNA recovery and detection of an invasive reptile across different deposition scenarios" as part of the Data Availability statement. The .csv files contain DNA extract quantification values, LAMP time to detection data and all the data relating to trace DNA amplification testing from swab samples and eDNA. The script is required to conduct statistical analysis in R.

  17. Summary of datasets.

    • plos.figshare.com
    xls
    Updated Oct 21, 2025
    Cite
    Ka Hyun Park; Junghun Kim; U Kang (2025). Summary of datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0333915.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 21, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ka Hyun Park; Junghun Kim; U Kang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    How can we build accurate transcription models for both ordinary speech and characterized speech in a semi-supervised setting? ASR (Automatic Speech Recognition) systems are widely used in various real-world applications, including translation systems and transcription services. ASR models are tailored to serve one of two types of speeches: 1) ordinary speech (e.g., speeches from the general population) and 2) characterized speech (e.g., speeches from speakers with special traits, such as certain nationalities or speech disorders). Recently, the limited availability of labeled speech data and the high cost of manual labeling have drawn significant attention to the development of semi-supervised ASR systems. Previous semi-supervised ASR models employ a pseudo-labeling scheme to incorporate unlabeled examples during training. However, these methods rely heavily on pseudo labels during training and are therefore highly sensitive to the quality of pseudo labels. The issue of low-quality pseudo labels is particularly pronounced for characterized speech, due to the limited availability of data specific to a certain trait. This scarcity hinders the initial ASR model’s ability to effectively capture the unique characteristics of characterized speech, resulting in inaccurate pseudo labels. In this paper, we propose a framework for training accurate ASR models for both ordinary and characterized speeches in a semi-supervised setting. Specifically, we propose MOCA (Multi-hypotheses-based Curriculum learning for semi-supervised Asr) for ordinary speech and MOCA-S for characterized speech. MOCA and MOCA-S generate multiple hypotheses for each speech instance to reduce the heavy reliance on potentially inaccurate pseudo labels. Moreover, MOCA-S for characterized speech effectively supplements the limited trait-specific speech data by exploiting speeches of the other traits. Specifically, MOCA-S adjusts the number of pseudo labels based on the relevance to the target trait. Extensive experiments on real-world speech datasets show that MOCA and MOCA-S significantly improve the accuracy of previous ASR models.


Cite
Karcher, Sebastian; Robey, Derek; Kirilova, Dessislava; Weber, Nic (2025). Replication data for: An Analysis of Data Availability Statements in Qualitative Research Journal Articles [Dataset]. http://doi.org/10.7910/DVN/THG8MN

Replication data for: An Analysis of Data Availability Statements in Qualitative Research Journal Articles

Explore at:
Dataset updated
Oct 29, 2025
Dataset provided by
Harvard Dataverse
Authors
Karcher, Sebastian; Robey, Derek; Kirilova, Dessislava; Weber, Nic
Description

Summary: Over the past decade, many scholarly journals have adopted policies on data sharing, with an increasing number of journals requiring that authors share the data underlying their published work. Frequently, qualitative data are excluded from those policies explicitly or implicitly. A few journals, however, intentionally do not make such a distinction. This project focuses on articles published in eight of the open-access journals maintained by Public Library of Science (PLOS). All PLOS journals introduced strict data sharing guidelines in 2014, applying to all empirical data on the basis of which articles are published. We collected a database of more than 2,300 articles containing a qualitative data component published between January 1, 2015 and August 23, 2023 and analyzed the data availability statements (DAS) researchers made regarding the availability, or lack thereof, of their data. We describe the degree to which and manner in which data are reportedly available (for example, in repositories, via institutional gate-keepers, or on request from author) versus those that are declared to be unavailable. We also outline several dimensions of patterned variation in the data availability statements, including temporal patterns and variation by data type. Based on the results, we also provide recommendations to both researchers on how to make their data availability statements clearer, more transparent and more informative, and to journal editors and reviewers, on how to interpret and evaluate statements to ensure they accurately reflect a given data availability scenario. Finally, we suggest a workflow which can link interactions with repositories most productively as part of a typical editorial process.

Data Overview: This data deposit includes data and code to assemble the dataset, generate all figures and values used in the paper and appendix, and generate the codebook. It also includes the codebook and the figures. The analysis.R script and the data in data/analysis are sufficient to reproduce all findings in the paper. The additional scripts and the data files in data/raw are included for full transparency and to facilitate the detection of any errors in the data processing pipeline. Their structure is due to the development of the project over time.
