100+ datasets found
  1. Dataset for "Do LiU researchers publish data – and where? Dataset analysis...

    • researchdata.se
    • demo.researchdata.se
    +1 more
    Updated Mar 19, 2025
    Cite
    Kaori Hoshi Larsson (2025). Dataset for "Do LiU researchers publish data – and where? Dataset analysis using ODDPub" [Dataset]. http://doi.org/10.5281/zenodo.15017715
    Explore at:
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    Linköping University
    Authors
    Kaori Hoshi Larsson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results from the ODDPub text-mining algorithm and the findings from a manual analysis. Full-text PDFs of all articles parallel-published by Linköping University in 2022 were extracted from the university's repository, DiVA. These were analyzed with the ODDPub (https://github.com/quest-bih/oddpub) text-mining algorithm to determine the extent of data sharing and to identify the repositories where the data were shared. In addition to the ODDPub results, a manual analysis was conducted to confirm the presence of data-sharing statements, assess data availability, and identify the repositories used.
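
    The screening described above is keyword-based at its core. As a rough illustration only (this is not ODDPub itself, which is an R package; the folder name and patterns below are assumptions), a similar first-pass screen over extracted full texts could look like this in Python:

        # Illustrative sketch only -- not ODDPub. Scans plain-text article dumps
        # for phrases that typically signal a data availability / sharing statement.
        import re
        from pathlib import Path

        SHARING_PATTERNS = [                      # hypothetical, simplified patterns
            r"data (are|is) (openly |freely )?available",
            r"data availability statement",
            r"deposited (in|at) (zenodo|dryad|figshare|genbank)",
        ]

        def screen_fulltexts(txt_dir="fulltexts"):   # assumed folder of extracted texts
            flagged = {}
            for path in Path(txt_dir).glob("*.txt"):
                text = path.read_text(encoding="utf-8", errors="ignore").lower()
                flagged[path.name] = [p for p in SHARING_PATTERNS if re.search(p, text)]
            return flagged

        if __name__ == "__main__":
            for name, hits in screen_fulltexts().items():
                print(name, "->", hits or "no sharing statement detected")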

  2. Self-citation analysis data based on PubMed Central subset (2002-2005)

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Cite
    Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik, Self-citation analysis data based on PubMed Central subset (2002-2005) [Dataset]. http://doi.org/10.13012/B2IDB-9665377_V1
    Explore at:
    Authors
    Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Institutes of Health (NIH)
    U.S. National Science Foundation (NSF)
    Description

    Self-citation analysis data based on PubMed Central subset (2002-2005)
    ----------------------------------------------------------------------
    Created by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik on April 5th, 2018

    ## Introduction
    This is a dataset created as part of the publication titled: Mishra S, Fegley BD, Diesner J, Torvik VI (2018) Self-Citation is the Hallmark of Productive Authors, of Any Gender. PLOS ONE. It contains files for running the self-citation analysis on articles published in PubMed Central between 2002 and 2005, collected in 2015. The dataset is distributed in the form of the following tab-separated text files:
    * Training_data_2002_2005_pmc_pair_First.txt (1.2G) - Data for first authors
    * Training_data_2002_2005_pmc_pair_Last.txt (1.2G) - Data for last authors
    * Training_data_2002_2005_pmc_pair_Middle_2nd.txt (964M) - Data for middle 2nd authors
    * Training_data_2002_2005_pmc_pair_txt.header.txt - Header for the data
    * COLUMNS_DESC.txt - Descriptions of all columns
    * model_text_files.tar.gz - Text files containing model coefficients and scores for model selection
    * results_all_model.tar.gz - Model coefficient and result files in numpy format used for plotting purposes; v4.reviewer contains models for analysis done after reviewer comments
    * README.txt

    ## Dataset creation
    Our experiments relied on data from multiple sources, including proprietary data from Thomson Reuters' (now Clarivate Analytics) Web of Science collection of MEDLINE citations. Authors interested in reproducing our experiments should request this data from Clarivate Analytics directly. However, we do make available a similar but open dataset based on citations from PubMed Central, which can be used to obtain results comparable to those reported in our analysis. Furthermore, we have freely shared our own datasets, which can be combined with the citation data from Clarivate Analytics to re-create the dataset used in our experiments. These datasets are listed below. If you use any of them, please cite both the dataset and the paper introducing it.
    * MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
    * Citation data from PubMed Central (the original paper includes additional citations from Web of Science)
    * Author-ity 2009 dataset:
      - Dataset citation: Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1
      - Paper citation: Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–29. https://doi.org/10.1145/1552303.1552304
      - Paper citation: Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2004). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158. https://doi.org/10.1002/asi.20105
    * Genni 2.0 + Ethnea for identifying author gender and ethnicity:
      - Dataset citation: Torvik, Vetle (2018): Genni + Ethnea for the Author-ity 2009 dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9087546_V1
      - Paper citation: Smith, B. N., Singh, M., & Torvik, V. I. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries - JCDL '13. ACM Press. https://doi.org/10.1145/2467696.2467720
      - Paper citation: Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927
    * MapAffil for identifying article country of affiliation:
      - Dataset citation: Torvik, Vetle I. (2018): MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4354331_V1
      - Paper citation: Torvik VI. MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide. D-Lib Magazine. 2015;21(11-12):10.1045/november2015-torvik
    * IMPLICIT journal similarity:
      - Dataset citation: Torvik, Vetle (2018): Author-implicit journal, MeSH, title-word, and affiliation-word pairs based on Author-ity 2009. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4742014_V1
    * Novelty dataset for identifying article-level novelty:
      - Dataset citation: Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
      - Paper citation: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib Magazine. 2016;22(9-10):10.1045/september2016-mishra
      - Code: https://github.com/napsternxg/Novelty
    * Expertise dataset for identifying author expertise on articles:
    * Source code provided at: https://github.com/napsternxg/PubMed_SelfCitationAnalysis

    Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. See NLM's data Terms and Conditions for information on obtaining PubMed/MEDLINE. Additional data-related updates can be found at the Torvik Research Group.

    ## Acknowledgments
    This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    ## License
    Self-citation analysis data based on PubMed Central subset (2002-2005) by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/PubMed_SelfCitationAnalysis.
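
    Since the data files ship without an embedded header row (the header lives in Training_data_2002_2005_pmc_pair_txt.header.txt), a minimal loading sketch in Python/pandas, assuming the header file is a single tab-separated line, might be:

        # Minimal sketch: load a sample of the first-author file with its external header.
        import pandas as pd

        with open("Training_data_2002_2005_pmc_pair_txt.header.txt") as fh:
            columns = fh.read().strip().split("\t")   # assumed tab-separated header line

        first_authors = pd.read_csv(
            "Training_data_2002_2005_pmc_pair_First.txt",
            sep="\t",
            header=None,
            names=columns,
            nrows=100_000,     # the full file is ~1.2 GB; sample while exploring
        )
        print(first_authors.shape)
        print(first_authors.columns.tolist()[:10])

    See COLUMNS_DESC.txt in the deposit for what each column means.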

  3. Analysis of CBCS publications for Open Access, data availability statements...

    • figshare.scilifelab.se
    • researchdata.se
    +2 more
    txt
    Updated Jan 15, 2025
    Cite
    Theresa Kieselbach (2025). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data [Dataset]. http://doi.org/10.17044/scilifelab.23641749.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Umeå University
    Authors
    Theresa Kieselbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description
    This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions:
    1. Is the research article an Open Access publication?
    2. Does the research article have a Creative Commons license or a similar license?
    3. Does the research article contain a data availability statement?
    4. Did the authors submit data of their study to a repository such as EMBL, GenBank, Protein Data Bank (PDB), Cambridge Crystallographic Data Centre (CCDC), Dryad, or a similar repository?
    5. Does the research article contain supplementary data?
    6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?

    Variables
    The data were compiled in a Microsoft Excel 365 document that includes the following variables:
    1. DOI URL of research article
    2. Year of publication
    3. Research article published with Open Access
    4. License for research article
    5. Data availability statement in article
    6. Supplementary data added to article
    7. Persistent identifier for supplementary data
    8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC

    Visualization
    Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications per year, the number of publications published with open access, and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publications per year and how many publications contain supplementary data; it also shows how many of the supplementary datasets have a persistent identifier (Figure 2).

    File formats and software
    The file formats used in this dataset are: .csv (text file), .docx (Microsoft Word 365 file), .jpg (JPEG image file), .pdf/A (Portable Document Format for archiving), .png (Portable Network Graphics image file), .pptx (Microsoft PowerPoint 365 file), .txt (text file), and .xlsx (Microsoft Excel 365 file). All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.

    MD5 checksums
    Here is a list of all files of this dataset and their MD5 checksums:
    1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)
    2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)
    3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2)
    4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b)
    5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a)
    6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c)
    7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b)
    8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5)
    9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b)
    10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793)
    11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e)
    12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e)
    13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe)
    14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7)
    15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698)
    16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a)
    17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72)
    18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d)
    19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
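
    The MD5 list above can be used to verify a download. A minimal Python sketch (file names and checksums taken from the list; the dictionary is truncated here):

        # Verify downloaded files against the published MD5 checksums.
        import hashlib

        EXPECTED = {
            "Readme.txt": "795f171be340c13d78ba8608dafb3e76",
            "Manifest.txt": "46787888019a87bb9d897effdf719b71",
            "CBCS_publications_until_date_2023_07_05.csv": "2580410623f79959c488fdfefe8b4c7b",
            # ... add the remaining entries from the checksum list above
        }

        def md5sum(path, chunk=1 << 20):
            h = hashlib.md5()
            with open(path, "rb") as fh:
                while block := fh.read(chunk):
                    h.update(block)
            return h.hexdigest()

        for name, expected in EXPECTED.items():
            status = "OK" if md5sum(name) == expected else "MISMATCH"
            print(f"{name}: {status}")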

  4. 190k+ Medium Articles

    • kaggle.com
    zip
    Updated Apr 26, 2022
    + more versions
    Cite
    Fabio Chiusano (2022). 190k+ Medium Articles [Dataset]. https://www.kaggle.com/datasets/fabiochiusano/medium-articles
    Explore at:
    Available download formats: zip (386824829 bytes)
    Dataset updated
    Apr 26, 2022
    Authors
    Fabio Chiusano
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data source

    This data has been collected through a standard scraping process from the Medium website, looking for published articles.

    Data description

    Each row in the data is a different article published on Medium. For each article, you have the following features:
    - title [string]: The title of the article.
    - text [string]: The text content of the article.
    - url [string]: The URL associated with the article.
    - authors [list of strings]: The article authors.
    - timestamp [string]: The publication datetime of the article.
    - tags [list of strings]: List of tags associated with the article.

    Data analysis

    You can find a very quick data analysis in this notebook.

    What can I do with this data?

    • A multilabel classification model that assigns tags to articles.
    • A seq2seq model that generates article titles.
    • Text analysis.
    • Finetune text generation models on the general domain of Medium, or on specific domains by filtering articles by the appropriate tags.

    Collection methodology

    Scraping has been done with Python and the requests library. Starting from a random article on Medium, the next articles to scrape are selected by visiting: 1. The author archive pages. 2. The publication archive pages (if present). 3. The tags archives (if present).

    The article HTML pages have been parsed with the newspaper Python library.

    Published articles have been filtered for English articles only, using the Python langdetect library.

    As a consequence of the collection methodology, the scraped articles do not come from a uniform publication-date distribution. Articles published in both 2016 and 2022 are present, but the number of articles from 2016 is not the same as the number from 2022; in particular, there is a strong prevalence of articles published in 2020. Have a look at the accompanying notebook to see the distribution of the publication dates.
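
    A minimal sketch of the per-article step of this pipeline, assuming the newspaper3k and langdetect packages described above (the starting URL is a placeholder):

        # Download one article, parse it, and keep it only if it is English.
        from newspaper import Article
        from langdetect import detect

        def scrape_article(url):
            article = Article(url)
            article.download()
            article.parse()
            if not article.text or detect(article.text) != "en":
                return None                      # skip empty or non-English articles
            return {
                "title": article.title,
                "text": article.text,
                "url": url,
                "authors": article.authors,
                "timestamp": str(article.publish_date),
                "tags": sorted(article.tags),
            }

        if __name__ == "__main__":
            row = scrape_article("https://medium.com/@someone/example-post")  # placeholder URL
            print(row["title"] if row else "skipped")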

  5. Citation Trends for "Do LiU researchers publish data – and where? Dataset...

    • shibatadb.com
    Updated Mar 19, 2025
    Cite
    Yubetsu (2025). Citation Trends for "Do LiU researchers publish data – and where? Dataset analysis using ODDPub" [Dataset]. https://www.shibatadb.com/article/vprkiejw
    Explore at:
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Do LiU researchers publish data – and where? Dataset analysis using ODDPub".

  6. Secondary Data from Insights from Publishing Open Data in Industry-Academia...

    • zenodo.org
    bin, json +2
    Updated Sep 16, 2024
    Cite
    Per Erik Strandberg; Philipp Peterseil; Julian Karoliny; Johanna Kallio; Johannes Peltola (2024). Secondary Data from Insights from Publishing Open Data in Industry-Academia Collaboration [Dataset]. http://doi.org/10.5281/zenodo.13767153
    Explore at:
    Available download formats: json, text/x-python, bin, txt
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Per Erik Strandberg; Philipp Peterseil; Julian Karoliny; Johanna Kallio; Johannes Peltola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Secondary Data from Insights from Publishing Open Data in Industry-Academia Collaboration

    Authors

    Per Erik Strandberg [1], Philipp Peterseil [2], Julian Karoliny [3], Johanna Kallio [4], and Johannes Peltola [4].

    [1] Westermo Network Technologies AB (Sweden).
    [2] Johannes Kepler University Linz (Austria)
    [3] Silicon Austria Labs GmbH (Austria).
    [4] VTT Technical Research Centre of Finland Ltd. (Finland).

    Description

    This data is to accompany a paper submitted to Elsevier's data in brief in 2024, with the title Insights from Publishing Open Data in Industry-Academia Collaboration.

    Tentative Abstract: Effective data management and sharing are critical success factors in industry-academia collaboration. This paper explores the motivations and lessons learned from publishing open data sets in such collaborations. Through a survey of participants in a European research project that published 13 data sets, and an analysis of metadata from almost 281 thousand datasets in Zenodo, we collected qualitative and quantitative results on motivations, achievements, research questions, licences and file types. Through inductive reasoning and statistical analysis, we found that planning the data collection is essential, and that only a few datasets (2.4%) had accompanying scripts for improved reuse. We also found that authors are not well aware of the importance of licences or which licence to choose. Finally, we found that data with a synthetic origin, collected with simulations and potentially mixed with real measurements, can be very meaningful, as predicted by Gartner and illustrated by many datasets collected in our research project.

    Secondary data from Survey

    The file survey.txt contains secondary data from a survey of participants that published open data sets in the 3-year European research project InSecTT.

    Secondary data from Zenodo

    The file secondary_data_zenodo.json contains secondary data from an analysis of data sets published in Zenodo. It is accompanied by a .py file and an .ipynb file that serve as examples.
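
    The deposit's own .py and .ipynb files are the reference examples; as a minimal standalone sketch (assuming standard UTF-8 encodings), the two files can be loaded like this:

        # Load the Zenodo metadata analysis and the survey answers.
        import json

        with open("secondary_data_zenodo.json", encoding="utf-8") as fh:
            zenodo_records = json.load(fh)

        with open("survey.txt", encoding="utf-8") as fh:
            survey_text = fh.read()

        print(type(zenodo_records))
        print(survey_text[:200])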

    License

    This data is licensed under the Creative Commons Attribution 4.0 International license. You are free to use the data if you attribute the authors. Read the license text for details.

  7. Journal of Data Analysis and Information Processing - impact-factor

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    + more versions
    Cite
    (2025). Journal of Data Analysis and Information Processing - impact-factor [Dataset]. https://exaly.com/journal/61638/journal-of-data-analysis-and-information-processing
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The graph shows the changes in the impact factor of the Journal of Data Analysis and Information Processing and its corresponding percentile, for comparison with the entire literature. The Impact Factor is the most common scientometric index; it is defined as the number of citations in a given year to papers published in the two preceding years, divided by the number of papers published in those years.
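
    As a worked example of that definition (the numbers are illustrative, not taken from this journal):

        # Two-year impact factor: citations in year X to papers from X-1 and X-2,
        # divided by the number of papers published in X-1 and X-2.
        papers_prev_two_years = 120 + 105   # papers published in years X-1 and X-2
        citations_in_year_x = 450           # citations received in year X by those papers

        impact_factor = citations_in_year_x / papers_prev_two_years
        print(round(impact_factor, 2))      # 2.0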

  8. Replication data for: An Analysis of Data Availability Statements in...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 5, 2025
    Cite
    Sebastian Karcher; Derek Robey; Dessislava Kirilova; Nic Weber (2025). Replication data for: An Analysis of Data Availability Statements in Qualitative Research Journal Articles [Dataset]. http://doi.org/10.7910/DVN/THG8MN
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 5, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Sebastian Karcher; Derek Robey; Dessislava Kirilova; Nic Weber
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Summary
    Over the past decade, many scholarly journals have adopted policies on data sharing, with an increasing number of journals requiring that authors share the data underlying their published work. Frequently, qualitative data are excluded from those policies, explicitly or implicitly. A few journals, however, intentionally do not make such a distinction. This project focuses on articles published in eight of the open-access journals maintained by the Public Library of Science (PLOS). All PLOS journals introduced strict data sharing guidelines in 2014, applying to all empirical data on the basis of which articles are published. We collected a database of more than 2,300 articles containing a qualitative data component published between January 1, 2015 and August 23, 2023 and analyzed the data availability statements (DAS) researchers made regarding the availability, or lack thereof, of their data. We describe the degree to which and manner in which data are reportedly available (for example, in repositories, via institutional gate-keepers, or on request from the author) versus declared unavailable. We also outline several dimensions of patterned variation in the data availability statements, including temporal patterns and variation by data type. Based on the results, we provide recommendations to researchers on how to make their data availability statements clearer, more transparent and more informative, and to journal editors and reviewers on how to interpret and evaluate statements to ensure they accurately reflect a given data availability scenario. Finally, we suggest a workflow which can link interactions with repositories most productively as part of a typical editorial process.

    Data Overview
    This data deposit includes data and code to assemble the dataset, generate all figures and values used in the paper and appendix, and generate the codebook. It also includes the codebook and the figures. The analysis.R script and the data in data/analysis are sufficient to reproduce all findings in the paper. The additional scripts and the data files in data/raw are included for full transparency and to facilitate the detection of any errors in the data processing pipeline. Their structure is due to the development of the project over time.

  9. Publication Data Analysis: Authorship and CRediT Criteria in PLOS ONE

    • zenodo.org
    bin
    Updated Jan 25, 2024
    Cite
    Abdelghani Maddi (2024). Publication Data Analysis: Authorship and CRediT Criteria in PLOS ONE [Dataset]. http://doi.org/10.5281/zenodo.10568892
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abdelghani Maddi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data table compiles publication identifiers such as DOI, MAGID, and PMID, along with key metadata. It records the authors' countries of origin, including a harmonized list for easy comparison. The data also include the publication date, year of publication, and a categorization of authorship types. This categorization includes "Authorship meets the criteria defined by PLOS ONE" for publications adhering to PLOS ONE's criteria, "Authorship by resources" for those attributing authorship through resources, "Not meet authorship criteria" for those not meeting authorship criteria, and finally "APC ring" for those involved in funding schemes through Article Processing Charges. These data provide an in-depth view of authorship dynamics within the context of scientific publications.

  10. Data associated with the publication: A quantitative synthesis of outcomes...

    • archive.data.jhu.edu
    Updated Sep 30, 2024
    Cite
    Jennifer R. Morrison; Robert M. Bernard (2024). Data associated with the publication: A quantitative synthesis of outcomes of educational technology approaches in K-12 mathematics [Dataset]. http://doi.org/10.7281/T1/GCUWSL
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    Johns Hopkins Research Data Repository
    Authors
    Jennifer R. Morrison; Robert M. Bernard
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Dataset funded by
    United States Department of Education
    Description

    Dataset used in a meta-analysis examining the effects of educational technology on mathematics outcomes. Includes effects from 40 studies with codes for study and methodological features.

  11. Article data for citation analysis

    • kaggle.com
    zip
    Updated Sep 1, 2024
    Cite
    Zihan Wang (2024). Article data for citation analysis [Dataset]. https://www.kaggle.com/datasets/thuwangzh/article-data
    Explore at:
    Available download formats: zip (344109 bytes)
    Dataset updated
    Sep 1, 2024
    Authors
    Zihan Wang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Our data is sourced from Web of Science, an academic information retrieval platform. In the field of citation analysis, a recurring criticism revolves around "field-dependent factors," which highlight that citation practices vary across different scientific disciplines. To enhance the credibility of our results, we focus exclusively on a single discipline, Statistics & Probability, for citation analysis. Additionally, we limit our data to articles published between 2009 and 2018, as articles published within the last five years often have very few citations, which could skew the results; moreover, there were few articles in the Statistics & Probability category before 2009. To minimize result variance, we selected articles contributed by scholars from Tsinghua University and Peking University, the two most influential universities in China, ensuring a baseline quality for the articles. In total, we exported detailed information on 566 articles from Web of Science (WoS).

  12. Computational Statistics and Data Analysis - if-computation

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    + more versions
    Cite
    (2025). Computational Statistics and Data Analysis - if-computation [Dataset]. https://exaly.com/journal/14378/computational-statistics-and-data-analysis/impact-factor
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This graph shows how the impact factor of Computational Statistics and Data Analysis is computed. The left axis depicts the number of papers published in years X-1 and X-2, and the right axis displays their citations in year X.
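
    A minimal sketch of that two-axis chart with illustrative numbers (not data from this journal), using matplotlib:

        # Bars: papers published in years X-1 and X-2; line: citations received in year X.
        import matplotlib.pyplot as plt

        years = [2019, 2020, 2021]
        papers_prev_two = [300, 320, 310]   # illustrative paper counts from X-1 and X-2
        citations_in_x = [600, 700, 650]    # illustrative citations received in year X

        fig, ax_left = plt.subplots()
        ax_left.bar(years, papers_prev_two, color="lightgray")
        ax_left.set_xlabel("Year X")
        ax_left.set_ylabel("Papers published in X-1 and X-2")

        ax_right = ax_left.twinx()
        ax_right.plot(years, citations_in_x, marker="o")
        ax_right.set_ylabel("Citations in year X")

        plt.show()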

  13. Article_published_analysis

    • kaggle.com
    zip
    Updated Apr 3, 2022
    Cite
    ElaKapoor (2022). Article_published_analysis [Dataset]. https://www.kaggle.com/datasets/elakapoor/article-published-analysis
    Explore at:
    Available download formats: zip (18085094 bytes)
    Dataset updated
    Apr 3, 2022
    Authors
    ElaKapoor
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The data was collected as a project in web scraping and data analysis. It covers articles published by The Guardian up to the date of collection. The data can be used for time series analysis or for sentiment analysis, and the two can be combined, for example to see how sentiment changes over time. Analysis can also be done on the frequency with which articles are published.

  14. Conceptual novelty scores for PubMed articles

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Updated Feb 1, 2024
    Cite
    Shubhanshu Mishra; Vetle I. Torvik (2024). Conceptual novelty scores for PubMed articles [Dataset]. http://doi.org/10.13012/B2IDB-5060298_V1
    Explore at:
    Dataset updated
    Feb 1, 2024
    Authors
    Shubhanshu Mishra; Vetle I. Torvik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Institutes of Health (NIH)
    U.S. National Science Foundation (NSF)
    Description

    Conceptual novelty analysis data based on PubMed Medical Subject Headings
    ----------------------------------------------------------------------
    Created by Shubhanshu Mishra and Vetle I. Torvik on April 16th, 2018

    ## Introduction
    This is a dataset created as part of the publication titled: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib Magazine. 2016;22(9-10):10.1045/september2016-mishra. It contains the final data generated as part of our experiments based on the MEDLINE 2015 baseline and the MeSH tree from 2015. The dataset is distributed in the form of the following tab-separated text files:
    * PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follows:
      - PMID: PubMed ID
      - Year: year of publication
      - TimeNovelty: time novelty score of the paper based on individual concepts (see paper)
      - VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper)
      - PairTimeNovelty: time novelty score of the paper based on pairs of concepts (see paper)
      - PairVolumeNovelty: volume novelty score of the paper based on pairs of concepts (see paper)
    * mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follows:
      - MeshTerm: name of the MeSH term
      - Year: year
      - AbsVal: total publications with that MeSH term in the given year
      - TimeNovelty: age (in years since first publication) of the MeSH term in the given year
      - VolumeNovelty: age (in number of papers since first publication) of the MeSH term in the given year
    * meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each MeSH pair for all years:
      - Mesh1: name of the first MeSH term (alphabetically sorted)
      - Mesh2: name of the second MeSH term (alphabetically sorted)
      - Year: year
      - AbsVal: total publications with that MeSH pair in the given year
      - TimeNovelty: age (in years since first publication) of the MeSH pair in the given year
      - VolumeNovelty: age (in number of papers since first publication) of the MeSH pair in the given year
    * README.txt

    ## Dataset creation
    This dataset was constructed using multiple datasets described in the following locations:
    * MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
    * MeSH tree 2015: ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/
    * Source code provided at: https://github.com/napsternxg/Novelty

    Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. See NLM's data Terms and Conditions for information on obtaining PubMed/MEDLINE. Additional data-related updates can be found at the Torvik Research Group.

    ## Acknowledgments
    This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    ## License
    Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/Novelty.
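
    A minimal loading sketch in Python/pandas, assuming each TSV carries a header row with the column names listed above:

        # Load the per-article novelty scores and the per-term MeSH profiles.
        import pandas as pd

        novelty = pd.read_csv(
            "PubMed2015_NoveltyData.tsv",
            sep="\t",
            nrows=500_000,   # the full file has ~22.3 million rows; sample while exploring
        )
        mesh = pd.read_csv("mesh_scores.tsv", sep="\t")

        # Example: median time-novelty score per publication year.
        print(novelty.groupby("Year")["TimeNovelty"].median().head())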

  15. Data from: Inventory of online public databases and repositories holding...

    • catalog.data.gov
    • s.cnmilf.com
    +2 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and a baseline for future studies of ag research data.

    Purpose
    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
    - establish where agricultural researchers in the United States -- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals;
    - compare how much data is in institutional vs. domain-specific vs. federal platforms;
    - determine which repositories are recommended by top journals that require or recommend the publication of supporting data;
    - ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.

    Approach
    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods
    We first compiled a list of known domain-specific USDA / ARS datasets and databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of "agricultural data" / "ag data" / "scientific data" + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories, using variations of "agriculture", "ag data" and "university" to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if the institution had a repository for its unique, independent research data if one was not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data; ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results from ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on the 2012 and 2016 study of where USDA employees publish their research, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

    Evaluation
    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, the type of resource searched (datasets, data, images, components, etc.), the percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results
    A summary of the major findings from our data review:
    - Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
    - There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    - Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.
    See the included README file for descriptions of each individual data file in this dataset.

    Resources in this dataset:
    - Resource Title: Journals. File Name: Journals.csv
    - Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
    - Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
    - Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
    - Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
    - Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
    - Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt

  16. Data from: Social Media Data Analysis

    • kaggle.com
    zip
    Updated Apr 16, 2021
    Cite
    Nafe Muhtasim (2021). Social Media Data Analysis [Dataset]. https://www.kaggle.com/datasets/nafemuhtasim/social-media-data-analysis
    Explore at:
    Available download formats: zip (29081 bytes)
    Dataset updated
    Apr 16, 2021
    Authors
    Nafe Muhtasim
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Nafe Muhtasim

    Released under CC0: Public Domain


  17. Journal of methods and measurement in the social sciences - ResearchHelpDesk...

    • researchhelpdesk.org
    Updated Feb 23, 2022
    + more versions
    Cite
    Research Help Desk (2022). Journal of methods and measurement in the social sciences - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/384/journal-of-methods-and-measurement-in-the-social-sciences
    Explore at:
    Dataset updated
    Feb 23, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    Journal of methods and measurement in the social sciences - ResearchHelpDesk - The Journal of Methods and Measurement in the Social Sciences (JMM) is an online scholarly publication focusing on methodology and research design, measurement, and data analysis – providing a new venue for unique and interesting contributions in these study areas, which frequently overlap.

    Focus and Scope
    The Journal of Methods and Measurement in the Social Sciences (JMM) publishes articles related to methodology and research design, measurement, and data analysis. The journal is published twice yearly and features theoretical, empirical, and educational articles. JMM is meant to further our understanding of methodology and how to formulate the right questions. It is broadly concerned with improving the methods used to conduct research, the measurement of variables used in the social sciences, and the applications of data analysis. In addition to research articles, JMM welcomes instructional articles and brief reports or commentaries. We welcome sound, original contributions.

  18. Journal of methods and measurement in the social sciences Acceptance Rate -...

    • researchhelpdesk.org
    Updated Feb 15, 2022
    Cite
    Research Help Desk (2022). Journal of methods and measurement in the social sciences Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/384/journal-of-methods-and-measurement-in-the-social-sciences
    Explore at:
    Dataset updated
    Feb 15, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    Journal of methods and measurement in the social sciences Acceptance Rate - ResearchHelpDesk - The Journal of Methods and Measurement in the Social Sciences (JMM) is an online scholarly publication focusing on methodology and research design, measurement, and data analysis – providing a new venue for unique and interesting contributions in these study areas, which frequently overlap.

    Focus and Scope
    The Journal of Methods and Measurement in the Social Sciences (JMM) publishes articles related to methodology and research design, measurement, and data analysis. The journal is published twice yearly and features theoretical, empirical, and educational articles. JMM is meant to further our understanding of methodology and how to formulate the right questions. It is broadly concerned with improving the methods used to conduct research, the measurement of variables used in the social sciences, and the applications of data analysis. In addition to research articles, JMM welcomes instructional articles and brief reports or commentaries. We welcome sound, original contributions.

  19. Survey and Analysis of the Data Policy of China’s STM journals that Selected...

    • scidb.cn
    Updated Mar 31, 2023
    Cite
    kong li hua; Chen Shushu; Zeng Lin; Xi Yan (2023). Survey and Analysis of the Data Policy of China’s STM journals that Selected by Excellence Action Plan for China's STM Journals Program for Example [Dataset]. http://doi.org/10.57760/sciencedb.j00001.00780
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    Science Data Bank
    Authors
    kong li hua; Chen Shushu; Zeng Lin; Xi Yan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Scientific data is an important output of scientific research and a part of scholarly communication. Establishing journal data policies is of great significance for promoting data sharing, data reuse, data citation, and research evaluation. By means of literature research and website research, this paper surveyed the data policy status of 302 scientific journals selected as leading journals, key journals, and echelon journals in the Excellence Action Plan for China's STM Journals, and analyzed the data policy settings and the features of the policies, such as classification types, data availability, and data citation.

  20. Data and Code for: Methods Matter: P-Hacking and Publication Bias in Causal...

    • openicpsr.org
    Updated Mar 21, 2022
    Cite
    Abel Brodeur; Anthony Heyes; Nikolai Cook (2022). Data and Code for: Methods Matter: P-Hacking and Publication Bias in Causal Analysis in Economics: Reply [Dataset]. http://doi.org/10.3886/E165621V1
    Explore at:
    Dataset updated
    Mar 21, 2022
    Dataset provided by
    American Economic Association (http://www.aeaweb.org/)
    Authors
    Abel Brodeur; Anthony Heyes; Nikolai Cook
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the replication package of Methods Matter: P-Hacking and Publication Bias in Causal Analysis in Economics: Reply.
