Facebook
TwitterThe journals’ author guidelines and/or editorial policies were examined on whether they take a stance with regard to the availability of the underlying data of the submitted article. The mere explicated possibility of providing supplementary material along with the submitted article was not considered as a research data policy in the present study. Furthermore, the present article excluded source codes or algorithms from the scope of the paper and thus policies related to them are not included in the analysis of the present article.
For selection of journals within the field of neurosciences, Clarivate Analytics’ InCites Journal Citation Reports database was searched using categories of neurosciences and neuroimaging. From the results, journals with the 40 highest Impact Factor (for the year 2017) indicators were extracted for scrutiny of research data policies. Respectively, the selection journals within the field of physics was created by performing a similar search with the categories of physics, applied; physics, atomic, molecular & chemical; physics, condensed matter; physics, fluids & plasmas; physics, mathematical; physics, multidisciplinary; physics, nuclear and physics, particles & fields. From the results, journals with the 40 highest Impact Factor indicators were again extracted for scrutiny. Similarly, the 40 journals representing the field of operations research were extracted by using the search category of operations research and management.
Journal-specific data policies were sought from journal specific websites providing journal specific author guidelines or editorial policies. Within the present study, the examination of journal data policies was done in May 2019. The primary data source was journal-specific author guidelines. If journal guidelines explicitly linked to the publisher’s general policy with regard to research data, these were used in the analyses of the present article. If journal-specific research data policy, or lack of, was inconsistent with the publisher’s general policies, the journal-specific policies and guidelines were prioritized and used in the present article’s data. If journals’ author guidelines were not openly available online due to, e.g., accepting submissions on an invite-only basis, the journal was not included in the data of the present article. Also journals that exclusively publish review articles were excluded and replaced with the journal having the next highest Impact Factor indicator so that each set representing the three field of sciences consisted of 40 journals. The final data thus consisted of 120 journals in total.
‘Public deposition’ refers to a scenario where researcher deposits data to a public repository and thus gives the administrative role of the data to the receiving repository. ‘Scientific sharing’ refers to a scenario where researcher administers his or her data locally and by request provides it to interested reader. Note that none of the journals examined in the present article required that all data types underlying a submitted work should be deposited into a public data repositories. However, some journals required public deposition of data of specific types. Within the journal research data policies examined in the present article, these data types are well presented by the Springer Nature policy on “Availability of data, materials, code and protocols” (Springer Nature, 2018), that is, DNA and RNA data; protein sequences and DNA and RNA sequencing data; genetic polymorphisms data; linked phenotype and genotype data; gene expression microarray data; proteomics data; macromolecular structures and crystallographic data for small molecules. Furthermore, the registration of clinical trials in a public repository was also considered as a data type in this study. The term specific data types used in the custom coding framework of the present study thus refers to both life sciences data and public registration of clinical trials. These data types have community-endorsed public repositories where deposition was most often mandated within the journals’ research data policies.
The term ‘location’ refers to whether the journal’s data policy provides suggestions or requirements for the repositories or services used to share the underlying data of the submitted works. A mere general reference to ‘public repositories’ was not considered a location suggestion, but only references to individual repositories and services. The category of ‘immediate release of data’ examines whether the journals’ research data policy addresses the timing of publication of the underlying data of submitted works. Note that even though the journals may only encourage public deposition of the data, the editorial processes could be set up so that it leads to either publication of the research data or the research data metadata in conjunction to publishing of the submitted work.
Facebook
Twitterhttps://www.icpsr.umich.edu/web/ICPSR/studies/39490/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/39490/terms
Meta-analyses combine the results of many studies to find out how well a treatment or other healthcare intervention works. Most meta-analyses use public sources of data, such as published journal articles, as the main sources of information for study results. But journal articles are not the only sources of study results. Some results appear in other places, such as clinical study reports. Clinical study reports are documents that describe what researchers did and found in much more detail than journal articles. However, these reports may not be available to the public. As a result, meta-analyses may not include all available information about a treatment. The research team wanted to learn whether adding or replacing public and nonpublic data sources changed the results of meta-analyses. To find out, the research team added and replaced data as they conducted two meta-analyses. The first looked at adult use of a nerve-pain medicine. The second meta-analysis looked at adult use of a medicine to treat bipolar depression.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Self-citation analysis data based on PubMed Central subset (2002-2005) ---------------------------------------------------------------------- Created by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik on April 5th, 2018 ## Introduction This is a dataset created as part of the publication titled: Mishra S, Fegley BD, Diesner J, Torvik VI (2018) Self-Citation is the Hallmark of Productive Authors, of Any Gender. PLOS ONE. It contains files for running the self citation analysis on articles published in PubMed Central between 2002 and 2005, collected in 2015. The dataset is distributed in the form of the following tab separated text files: * Training_data_2002_2005_pmc_pair_First.txt (1.2G) - Data for first authors * Training_data_2002_2005_pmc_pair_Last.txt (1.2G) - Data for last authors * Training_data_2002_2005_pmc_pair_Middle_2nd.txt (964M) - Data for middle 2nd authors * Training_data_2002_2005_pmc_pair_txt.header.txt - Header for the data * COLUMNS_DESC.txt file - Descriptions of all columns * model_text_files.tar.gz - Text files containing model coefficients and scores for model selection. * results_all_model.tar.gz - Model coefficient and result files in numpy format used for plotting purposes. v4.reviewer contains models for analysis done after reviewer comments. * README.txt file ## Dataset creation Our experiments relied on data from multiple sources including properitery data from Thompson Rueter's (now Clarivate Analytics) Web of Science collection of MEDLINE citations. Author's interested in reproducing our experiments should personally request from Clarivate Analytics for this data. However, we do make a similar but open dataset based on citations from PubMed Central which can be utilized to get similar results to those reported in our analysis. Furthermore, we have also freely shared our datasets which can be used along with the citation datasets from Clarivate Analytics, to re-create the datased used in our experiments. These datasets are listed below. If you wish to use any of those datasets please make sure you cite both the dataset as well as the paper introducing the dataset. * MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html * Citation data from PubMed Central (original paper includes additional citations from Web of Science) * Author-ity 2009 dataset: - Dataset citation: Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1 - Paper citation: Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–29. https://doi.org/10.1145/1552303.1552304 - Paper citation: Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2004). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158. https://doi.org/10.1002/asi.20105 * Genni 2.0 + Ethnea for identifying author gender and ethnicity: - Dataset citation: Torvik, Vetle (2018): Genni + Ethnea for the Author-ity 2009 dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9087546_V1 - Paper citation: Smith, B. N., Singh, M., & Torvik, V. I. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries - JCDL ’13. ACM Press. https://doi.org/10.1145/2467696.2467720 - Paper citation: Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927 * MapAffil for identifying article country of affiliation: - Dataset citation: Torvik, Vetle I. (2018): MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4354331_V1 - Paper citation: Torvik VI. MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide. D-Lib magazine : the magazine of the Digital Library Forum. 2015;21(11-12):10.1045/november2015-torvik * IMPLICIT journal similarity: - Dataset citation: Torvik, Vetle (2018): Author-implicit journal, MeSH, title-word, and affiliation-word pairs based on Author-ity 2009. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4742014_V1 * Novelty dataset for identify article level novelty: - Dataset citation: Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1 - Paper citation: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra - Code: https://github.com/napsternxg/Novelty * Expertise dataset for identifying author expertise on articles: * Source code provided at: https://github.com/napsternxg/PubMed_SelfCitationAnalysis Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check here for information to get PubMed/MEDLINE, and NLMs data Terms and Conditions Additional data related updates can be found at Torvik Research Group ## Acknowledgments This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. ## License Self-citation analysis data based on PubMed Central subset (2002-2005) by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/PubMed_SelfCitationAnalysis.
Facebook
Twitterhttps://datacatalog.worldbank.org/public-licenses?fragment=cchttps://datacatalog.worldbank.org/public-licenses?fragment=cc
This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.
Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.
We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and parsed PDF or latex file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and latex file are important for extracting important information like the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical system: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.
Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.
The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.
To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.
The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.
The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:
Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.
There are two classification tasks in this exercise:
1. identifying whether an academic article is using data from any country
2. Identifying from which country that data came.
For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. As an example, a study that reports findings or analysis using a survey data, uses data. Some clues to indicate that a study does use data includes whether a survey or census is described, a statistical model estimated, or a table or means or summary statistics is reported.
After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]
For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.
We expect between 10 and 35 percent of all articles to use data.
The median amount of time that a worker spent on an article, measured as the time between when the article was accepted to be classified by the worker and when the classification was submitted was 25.4 minutes. If human raters were exclusively used rather than machine learning tools, then the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review at a cost of $3,113,244, which assumes a cost of $3 per article as was paid to MTurk workers.
A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (bidirectional Encoder Representations for transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. (2018)). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT and retains 97% of the language understanding capabilities and is 60% faster (Sanh, Debut, Chaumond, Wolf 2019). We use PyTorch to produce a model to classify articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model. 900 articles were selected because of computational limitations in training the NLP model. A classification of “uses data” was assigned if the model predicted an article used data with at least 90% confidence.
The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model perform the same kind of errors, then the performance reported here will be overestimated.
The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of
Facebook
TwitterThe dataset contains bibliographic information about scientific articles published by researchers from Norwegian research organizations and is an enhanced subset of data from the Cristin database. Cristin (current research information system in Norway) is a database with bibliographic records of all research articles with an Norwegian affiliation with a publicly funded research institution in Norway. The subset is limited to metadata about journal articles reported in the period 2013-2021 (186,621 records), and further limited to information of relevance for the study (see below). Article metadata are enhanced with open access status by several sources, particularly unpaywall, DOAJ and hybrid-information in case an article is part of a publish-and-read-deal.
Facebook
TwitterJournal of Big Data - ResearchHelpDesk - The Journal of Big Data publishes high-quality, scholarly research papers, methodologies and case studies covering a broad range of topics, from big data analytics to data-intensive computing and all applications of big data research. The journal examines the challenges facing big data today and going forward including, but not limited to: data capture and storage; search, sharing, and analytics; big data technologies; data visualization; architectures for massively parallel processing; data mining tools and techniques; machine learning algorithms for big data; cloud computing platforms; distributed file systems and databases; and scalable storage systems. Academic researchers and practitioners will find the Journal of Big Data to be a seminal source of innovative material. All articles published by the Journal of Big Data are made freely and permanently accessible online immediately upon publication, without subscription charges or registration barriers. As authors of articles published in the Journal of Big Data you are the copyright holders of your article and have granted to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate your article, according to the SpringerOpen copyright and license agreement. For those of you who are US government employees or are prevented from being copyright holders for similar reasons, SpringerOpen can accommodate non-standard copyright lines.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset offers an in-depth look into Open Access Journals worldwide, featuring a library of 163 references gathered from June 2020 to February 2021. The consortium of 10 organizations behind the study researched 1,619 journals and collected 7,019 free text submissions and other data from 94 questions. In addition to this in-depth research, three focus groups were held with 11 journals and 10 interviews were conducted with hosting platforms.
The data contained in this set includes key variables such as item type, publication year, author or title as well as ISBN numbers or ISSNs for easy identification purposes. In addition to these indicators which help identify a specific reference item, there are DOI URLs with access dates for easy online access should the physical form prove difficult to find. Alongside the rights information regarding the journal study is also available alongside storage locations cataloging different archives, libraries and file attachments if relevant. Finally viewers can use manual tags relating each field combination along with any particular notes they may have while extra fields offer additional information regarding timely updates or current progressions within the sector.
In short this dataset provides viewers everywhere an intimate glimpse into Open Access Journals around world ready at their fingertips using comprehensive yet easily absorbed facts focusing on open access standards daily maintained by some of finest minds today!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset contains a library of 163 references related to Open Access journal studies from an in-depth study conducted from June 2020 to February 2021 by a consortium of 10 organisations and commissioned by coalition S. This study surveyed 1,619 journals, collected 7,019 free text Submissions, and other data from 94 questions and organised three focus groups with 11 journals and 10 interviews with hosting platforms. This dataset is ideal for those who want an in-depth understanding about the state of Open Access Journals.
The columns contained in this dataset are as follows: Key (unique identifier for each reference), Item Type (type of reference e.g book or journal article), Publication Year (year of publication), Author (author of the reference), Title (titleof the reference), Publication Title(title of the publication), ISBN(international standard book number), ISSN(international standard serial number) , DOI(digital object identifier) , Url(URLof the reference) , Abstract Note(summary ofthe reference) , Date (dateof thereference ),DateAdded (date added to library ) , Date Modified (date themodified ), Access Date
- Utilizing the dataset to create a comprehensive interactive index of global open access journals. This could be used as an invaluable search engine for topics related to various open access studies, as well as make it easy for researchers/scholars/fans to connect with their interests.
- Combining this dataset with other datasets and text mining tools to uncover patterns, trends and insights on key factors affecting successful adoption of open access journals globally.
- Creating automated sentiment analysis algorithms based on the dataset that can identify impactful language and comments used in scholarly writings about Open Access Journals for improved marketing purposes by publishers and other stakeholders invested in the success of these journals
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: OADJS Library.csv | Column name | Description | |:----------------------|:---------------------------------------------------------------| | Key | Unique identifier for each reference. (String) | | Item Type | Type of reference (e.g. book, journal article, etc.). (String) | | Publication Year | Year o...
Facebook
TwitterUnited States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data. Purpose As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to establish where agricultural researchers in the United States-- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals compare how much data is in institutional vs. domain-specific vs. federal platforms determine which repositories are recommended by top journals that require or recommend the publication of supporting data ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data Approach The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered. Search methods We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” /“ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals were compiled, in which USDA published in 2012 and 2016, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources and Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each the top 50 journals. Evaluation We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind. Results A summary of the major findings from our data review: Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection. Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation. See included README file for descriptions of each individual data file in this dataset. Resources in this dataset:Resource Title: Journals. File Name: Journals.csvResource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csvResource Title: TDWG presentation. File Name: TDWG_Presentation.pptxResource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csvResource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csvResource Title: General repositories containing ag data. File Name: general_repos_1.csvResource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a compilation of processed data on citation and references for research papers including their author, institution and open access info for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected during December 2019 to January 2020.Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available on the public ON-MERRIT D3.1 deliverable report.The dataset is a combination of two different data sources, one part is a dataset created on analysing promotion policies across the target countries, while the second part is a set of data points available to understand the publishing behaviour. To facilitate the analysis the dataset is organised in the following seven folders:PRTThe dataset with the file name "PRT_policies.csv" contains the related information as this was extracted from promotion, review and tenure (PRT) policies. Q1: What % of papers coming from a university are Open Access?- Dataset Name format: oa_status_countryname_papers.csv- Dataset Contents: Open Access (OA) status of all papers of all the universities listed in Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if there is at least an OA link available. OA links are collected using the CORE Discovery API.- Important considerations about this dataset: - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. - The service we used to recognise if a paper is OA, CORE Discovery, does not contain entries for all paperids in MAG. This implies that some of the records in the dataset extracted will not have either a true or false value for the _is_OA_ field. - Only those records marked as true for _is_OA_ field can be said to be OA. Others with false or no value for is_OA field are unknown status (i.e. not necessarily closed access).Q2: How are papers, published by the selected universities, distributed across the three scientific disciplines of our choice?- Dataset Name format: fsid_countryname_papers.csv- Dataset Contents: For the given country, all papers for all the universities listed in THEWUR with the information of fieldofstudy they belong to.- Important considerations about this dataset: * MAG can associate a paper to multiple fieldofstudyid. If a paper belongs to more than one of our fieldofstudyid, separate records were created for the paper with each of those _fieldofstudyid_s.- MAG assigns fieldofstudyid to every paper with a score. We preserve only those records whose score is more than 0.5 for any fieldofstudyid it belongs to.- Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.Q3: What is the gender distribution in authorship of papers published by the universities?- Dataset Name format: author_gender_countryname_papers.csv- Dataset Contents: All papers with their author names for all the universities listed in THEWUR.- Important considerations about this dataset :- When there are multiple collaborators(authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.- An external script was executed to determine the gender of the authors. The script is available here.Q4: Distribution of staff seniority (= number of years from their first publication until the last publication) in the given university.- Dataset Name format: author_ids_countryname_papers.csv- Dataset Contents: For a given country, all papers for authors with their publication year for all the universities listed in THEWUR.- Important considerations about this work :- When there are multiple collaborators(authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.- Calculating staff seniority can be achieved in various ways. The most straightforward option is to calculate it as _academic_age = MAX(year) - MIN(year) _for each authorid.Q5: Citation counts (incoming) for OA vs Non-OA papers published by the university.- Dataset Name format: cc_oa_countryname_papers.csv- Dataset Contents: OA status and OA links for all papers of all the universities listed in THEWUR and for each of those papers, count of incoming citations available in MAG.- Important considerations about this dataset :- CORE Discovery was used to establish the OA status of papers.- Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.- Only those records marked as true for _is_OA_ field can be said to be OA. Others with false or no value for is_OA field are unknown status (i.e. not necessarily closed access).Q6: Count of OA vs Non-OA references (outgoing) for all papers published by universities.- Dataset Name format: rc_oa_countryname_-papers.csv- Dataset Contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.- Important considerations about this dataset :- CORE Discovery was used to establish the OA status of papers being referenced.- Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.Additional files:- _fieldsofstudy_mag_.csv: this file contains a dump of fieldsofstudy table of MAG mapping each of the ids to their actual field of study name.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).
As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.
This dataset is being made public both to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems ", Telematics and Informatics*, and its Systematic Literature Review component that informs the study.
Description of the data in this data set
PublicDataEcosystem_SLR provides the structure of the protocol
Spreadsheet#1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies
Spreadsheets #2 provides the protocol structure.
Spreadsheets #3 provides the filled protocol for relevant studies.
The information on each selected study was collected in four categories:(1) descriptive information,(2) approach- and research design- related information,(3) quality-related information,(4) HVD determination-related information
Descriptive Information
Article number
A study number, corresponding to the study number assigned in an Excel worksheet
Complete reference
The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.
Year of publication
The year in which the study was published.
Journal article / conference paper / book chapter
The type of the paper, i.e., journal article, conference paper, or book chapter.
Journal / conference / book
Journal article, conference, where the paper is published.
DOI / Website
A link to the website where the study can be found.
Number of words
A number of words of the study.
Number of citations in Scopus and WoS
The number of citations of the paper in Scopus and WoS digital libraries.
Availability in Open Access
Availability of a study in the Open Access or Free / Full Access.
Keywords
Keywords of the paper as indicated by the authors (in the paper).
Relevance for our study (high / medium / low)
What is the relevance level of the paper for our study
Approach- and research design-related information
Approach- and research design-related information
Objective / Aim / Goal / Purpose & Research Questions
The research objective and established RQs.
Research method (including unit of analysis)
The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.
Study’s contributions
The study’s contribution as defined by the authors
Qualitative / quantitative / mixed method
Whether the study uses a qualitative, quantitative, or mixed methods approach?
Availability of the underlying research data
Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?
Period under investigation
Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)
Use of theory / theoretical concepts / approaches? If yes, specify them
Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).
Quality-related information
Quality concerns
Whether there are any quality concerns (e.g., limited information about the research methods used)?
Public Data Ecosystem-related information
Public data ecosystem definition
How is the public data ecosystem defined in the paper and any other equivalent term, mostly infrastructure. If an alternative term is used, how is the public data ecosystem called in the paper?
Public data ecosystem evolution / development
Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?
What constitutes a public data ecosystem?
What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).
Components and relationships
What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).
Stakeholders
What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?
Actors and their roles
What actors does the public data ecosystem involve? What are their roles?
Data (data types, data dynamism, data categories etc.)
What data do the public data ecosystem cover (is intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.
Processes / activities / dimensions, data lifecycle phases
What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?
Level (if relevant)
What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).
Other elements or relationships (if any)
What other elements or relationships does the public data ecosystem consist of?
Additional comments
Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).
New papers
Does the study refer to any other potentially relevant papers?
Additional references to potentially relevant papers that were found in the analysed paper (snowballing).
Format of the file.xls, .csv (for the first spreadsheet only), .docx
Licenses or restrictionsCC-BY
For more info, see README.txt
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For each source, the specific query used and number of records returned during the most recent query are indicated. Field abbreviations are: KW = key word; PT = publication type; SO = source (i.e. journal) title; TI = title.
Facebook
TwitterJournal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk - The Journal of Big Data publishes high-quality, scholarly research papers, methodologies and case studies covering a broad range of topics, from big data analytics to data-intensive computing and all applications of big data research. The journal examines the challenges facing big data today and going forward including, but not limited to: data capture and storage; search, sharing, and analytics; big data technologies; data visualization; architectures for massively parallel processing; data mining tools and techniques; machine learning algorithms for big data; cloud computing platforms; distributed file systems and databases; and scalable storage systems. Academic researchers and practitioners will find the Journal of Big Data to be a seminal source of innovative material. All articles published by the Journal of Big Data are made freely and permanently accessible online immediately upon publication, without subscription charges or registration barriers. As authors of articles published in the Journal of Big Data you are the copyright holders of your article and have granted to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate your article, according to the SpringerOpen copyright and license agreement. For those of you who are US government employees or are prevented from being copyright holders for similar reasons, SpringerOpen can accommodate non-standard copyright lines.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BackgroundGoogle Trends is a novel, freely accessible tool that allows users to interact with Internet search data, which may provide deep insights into population behavior and health-related phenomena. However, there is limited knowledge about its potential uses and limitations. We therefore systematically reviewed health care literature using Google Trends to classify articles by topic and study aim; evaluate the methodology and validation of the tool; and address limitations for its use in research.Methods and FindingsPRISMA guidelines were followed. Two independent reviewers systematically identified studies utilizing Google Trends for health care research from MEDLINE and PubMed. Seventy studies met our inclusion criteria. Google Trends publications increased seven-fold from 2009 to 2013. Studies were classified into four topic domains: infectious disease (27% of articles), mental health and substance use (24%), other non-communicable diseases (16%), and general population behavior (33%). By use, 27% of articles utilized Google Trends for casual inference, 39% for description, and 34% for surveillance. Among surveillance studies, 92% were validated against a reference standard data source, and 80% of studies using correlation had a correlation statistic ≥0.70. Overall, 67% of articles provided a rationale for their search input. However, only 7% of articles were reproducible based on complete documentation of search strategy. We present a checklist to facilitate appropriate methodological documentation for future studies. A limitation of the study is the challenge of classifying heterogeneous studies utilizing a novel data source.ConclusionGoogle Trends is being used to study health phenomena in a variety of topic domains in myriad ways. However, poor documentation of methods precludes the reproducibility of the findings. Such documentation would enable other researchers to determine the consistency of results provided by Google Trends for a well-specified query over time. Furthermore, greater transparency can improve its reliability as a research tool.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In Brazil, studies that map electronic healthcare databases in order to assess their suitability for use in pharmacoepidemiologic research are lacking. We aimed to identify, catalogue, and characterize Brazilian data sources for Drug Utilization Research (DUR).Methods: The present study is part of the project entitled, “Publicly Available Data Sources for Drug Utilization Research in Latin American (LatAm) Countries.” A network of Brazilian health experts was assembled to map secondary administrative data from healthcare organizations that might provide information related to medication use. A multi-phase approach including internet search of institutional government websites, traditional bibliographic databases, and experts’ input was used for mapping the data sources. The reviewers searched, screened and selected the data sources independently; disagreements were resolved by consensus. Data sources were grouped into the following categories: 1) automated databases; 2) Electronic Medical Records (EMR); 3) national surveys or datasets; 4) adverse event reporting systems; and 5) others. Each data source was characterized by accessibility, geographic granularity, setting, type of data (aggregate or individual-level), and years of coverage. We also searched for publications related to each data source.Results: A total of 62 data sources were identified and screened; 38 met the eligibility criteria for inclusion and were fully characterized. We grouped 23 (60%) as automated databases, four (11%) as adverse event reporting systems, four (11%) as EMRs, three (8%) as national surveys or datasets, and four (11%) as other types. Eighteen (47%) were classified as publicly and conveniently accessible online; providing information at national level. Most of them offered more than 5 years of comprehensive data coverage, and presented data at both the individual and aggregated levels. No information about population coverage was found. Drug coding is not uniform; each data source has its own coding system, depending on the purpose of the data. At least one scientific publication was found for each publicly available data source.Conclusions: There are several types of data sources for DUR in Brazil, but a uniform system for drug classification and data quality evaluation does not exist. The extent of population covered by year is unknown. Our comprehensive and structured inventory reveals a need for full characterization of these data sources.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data file for the third release of the Data Citation Corpus, produced by DataCite and Make Data Count as part of an ongoing grant project funded by the Wellcome Trust. Read more about the project.
The data file includes 5,322,388 data citation records in JSON and CSV formats. The JSON file is the version of record.
For convenience, the data is provided in batches of approximately 1 million records each. The publication date and batch number are included in the file name, ex: 2025-02-01-data-citation-corpus-01-v3.0.json.
The data citations in the file originate from the following sources:
DataCite Event Data
A project by Chan Zuckerberg Initiative (CZI) to identify mentions to datasets in the full text of articles
Data citations identified Aligning Science Across Parkinson’s (ASAP)
Each data citation record is comprised of:
A pair of identifiers: An identifier for the dataset (a DOI or an accession number) and the DOI of the publication (journal article or preprint) in which the dataset is cited
Metadata for the cited dataset and for the citing publication
The data file includes the following fields:
Field
Description
Required?
id
Internal identifier for the citation
Yes
created
Date of item's incorporation into the corpus
Yes
updated
Date of item's most recent update in corpus
Yes
repository
Repository where cited data is stored
No
publisher
Publisher for the article citing the data
No
journal
Journal for the article citing the data
No
title
Title of cited data
No
publication
DOI of article where data is cited
Yes
dataset
DOI or accession number of cited data
Yes
publishedDate
Date when citing article was published
No
source
Source where citation was harvested
Yes
subjects
Subject information for cited data
No
affiliations
Affiliation information for creator of cited data
No
funders
Funding information for cited data
No
Additional documentation about the citations and metadata in the file is available on the Make Data Count website.
Notes on v3.0:
The third release of the Data Citation Corpus data file reflects a few changes made to add new citations, including those from a new data source (ASAP), update and enhance citation metadata, and improve the overall usability of the file. These changes are as follows:
Add and update Event Data citations:
Add 65,524 new data citations created in DataCite Event Data between August 2024 and December 2024
Add ASAP citations:
Add 750 new data citations provided by Aligning Science Across Parkinson’s (ASAP), identified through processes to evaluate compliance with ASAP’s for open science practices, which involve a partnership with DataSeer and internal curation (described here).
Citations with provenance from ASAP are identified as “asap” in the source field
Metadata enhancements:
Reconcile and normalize organization names for affiliations and funders in a subset of records with the Research Organization Registry (ROR)
Add ror_name and ror_id subfields for affiliations and funders in JSON files. Unreconciled affiliation and funder strings are identified with values of null
Add new columns affiliationsROR and fundersROR in CSV files. Unreconciled affiliation and funder strings are identified with values of NONE NONE (this is to ensure consistency in number and order of values in cases where some strings have been reconciled and others have not)
Normalize DOI formats for articles and papers as full URLs
Additional details about the above changes, including scripts used to perform the above tasks, are available in GitHub.
Additional enhancements to the corpus are ongoing and will be addressed in the course of subsequent releases. Users are invited to submit feedback via GitHub. For general questions, email info@makedatacount.org.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contain data sources used in bibliometric studies. Every row corresponds to one data source so there will be multiple rows for articles containing multiple sources. The column norm_coarse corresponds to data source category and norm_fine to data source (only available for bibliographical metadata sources). See article for details.Related article: Lindelöw, C. H., Hammarfelt, B., & Mazoni, A. (2025). Data sources used in bibliometrics 1978–2022: From proprietary databases to the great wide open. Journal of the Association for Information Science and Technology, 1–14. https://doi.org/10.1002/asi.25018
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table lists the 7 diseases in 9 locations analyzed, for a total of 14 disease-location contexts. For each context, we list the language used as a location proxy, the inclusive start and end dates of analysis, the resolution of the disease incidence data, and one or more citations for those data.Diseases-location contexts analyzed, with data sources.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes bibliographic information for 501 papers that were published from 2010-April 2017 (time of search) and use online biodiversity databases for research purposes. Our overarching goal in this study is to determine how research uses of biodiversity data developed during a time of unprecedented growth of online data resources. We also determine uses with the highest number of citations, how online occurrence data are linked to other data types, and if/how data quality is addressed. Specifically, we address the following questions:
1.) What primary biodiversity databases have been cited in published research, and which
databases have been cited most often?
2.) Is the biodiversity research community citing databases appropriately, and are
the cited databases currently accessible online?
3.) What are the most common uses, general taxa addressed, and data linkages, and how
have they changed over time?
4.) What uses have the highest impact, as measured through the mean number of citations
per year?
5.) Are certain uses applied more often for plants/invertebrates/vertebrates?
6.) Are links to specific data types associated more often with particular uses?
7.) How often are major data quality issues addressed?
8.) What data quality issues tend to be addressed for the top uses?
Relevant papers for this analysis include those that use online and openly accessible primary occurrence records, or those that add data to an online database. Google Scholar (GS) provides full-text indexing, which was important to identify data sources that often appear buried in the methods section of a paper. Our search was therefore restricted to GS. All authors discussed and agreed upon representative search terms, which were relatively broad to capture a variety of databases hosting primary occurrence records. The terms included: “species occurrence” database (8,800 results), “natural history collection” database (634 results), herbarium database (16,500 results), “biodiversity database” (3,350 results), “primary biodiversity data” database (483 results), “museum collection” database (4,480 results), “digital accessible information” database (10 results), and “digital accessible knowledge” database (52 results)--note that quotations are used as part of the search terms where specific phrases are needed in whole. We downloaded all records returned by each search (or the first 500 if there were more) into a Zotero reference management database. About one third of the 2500 papers in the final dataset were relevant. Three of the authors with specialized knowledge of the field characterized relevant papers using a standardized tagging protocol based on a series of key topics of interest. We developed a list of potential tags and descriptions for each topic, including: database(s) used, database accessibility, scale of study, region of study, taxa addressed, research use of data, other data types linked to species occurrence data, data quality issues addressed, authors, institutions, and funding sources. Each tagged paper was thoroughly checked by a second tagger.
The final dataset of tagged papers allow us to quantify general areas of research made possible by the expansion of online species occurrence databases, and trends over time. Analyses of this data will be published in a separate quantitative review.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset contains insightful data related to forecasting studies, highlighting the work of 100 influential papers from the last 40 years, hand-selected for their high citation levels and enduring importance. Through this valuable dataset, analyze more than just the publications year or average citations per year; data points like number of time series used, methods employed and measures performed are also tracked here. With this comprehensive review of forecasting studies in different domains, realize and appreciate the lasting impact these papers have had on this field
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset offers an in-depth study on forecasting studies conducted over the past forty years. With it, you can easily survey the extensive literature surrounding forecasting and get an understanding of how issues within this field have evolved and developed over time.
To begin using this dataset, take a look at the columns included, and gain a better understanding of what they mean: - Name: The name of each paper in the dataset. - Url: The URL of each paper. - Cites Per Year: The average number of citations for each paper per year. This is an indication that these papers are important and influential within the field.
- Year: The year each paper was published. This can help to understand how topics have changed throughout time with regards to forecasting practices and trends .
- # Time Series: The number of time series used in each paper – useful if specific datasets are being surveyed across many publications or research topics need to be studied against particular datasets used over a period of time
- # Methods & # Measures: These indicate which methods or measures were used by different researchers at different points in time to investigate various aspects related to forecasting studies – again helping with tracking changes across topics and areas related to forecastingOnce familiar with what information is contained within this dataset, users will be able to filter through data more efficiently depending on their research needs; allowing them access relevant results quickly without having to trawl through hundreds volumes worth of material!
- Analyzing the trends in time series forecasting methods over the years.
- Identifying commonly-used time series and methodologies for forecasting different types of events, such as stock prices or industry trends.
- Identifying new methods or measures which have yielded higher accuracy levels than existing ones for specific types of forecast studies
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: reviewed articles.csv | Column name | Description | |:-------------------|:------------------------------------------------------------------| | Name | The name of the paper. (String) | | Url | The URL of the paper. (String) | | Cites per year | The average number of citations per year for the paper. (Integer) | | Year | The year the paper was published. (Integer) | | # time series | The number of time series used in the paper. (Integer) | | # methods | The number of methods used in the paper. (Integer) | | # measures | The number of measures used in the paper. (Integer) |
If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit .
Facebook
TwitterDatabase of journal information that provides tools to search, sort, filter, compare, and evaluate scholarly journals. In addition to searching by journal name, category or publisher, authors can use title and abstract of paper to discover journals that have already published articles on similar topics. Data sources include major industry data sets, public resources, information submitted directly by journal editors, and real-life publishing experiences submitted by authors.
Facebook
TwitterThe journals’ author guidelines and/or editorial policies were examined on whether they take a stance with regard to the availability of the underlying data of the submitted article. The mere explicated possibility of providing supplementary material along with the submitted article was not considered as a research data policy in the present study. Furthermore, the present article excluded source codes or algorithms from the scope of the paper and thus policies related to them are not included in the analysis of the present article.
For selection of journals within the field of neurosciences, Clarivate Analytics’ InCites Journal Citation Reports database was searched using categories of neurosciences and neuroimaging. From the results, journals with the 40 highest Impact Factor (for the year 2017) indicators were extracted for scrutiny of research data policies. Respectively, the selection journals within the field of physics was created by performing a similar search with the categories of physics, applied; physics, atomic, molecular & chemical; physics, condensed matter; physics, fluids & plasmas; physics, mathematical; physics, multidisciplinary; physics, nuclear and physics, particles & fields. From the results, journals with the 40 highest Impact Factor indicators were again extracted for scrutiny. Similarly, the 40 journals representing the field of operations research were extracted by using the search category of operations research and management.
Journal-specific data policies were sought from journal specific websites providing journal specific author guidelines or editorial policies. Within the present study, the examination of journal data policies was done in May 2019. The primary data source was journal-specific author guidelines. If journal guidelines explicitly linked to the publisher’s general policy with regard to research data, these were used in the analyses of the present article. If journal-specific research data policy, or lack of, was inconsistent with the publisher’s general policies, the journal-specific policies and guidelines were prioritized and used in the present article’s data. If journals’ author guidelines were not openly available online due to, e.g., accepting submissions on an invite-only basis, the journal was not included in the data of the present article. Also journals that exclusively publish review articles were excluded and replaced with the journal having the next highest Impact Factor indicator so that each set representing the three field of sciences consisted of 40 journals. The final data thus consisted of 120 journals in total.
‘Public deposition’ refers to a scenario where researcher deposits data to a public repository and thus gives the administrative role of the data to the receiving repository. ‘Scientific sharing’ refers to a scenario where researcher administers his or her data locally and by request provides it to interested reader. Note that none of the journals examined in the present article required that all data types underlying a submitted work should be deposited into a public data repositories. However, some journals required public deposition of data of specific types. Within the journal research data policies examined in the present article, these data types are well presented by the Springer Nature policy on “Availability of data, materials, code and protocols” (Springer Nature, 2018), that is, DNA and RNA data; protein sequences and DNA and RNA sequencing data; genetic polymorphisms data; linked phenotype and genotype data; gene expression microarray data; proteomics data; macromolecular structures and crystallographic data for small molecules. Furthermore, the registration of clinical trials in a public repository was also considered as a data type in this study. The term specific data types used in the custom coding framework of the present study thus refers to both life sciences data and public registration of clinical trials. These data types have community-endorsed public repositories where deposition was most often mandated within the journals’ research data policies.
The term ‘location’ refers to whether the journal’s data policy provides suggestions or requirements for the repositories or services used to share the underlying data of the submitted works. A mere general reference to ‘public repositories’ was not considered a location suggestion, but only references to individual repositories and services. The category of ‘immediate release of data’ examines whether the journals’ research data policy addresses the timing of publication of the underlying data of submitted works. Note that even though the journals may only encourage public deposition of the data, the editorial processes could be set up so that it leads to either publication of the research data or the research data metadata in conjunction to publishing of the submitted work.