100+ datasets found

Medium articles dataset
kaggle.com
crawlfeeds.com
zip
Updated May 9, 2021
Cite
Crawl Feeds (2021). Medium articles dataset [Dataset]. https://www.kaggle.com/crawlfeeds/medium-articles-dataset
Explore at:
Available download formats: zip (21800753 bytes)
Dataset updated
May 9, 2021
Authors
Crawl Feeds
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Medium Articles dataset

Medium is an American online publishing platform launched in August 2012. The Crawl Feeds team extracted data from Medium articles for research and analysis purposes.

Fields

Total fields: 15

url, crawled_at, id, title, author, published_at, author_url, reading_time, total_claps, raw_description, source, description, tags, images, modified_at
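
As a minimal sketch of how the field list above could be used, the snippet below checks whether a record carries all 15 listed fields. The field names come from the description; the record itself and the helper name `missing_fields` are hypothetical, and the actual layout of the downloaded file is an assumption.

```python
# The 15 field names listed in the dataset description.
MEDIUM_FIELDS = [
    "url", "crawled_at", "id", "title", "author", "published_at",
    "author_url", "reading_time", "total_claps", "raw_description",
    "source", "description", "tags", "images", "modified_at",
]


def missing_fields(record: dict) -> list:
    """Return the listed field names that are absent from a record."""
    return [name for name in MEDIUM_FIELDS if name not in record]


# Hypothetical record for illustration only (not taken from the dataset).
sample = {name: None for name in MEDIUM_FIELDS}
del sample["tags"]

print(len(MEDIUM_FIELDS))      # number of documented fields
print(missing_fields(sample))  # fields the sample record lacks
```

A check like this only confirms that the keys are present; it says nothing about the types or contents of the values, which the description does not specify.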

The complete dataset, with more than 500K records, is available from Crawl Feeds.
News Articles Dataset from Indian Express
kaggle.com
zip
Updated Jun 8, 2020
Cite
Pulkit Komal (2020). News Articles Dataset from Indian Express [Dataset]. https://www.kaggle.com/datasets/pulkitkomal/news-article-data-set-from-indian-express/code
Explore at:
Available download formats: zip (23052019 bytes)
Dataset updated
Jun 8, 2020
Authors
Pulkit Komal
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset contains 20k news headlines, descriptions & articles from August 11, 2019 to June 8, 2020 obtained from Indian Express.

Acknowledgements

This dataset was obtained from www.indianexpress.com

Content

article_id: generated article ID
headline: headline of the article
desc: description of the article
date: date and time of the article
url: URL of the article
articles: full article text
article_type: short, mid, or long, indicating the length of the article
article_length: length of the article
NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and...
registry.opendata.aws
Updated Jul 4, 2021
Cite
National Library of Medicine (NLM) (2021). NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS [Dataset]. https://registry.opendata.aws/ncbi-pmc/
Explore at:
Dataset updated
Jul 4, 2021
Dataset provided by
National Library of Medicine (NLM) http://nlm.nih.gov/
Description
PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal articles at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The PubMed Central (PMC) Article Datasets include full-text articles archived in PMC and made available under license terms that allow for text mining and other types of secondary analysis and reuse. The articles are organized on AWS based on general license type:

The PMC Open Access (OA) Subset, which includes all articles in PMC with a machine-readable Creative Commons license

The Author Manuscript Dataset, which includes all articles collected under a funder policy in PMC and made available in machine-readable formats for text mining

These datasets collectively span more than half of PMC’s total collection of full-text articles. PMC enables access to these datasets to expand the impact of open access and publicly-funded research; enable greater machine learning across the spectrum of scientific research; reach new audiences; and open new doors for discovery. The bucket in this registry contains individual articles in NISO Z39.96-2015 JATS XML format as well as in plain text as extracted from the XML. The bucket is updated daily with new and updated articles. Also included are file lists that include metadata for articles in each dataset.
The data used in this article is provided in S1 Data.
datasetcatalog.nlm.nih.gov
plos.figshare.com
+1 more
Updated Mar 21, 2025
Cite
Peng, Jiquan; Zhao, Yan; Li, Xijian; Wei, Zeng; Wang, Cheng (2025). The data used in this article is provided in S1 Data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002041736
Explore at:
Dataset updated
Mar 21, 2025
Authors
Peng, Jiquan; Zhao, Yan; Li, Xijian; Wei, Zeng; Wang, Cheng
Description
The data used in this article is provided in S1 Data.
Data underlying the research of four scenarios in the operation of water...
data.4tu.nl
figshare.com
zip
Updated Apr 14, 2021
Cite
Hang Wan (2021). Data underlying the research of four scenarios in the operation of water discharge patterns of a dam [Dataset]. http://doi.org/10.4121/14398946.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/14398946.v1
Dataset updated
Apr 14, 2021
Dataset provided by
4TU.ResearchData
Authors
Hang Wan
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Reservoir operation rules:
(1) continuous flood discharge with ecological priority
(2) pulse flood discharge with ecological priority
(3) pulse flood discharge with equal weight of ecology and power generation
(4) pulse flood discharge with power generation priority
Data from: Research data lifecycle.
plos.figshare.com
figshare.com
xls
Updated Jun 4, 2023
Cite
Laure Perrier; Erik Blondal; A. Patricia Ayala; Dylanne Dearborn; Tim Kenny; David Lightfoot; Roger Reka; Mindy Thuna; Leanne Trimble; Heather MacDonald (2023). Research data lifecycle. [Dataset]. http://doi.org/10.1371/journal.pone.0178261.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0178261.t002
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Laure Perrier; Erik Blondal; A. Patricia Ayala; Dylanne Dearborn; Tim Kenny; David Lightfoot; Roger Reka; Mindy Thuna; Leanne Trimble; Heather MacDonald
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Research data lifecycle.
This file contains the raw data of all papers collected
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jun 11, 2025
Cite
Mascheroni, Pietro; Brooks, Steven; Doering, Stefan; Seidel, Jan; Amugongo, Lameck Mbangula (2025). This file contains the raw data of all papers collected [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002074276
Explore at:
Dataset updated
Jun 11, 2025
Authors
Mascheroni, Pietro; Brooks, Steven; Doering, Stefan; Seidel, Jan; Amugongo, Lameck Mbangula
Description
This file contains the raw data of all papers collected
Prevalence of journal-specific features (peer-reviewed journal articles...
plos.figshare.com
xls
Updated Jun 15, 2023
Cite
Brady T. West; Joseph W. Sakshaug; Guy Alain S. Aurelien (2023). Prevalence of journal-specific features (peer-reviewed journal articles only). [Dataset]. http://doi.org/10.1371/journal.pone.0158120.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0158120.t005
Dataset updated
Jun 15, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Brady T. West; Joseph W. Sakshaug; Guy Alain S. Aurelien
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Prevalence of journal-specific features (peer-reviewed journal articles only).
Data from: Are scholarly articles disproportionately read in their own...
search.datacite.org
figshare.com
Updated Jan 16, 2014
Cite
Mike Thelwall (2014). Are scholarly articles disproportionately read in their own country? An analysis of Mendeley readers [Dataset]. http://doi.org/10.6084/m9.figshare.902197
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.902197
Dataset updated
Jan 16, 2014
Dataset provided by
DataCite https://www.datacite.org/
Figshare http://figshare.com/
Authors
Mike Thelwall
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistics from the paper: Are scholarly articles disproportionately read in their own country? An analysis of Mendeley readers
by Mike Thelwall and Nabeil Maflahi
A study of the impact of data sharing on article citations using journal...
plos.figshare.com
dataverse.harvard.edu
+1 more
docx
Updated Jun 1, 2023
Cite
Garret Christensen; Allan Dafoe; Edward Miguel; Don A. Moore; Andrew K. Rose (2023). A study of the impact of data sharing on article citations using journal policies as a natural experiment [Dataset]. http://doi.org/10.1371/journal.pone.0225883
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0225883
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Garret Christensen; Allan Dafoe; Edward Miguel; Don A. Moore; Andrew K. Rose
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study estimates the effect of data sharing on the citations of academic articles, using journal policies as a natural experiment. We begin by examining 17 high-impact journals that have adopted the requirement that data from published articles be publicly posted. We match these 17 journals to 13 journals without policy changes and find that empirical articles published just before their change in editorial policy have citation rates with no statistically significant difference from those published shortly after the shift. We then ask whether this null result stems from poor compliance with data sharing policies, and use the data sharing policy changes as instrumental variables to examine more closely two leading journals in economics and political science with relatively strong enforcement of new data policies. We find that articles that make their data available receive 97 additional citations (estimate standard error of 34). We conclude that: a) authors who share data may be rewarded eventually with additional scholarly citations, and b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.
Data from: Current and projected research data storage needs of Agricultural...
agdatacommons.nal.usda.gov
datasets.ai
+2 more
pdf
Updated Nov 30, 2023
+ more versions
Cite
Cynthia Parr (2023). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. http://doi.org/10.15482/USDA.ADC/1346946
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.15482/USDA.ADC/1346946
Dataset updated
Nov 30, 2023
Dataset provided by
Ag Data Commons
Authors
Cynthia Parr
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.
From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.
Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
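
The per-person calculation described above can be sketched as follows. This is an illustrative reading of the stated rule (high end of the reported range divided by 1 for an individual response, or by G for a group response), not the authors' code; the function name and the example values are hypothetical.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """High end of the reported storage range divided by the people covered.

    group_size is 1 for an individual response, or G (the number of
    individuals) for a group response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical values, chosen only to show the arithmetic.
individual = per_person_storage_tb(10.0)            # individual response
group = per_person_storage_tb(10.0, group_size=4)   # group response, G = 4
print(individual, group)
```

For Big Data users the description says actual reported or estimated values were used instead of the range's high end, so this sketch applies only to the range-based case.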

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
Data of top 50 most cited articles about COVID-19 and the complications of...
data.niaid.nih.gov
search.dataone.org
+1 more
zip
Updated Jan 10, 2024
Cite
Tanya Singh; Jagadish Rao Padubidri; Pavanchand Shetty H; Matthew Antony Manoj; Therese Mary; Bhanu Thejaswi Pallempati (2024). Data of top 50 most cited articles about COVID-19 and the complications of COVID-19 [Dataset]. http://doi.org/10.5061/dryad.tx95x6b4m
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.tx95x6b4m
Dataset updated
Jan 10, 2024
Dataset provided by
Kasturba Medical College, Mangalore
Authors
Tanya Singh; Jagadish Rao Padubidri; Pavanchand Shetty H; Matthew Antony Manoj; Therese Mary; Bhanu Thejaswi Pallempati
License
https://spdx.org/licenses/CC0-1.0.html
Description
Background
This bibliometric analysis examines the top 50 most-cited articles on COVID-19 complications, offering insights into the multifaceted impact of the virus. Since its emergence in Wuhan in December 2019, COVID-19 has evolved into a global health crisis, with over 770 million confirmed cases and 6.9 million deaths as of September 2023. Initially recognized as a respiratory illness causing pneumonia and ARDS, its diverse complications extend to cardiovascular, gastrointestinal, renal, hematological, neurological, endocrinological, ophthalmological, hepatobiliary, and dermatological systems.
Methods
Identifying the top 50 articles from a pool of 5940 in Scopus, the analysis spans November 2019 to July 2021, employing terms related to COVID-19 and complications. Rigorous review criteria excluded non-relevant studies, basic science research, and animal models. The authors independently reviewed articles, considering factors like title, citations, publication year, journal, impact factor, authors, study details, and patient demographics.
Results
The focus is primarily on 2020 publications (96%), with all articles being open-access. Leading journals include The Lancet, NEJM, and JAMA, with prominent contributions from Internal Medicine (46.9%) and Pulmonary Medicine (14.5%). China played a major role (34.9%), followed by France and Belgium. Clinical features were the primary study topic (68%), often utilizing retrospective designs (24%). Among 22,477 patients analyzed, 54.8% were male, with the most common age group being 26–65 years (63.2%). Complications affected 13.9% of patients, with a recovery rate of 57.8%.
Conclusion
Analyzing these top-cited articles offers clinicians and researchers a comprehensive, timely understanding of influential COVID-19 literature. This approach uncovers attributes contributing to high citations and provides authors with valuable insights for crafting impactful research. As a strategic tool, this analysis facilitates staying updated and making meaningful contributions to the dynamic field of COVID-19 research.
Methods
A bibliometric analysis of the most cited articles about COVID-19 complications was conducted in July 2021 using all journals indexed in Elsevier’s Scopus and Thomson Reuters’ Web of Science from November 1, 2019 to July 1, 2021. All journals were selected for inclusion regardless of country of origin, language, medical speciality, or electronic availability of articles or abstracts. The terms were combined as follows: (“COVID-19” OR “COVID19” OR “SARS-COV-2” OR “SARSCOV2” OR “SARS 2” OR “Novel coronavirus” OR “2019-nCov” OR “Coronavirus”) AND (“Complication” OR “Long Term Complication” OR “Post-Intensive Care Syndrome” OR “Venous Thromboembolism” OR “Acute Kidney Injury” OR “Acute Liver Injury” OR “Post COVID-19 Syndrome” OR “Acute Cardiac Injury” OR “Cardiac Arrest” OR “Stroke” OR “Embolism” OR “Septic Shock” OR “Disseminated Intravascular Coagulation” OR “Secondary Infection” OR “Blood Clots” OR “Cytokine Release Syndrome” OR “Paediatric Inflammatory Multisystem Syndrome” OR “Vaccine Induced Thrombosis with Thrombocytopenia Syndrome” OR “Aspergillosis” OR “Mucormycosis” OR “Autoimmune Thrombocytopenia Anaemia” OR “Immune Thrombocytopenia” OR “Subacute Thyroiditis” OR “Acute Respiratory Failure” OR “Acute Respiratory Distress Syndrome” OR “Pneumonia” OR “Subcutaneous Emphysema” OR “Pneumothorax” OR “Pneumomediastinum” OR “Encephalopathy” OR “Pancreatitis” OR “Chronic Fatigue” OR “Rhabdomyolysis” OR “Neurologic Complication” OR “Cardiovascular Complications” OR “Psychiatric Complication” OR “Respiratory Complication” OR “Cardiac Complication” OR “Vascular Complication” OR “Renal Complication” OR “Gastrointestinal Complication” OR “Haematological Complication” OR “Hepatobiliary Complication” OR “Musculoskeletal Complication” OR “Genitourinary Complication” OR “Otorhinolaryngology Complication” OR “Dermatological Complication” OR “Paediatric Complication” OR “Geriatric Complication” OR “Pregnancy Complication”) in the Title, Abstract or Keyword.
A total of 5940 articles were accessed, of which the top 50 most cited articles about COVID-19 and Complications of COVID-19 were selected through Scopus. Each article was reviewed for its appropriateness for inclusion. The articles were independently reviewed by three researchers (JRP, MAM and TS) (Table 1). Differences in opinion with regard to article inclusion were resolved by consensus. The inclusion criteria specified articles that were focused on COVID-19 and Complications of COVID-19. Articles were excluded if they did not relate to COVID-19 and/or complications of COVID-19, or if they were Basic Science Research or studies using animal models or phantoms. Review articles, Viewpoints, Guidelines, Perspectives and Meta-analysis were also excluded from the top 50 most-cited articles (Table 1).
The top 50 most-cited articles were compiled in a single database and the relevant data was extracted. The database included: Article Title, Scopus Citations, Year of Publication, Journal, Journal Impact Factor, Authors, Number of Authors, Department Affiliation, Number of Institutions, Country of Origin, Study Topic, Study Design, Sample Size, Open Access, Non-Original Articles, Patient/Participants Age, Gender, Symptoms, Signs, Co-morbidities, Complications, Imaging Modalities Used and outcome.
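
The way the search terms are combined in the Methods text, with one OR group of virus names ANDed with one OR group of complication terms, can be sketched in a few lines. This is only an illustration of the boolean structure: the helper names are hypothetical, the term lists are a small subset of those quoted in the description, and the exact syntax accepted by Scopus or Web of Science is not shown here.

```python
def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(disease_terms, complication_terms):
    """AND together the disease OR-group and the complication OR-group."""
    return or_group(disease_terms) + " AND " + or_group(complication_terms)


# Small subsets of the terms quoted in the description, for illustration.
disease = ["COVID-19", "SARS-COV-2"]
complications = ["Stroke", "Pneumonia"]

query = build_query(disease, complications)
print(query)
```

Keeping the two term lists separate makes it straightforward to extend either group without disturbing the AND between them.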
Benchmarking logs to test scalability of process discovery algorithms
data.4tu.nl
figshare.com
zip
Updated Oct 12, 2017
Cite
Wil van der Aalst (2017). Benchmarking logs to test scalability of process discovery algorithms [Dataset]. http://doi.org/10.4121/uuid:1cc41f8a-3557-499a-8b34-880c1251bd6e
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/uuid:1cc41f8a-3557-499a-8b34-880c1251bd6e
Dataset updated
Oct 12, 2017
Dataset provided by
Eindhoven University of Technology
Authors
Wil van der Aalst
License
https://doi.org/10.4121/resource:terms_of_use
Description
The included event logs are intended to support the evaluation of the performance of process discovery algorithms. The largest event logs in this data set have millions of events. If you need even bigger datasets, you can generate them yourself using the included CPN Tools source files (*.cpn). Each file has two parameters: nofcases (i.e., the number of process instances) and nofdupl (i.e., the number of times a process is replicated with unique new names).
Supplementary data for the article: Survey on eHMI concepts: The effect of...
data.4tu.nl
zip
Cite
Pavlo Bazilinskyy; Dimitra Dodou; Joost de Winter, Supplementary data for the article: Survey on eHMI concepts: The effect of text, color, and perspective [Dataset]. http://doi.org/10.4121/12708869.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/12708869.v2
Dataset provided by
4TU.Centre for Research Data
Authors
Pavlo Bazilinskyy; Dimitra Dodou; Joost de Winter
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Supplementary data for the paper: Bazilinskyy, P., Dodou, D., & De Winter, J. C. F. (2019). Survey on eHMI concepts: The effect of text, color, and perspective. Transportation Research Part F.
Dynamic torque data files used in the discussion of torque estimation in...
data.4tu.nl
datasetcatalog.nlm.nih.gov
+1 more
Updated Jul 7, 2021
Cite
Unai Gutierrez Santiago (2021). Dynamic torque data files used in the discussion of torque estimation in Wind Turbine Gearboxes Using Fiber Optical Strain Sensors [Dataset]. http://doi.org/10.4121/14892234.v1
Explore at:
Unique identifier
https://doi.org/10.4121/14892234.v1
Dataset updated
Jul 7, 2021
Dataset provided by
4TU.ResearchData
Authors
Unai Gutierrez Santiago
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
The two data sets contain the processed data used for the discussion section of the paper "Input Torque Measurements for Wind Turbine Gearboxes Using Fiber Optical Strain Sensors".
There is one data set for each torque test: one for a linearly varying torque test and the other for a test where torque is changed in steps.
Each data set includes the following variables:
-str_fos: strain from the 54 fiber optical strain sensors
-t_fos: time associated with the strain data
-LSS_taco: data from the inductive sensor at the low-speed shaft (once-per-revolution pulse)
-HSS_M1_torque: torque data from the test bench torque transducer at position 1
-HSS_M2_torque: torque data from the test bench torque transducer at position 2
-t_dq: time associated with the analogue signals LSS_taco, HSS_M1 and HSS_M2
Data from: Sizing the Problem of Improving Discovery and Access to...
figshare.com
xlsx
Updated Jan 19, 2016
Cite
Kevin Read (2016). Sizing the Problem of Improving Discovery and Access to NIH-funded Data: A preliminary study [Dataset]. http://doi.org/10.6084/m9.figshare.1285515.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1285515.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare http://figshare.com/
Authors
Kevin Read
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To inform efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by National Institutes of Health (NIH)-funded researchers. Of particular interest is characterizing those datasets that are not deposited in a known data repository or registry, e.g., those for which a related journal article does not indicate that underlying data have been deposited in a known repository. Such “invisible” datasets comprise the “long tail” of biomedical data and pose significant practical challenges to ongoing efforts to improve discoverability of and access to biomedical research data. This study identified datasets used to support the NIH-funded research reported in articles published in 2011 and cited in PubMed® and deposited in PubMed Central® (PMC). After searching for all articles that acknowledged NIH support, we first identified articles that contained explicit mention of datasets being deposited in recognized repositories. Thirty members of the NIH staff then analyzed a random sample of the remaining articles to estimate how many and what types of datasets were used per article. Two reviewers independently examined each paper.
Each dataset is titled Bigdata_randomsample_xxxx_xx. The xxxx refers to the set of articles the annotator looked at, while the xx identifies the annotator that did the analysis. Within each dataset, the author has listed the number of datasets they identified within the articles that they looked at. For every dataset that was found, the annotators were asked to insert a new row into the spreadsheet, and then describe the dataset they found (e.g., type of data, subject of study, etc.). Each row in the spreadsheet was always prepended by the PubMed Identifier (PMID) where the dataset was found.
Finally, the files 2013-08-07_Bigdatastudy_dataanalysis, Dataanalysis_ack_si_datasets, and Datasets additional random sample mention vs deposit 20150313 refer to the analysis that was performed based on each annotator's analysis of the publications they were assigned, and the data deposits identified from the analysis.
Data underlying the publication: ‘Effects of E. coli Nissle 1917 on the...
data.4tu.nl
datasetcatalog.nlm.nih.gov
+1 more
zip
Updated Feb 9, 2022
Cite
Geervliet. M. (Mirelle); Hugo de Vries; Christine A. Jansen; V.P.M.G. (Victor) Rutten; Hubèrt van Hees; Caifang Wen; Kerstin Skovgaard; Giacomo Antonello; H.F.J. (Huub) Savelkoul; Hauke Smidt; E.J. (Edwin) Tijhaar; Jerry M. Wells (2022). Data underlying the publication: ‘Effects of E. coli Nissle 1917 on the Porcine Gut Microbiota and Immune System in Early Life’ [Dataset]. http://doi.org/10.4121/15060177.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/15060177.v1
Dataset updated
Feb 9, 2022
Dataset provided by
4TU.ResearchData
Authors
Geervliet. M. (Mirelle); Hugo de Vries; Christine A. Jansen; V.P.M.G. (Victor) Rutten; Hubèrt van Hees; Caifang Wen; Kerstin Skovgaard; Giacomo Antonello; H.F.J. (Huub) Savelkoul; Hauke Smidt; E.J. (Edwin) Tijhaar; Jerry M. Wells
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
Sep 2018 - Nov 2018
Area covered
The Netherlands, St. Antonis, Swine Research Centre
Description
This dataset contains data collected during an in vivo experiment with pigs at the Wageningen University as part of the PhD Thesis Projects of Mirelle Geervliet and Hugo de Vries (first authors of the manuscript). This research project was made possible by The Netherlands Organisation for Scientific Research and Vereniging Diervoeders Nederland (VDN).
Jornada Experimental Range (USDA-ARS) monthly stocking data and pasture...
agdatacommons.nal.usda.gov
datasetcatalog.nlm.nih.gov
+2 more
bin
Updated Nov 24, 2025
+ more versions
Cite
John Ragosta; Kris Havstad; Brandon Bestelmeyer; Darren James (2025). Jornada Experimental Range (USDA-ARS) monthly stocking data and pasture shape files from 1915 to 1952 [Dataset]. http://doi.org/10.6073/pasta/2254860ed7a15c1016e24385700a8052
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6073/pasta/2254860ed7a15c1016e24385700a8052
Dataset updated
Nov 24, 2025
Dataset provided by
Environmental Data Initiative (EDI)
Authors
John Ragosta; Kris Havstad; Brandon Bestelmeyer; Darren James
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data package contains two types of data for the Jornada Experimental Range (JER) from 1915 to 1952: 1) shape files containing polygons and attribute tables that represent the pasture configurations on the Jornada Experimental Range and 2) monthly stocking data from these pastures. The livestock represented in the stocking data comprise cattle, horse, sheep, and goats. Grazing goats were infrequent and are grouped with sheep in the source data. As such for this data set, they are included in the sheep category. Stocking data are expressed in animal unit months (AUM), which is based on metabolic weight. This data package provides finer resolution AUM data than knb-lter-jrn.210412001, which presents the annual stocking data for the entire JER from 1916 to 2001. The stocking data in this package begins in June of 1915 and continues through December of 1952, the last year for which the researchers on this project have verified and digitized historical pasture configurations on the JER.
https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210412001
Real-World Distribution Network and Loading Data
data.ncl.ac.uk
resodate.org
xlsx
Updated Sep 1, 2021
Cite
Ilias Sarantakos; David Greenwood; Peter Davison; Haris Patsios (2021). Real-World Distribution Network and Loading Data [Dataset]. http://doi.org/10.25405/data.ncl.16456014.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.25405/data.ncl.16456014.v1
Dataset updated
Sep 1, 2021
Dataset provided by
Newcastle University
Authors
Ilias Sarantakos; David Greenwood; Peter Davison; Haris Patsios
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Network and loading data for a real-world distribution network in the North-East of England.
Data underlying the research: Identification of potential hub genes and...
data.4tu.nl
figshare.com
zip
Updated Mar 16, 2023
Cite
Jingli Sun (2023). Data underlying the research: Identification of potential hub genes and therapeutic drugs in ovarian cancer via bioinformatics analysis [Dataset]. http://doi.org/10.4121/22267030.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/22267030.v1
Dataset updated
Mar 16, 2023
Dataset provided by
4TU.ResearchData
Authors
Jingli Sun
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Our findings offer new insights into the underlying pathogenesis of ovarian cancer (OC), and the identified hub genes and drugs may improve individualized diagnosis and therapy for OC.

Cite

Crawl Feeds (2021). Medium articles dataset [Dataset]. https://www.kaggle.com/crawlfeeds/medium-articles-dataset

Medium articles dataset

Medium articles dataset in JSON format

Explore at:

383 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (21800753 bytes)

Dataset updated

May 9, 2021

Authors

Crawl Feeds

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Medium Articles dataset

Medium is an American online publishing platform launched in August 2012. The Crawl Feeds team extracted data from Medium articles for research and analysis purposes.

Fields

Total fields: 15

url, crawled_at, id, title, author, published_at, author_url, reading_time, total_claps, raw_description, source, description, tags, images, modified_at

The complete dataset, with more than 500K records, is available from Crawl Feeds.
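
As a minimal sketch of how the field list above could be used, the following Python snippet checks that each record carries all 15 listed fields. It assumes the download unpacks to a JSON array of objects, which the listing does not confirm; the helper names and the one-record sample are hypothetical placeholders, not real data.

```python
import json

# The 15 field names listed in the dataset description.
EXPECTED_FIELDS = [
    "url", "crawled_at", "id", "title", "author", "published_at",
    "author_url", "reading_time", "total_claps", "raw_description",
    "source", "description", "tags", "images", "modified_at",
]


def missing_fields(record):
    """Return the expected field names that are absent from one record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def check_records(json_text):
    """Parse a JSON array of records (assumed layout) and map each
    incomplete record's index to the list of fields it lacks."""
    records = json.loads(json_text)
    report = {}
    for index, record in enumerate(records):
        missing = missing_fields(record)
        if missing:
            report[index] = missing
    return report


# Hypothetical sample: one record with placeholder values and no "tags" field.
sample = json.dumps([{name: None for name in EXPECTED_FIELDS if name != "tags"}])
print(check_records(sample))  # prints {0: ['tags']}
```

An empty report means every record has the full set of fields; the check says nothing about value types or formats.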
