19 datasets found

Data from: Inventory of online public databases and repositories holding...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication.

The goals of this study were to:

- establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
- compare how much data is in institutional vs. domain-specific vs. federal platforms
- determine which repositories are recommended by top journals that require or recommend the publication of supporting data
- ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN.

We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.

Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:

- Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
- There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
- Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:

- Resource Title: Journals. File Name: Journals.csv
- Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
- Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
- Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
- Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
- Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
- Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
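
The evaluation thresholds described for this inventory (a search term covering at least 1% or 5% of a collection, or returning more than 100 or more than 500 results) can be illustrated with a minimal Python sketch. The function name, the dictionary keys, and the example counts are hypothetical and are not taken from the dataset files.

```python
def flag_search_term(term_hits, total_datasets):
    """Apply the share and count thresholds described in the evaluation.

    term_hits: number of results returned for one ag search term.
    total_datasets: total number of datasets in the repository.
    """
    share = term_hits / total_datasets if total_datasets else 0.0
    return {
        "share": share,
        "at_least_1pct": share >= 0.01,
        "at_least_5pct": share >= 0.05,
        "over_100_results": term_hits > 100,
        "over_500_results": term_hits > 500,
    }


if __name__ == "__main__":
    # Hypothetical repository: 600 hits for one term out of 10,000 datasets.
    print(flag_search_term(600, 10_000))
```

A repository would count as both large and ag-heavy in the sense of the results summary when `over_500_results` and `at_least_5pct` are both true for at least one search term.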
Dataset used for detecting DNS over HTTPS by Machine Learning.
zenodo.org
data.niaid.nih.gov
zip
Updated Oct 28, 2020
Cite
Dmitrii Vekshin; Karel Hynek; Tomas Cejka (2020). Dataset used for detecting DNS over HTTPS by Machine Learning. [Dataset]. http://doi.org/10.5281/zenodo.3906526
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3906526
Dataset updated
Oct 28, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dmitrii Vekshin; Karel Hynek; Tomas Cejka
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The dataset consists of three different data sources:

DoH enabled Firefox

DoH enabled Google Chrome

Cloudflared DoH proxy

The web browser data was captured using the Selenium framework, which simulated classical user browsing. The browsers received commands to visit domains taken from Alexa's top 10K most visited websites. The capture was performed on the host by listening to the network interface of the virtual machine. Overall, the dataset contains almost 5,000 web-page visits by Mozilla Firefox and 1,000 pages visited by Chrome.

The Cloudflared DoH proxy was installed on a Raspberry Pi, and the IP address of the Raspberry Pi was set as the default DNS resolver in two separate offices in our university. It continuously captured the DNS/DoH traffic created by up to 20 devices for around three months.

The dataset contains 1,128,904 flows, of which around 33,000 are labeled as DoH. We provide raw pcap data, a CSV with flow data, and a CSV file with extracted features.

The CSV with extracted features has the following data fields:

- Label (1 - DoH, 0 - regular HTTPS)
- Data source
- Duration
- Minimal Inter-Packet Delay
- Maximal Inter-Packet Delay
- Average Inter-Packet Delay
- Variance of Incoming Packet sizes
- Variance of Outgoing Packet sizes
- Ratio of the number of Incoming and Outgoing bytes
- Ratio of the number of Incoming and Outgoing packets
- Average of Incoming Packet sizes
- Average of Outgoing Packet sizes
- Median of Incoming Packet sizes
- Median of Outgoing Packet sizes
- The ratio of bursts and pauses
- Number of bursts
- Number of pauses
- Autocorrelation
- Transmission symmetry in the 1st third of connection
- Transmission symmetry in the 2nd third of connection
- Transmission symmetry in the last third of connection
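
As an illustration of what several of these fields measure, the following Python sketch computes a subset of them for a single flow. It is not the authors' extraction code: the input layout (timestamp, size, direction), the dictionary keys, and the use of population variance are assumptions, and the flow is assumed to contain at least two packets with at least one in each direction.

```python
from statistics import mean, median, pvariance


def flow_features(packets):
    """Compute a few per-flow statistics.

    packets: list of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is "in" or "out" (hypothetical input layout).
    """
    times = sorted(t for t, _, _ in packets)
    # Inter-packet delays between consecutive packets of the flow.
    delays = [later - earlier for earlier, later in zip(times, times[1:])]
    incoming = [size for _, size, d in packets if d == "in"]
    outgoing = [size for _, size, d in packets if d == "out"]
    return {
        "duration": times[-1] - times[0],
        "min_ipd": min(delays),
        "max_ipd": max(delays),
        "avg_ipd": mean(delays),
        "var_in_size": pvariance(incoming),   # population variance (assumption)
        "var_out_size": pvariance(outgoing),
        "bytes_ratio_in_out": sum(incoming) / sum(outgoing),
        "packets_ratio_in_out": len(incoming) / len(outgoing),
        "avg_in_size": mean(incoming),
        "avg_out_size": mean(outgoing),
        "median_in_size": median(incoming),
        "median_out_size": median(outgoing),
    }


if __name__ == "__main__":
    # Made-up flow with four packets.
    example = [(0.0, 100, "out"), (0.5, 300, "in"), (1.5, 100, "out"), (2.0, 500, "in")]
    for name, value in flow_features(example).items():
        print(name, value)
```

The burst, pause, autocorrelation, and symmetry fields are left out because the listing does not define them precisely enough to reproduce.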

The observed network traffic does not contain privacy-sensitive information.

The zip file structure is:

|-- data
|   |-- extracted-features...extracted features used in ML for DoH recognition
|   |   |-- chrome
|   |   |-- cloudflared
|   |   `-- firefox
|   |-- flows................exported flow data
|   |   |-- chrome
|   |   |-- cloudflared
|   |   `-- firefox
|   `-- pcaps................raw PCAP data
|       |-- chrome
|       |-- cloudflared
|       `-- firefox
|-- LICENSE
`-- README.md

When using this dataset, please cite the original work as follows:

@inproceedings{vekshin2020,
author = {Vekshin, Dmitrii and Hynek, Karel and Cejka, Tomas},
title = {DoH Insight: Detecting DNS over HTTPS by Machine Learning},
year = {2020},
isbn = {9781450388337},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3407023.3409192},
doi = {10.1145/3407023.3409192},
booktitle = {Proceedings of the 15th International Conference on Availability, Reliability and Security},
articleno = {87},
numpages = {8},
keywords = {classification, DoH, DNS over HTTPS, machine learning, detection, datasets},
location = {Virtual Event, Ireland},
series = {ARES '20}
}
Data from: E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects...
zenodo.org
bin, pdf, txt
Updated May 20, 2025
Cite
Sergio Di Meglio; Valeria Pontillo; Coen De Roover; Luigi Libero Lucio Starace; Sergio Di Martino; Ruben Opdebeeck (2025). E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects [Dataset]. http://doi.org/10.5281/zenodo.14988988
Explore at:
Available download formats: txt, bin, pdf
Unique identifier
https://doi.org/10.5281/zenodo.14988988
Dataset updated
May 20, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sergio Di Meglio; Valeria Pontillo; Coen De Roover; Luigi Libero Lucio Starace; Sergio Di Martino; Ruben Opdebeeck
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
ABSTRACT
End-to-end (E2E) testing is a software validation approach that simulates realistic user scenarios throughout the entire workflow of an application. In the context of web applications, E2E testing involves two activities: Graphical User Interface (GUI) testing, which simulates user interactions with the web app’s GUI through web browsers, and performance testing, which evaluates system workload handling. Despite its recognized importance in delivering high-quality web applications, the availability of large-scale datasets featuring real-world E2E web tests remains limited, hindering research in the field.
To address this gap, we present E2EGit, a comprehensive dataset of non-trivial open-source web projects collected on GitHub that adopt E2E testing. By analyzing over 5,000 web repositories across popular programming languages (JAVA, JAVASCRIPT, TYPESCRIPT, and PYTHON), we identified 472 repositories implementing 43,670 automated Web GUI tests with popular browser automation frameworks (SELENIUM, PLAYWRIGHT, CYPRESS, PUPPETEER), and 84 repositories that featured 271 automated performance tests implemented leveraging the most popular open-source tools (JMETER, LOCUST). Among these, 13 repositories implemented both types of testing for a total of 786 Web GUI tests and 61 performance tests.

DATASET DESCRIPTION
The dataset is provided as an SQLite database consisting of five tables, each serving a specific purpose; its structure is illustrated in Figure 3 of the paper.

The repository table contains information on 1.5 million repositories collected using the SEART tool on May 4. It includes 34 fields detailing repository characteristics. The non_trivial_repository table is a subset of the previous one, listing repositories that passed the two filtering stages described in the pipeline. For each repository, it specifies whether it is a web repository using JAVA, JAVASCRIPT, TYPESCRIPT, or PYTHON frameworks. A repository may use multiple frameworks, with corresponding fields (e.g., is web java) set to true, and the field web dependencies listing the detected web frameworks.

For Web GUI testing, the dataset includes two additional tables: gui_testing_test_details, where each row represents a test file and provides the file path, the browser automation framework used, the test engine employed, and the number of tests implemented in the file; and gui_testing_repo_details, which aggregates data from the previous table at the repository level. Each of the 472 repositories has a row summarizing the number of test files using frameworks like SELENIUM or PLAYWRIGHT, test engines like JUNIT, and the total number of tests identified.

For performance testing, the performance_testing_test_details table contains 410 rows, one for each test identified. Each row includes the file path, whether the test uses JMETER or LOCUST, and extracted details such as the number of thread groups, concurrent users, and requests. Notably, some fields may be absent, for instance if external files (e.g., CSVs defining workloads) were unavailable, or in the case of Locust tests, where parameters like duration and concurrent users are specified via the command line.
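
A first sanity check on the SQLite database could be to count the rows in each of the five tables named above. The sketch below uses only Python's standard `sqlite3` module; the function name is hypothetical, and the demonstration runs against a small in-memory stand-in because the file name of the published database is not given here.

```python
import sqlite3

# Table names as given in the dataset description.
E2EGIT_TABLES = [
    "repository",
    "non_trivial_repository",
    "gui_testing_test_details",
    "gui_testing_repo_details",
    "performance_testing_test_details",
]


def table_row_counts(conn, tables=E2EGIT_TABLES):
    """Return {table: row count} for each listed table present in the database."""
    existing = {
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    counts = {}
    for table in tables:
        if table in existing:  # only query names that really exist
            (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            counts[table] = count
    return counts


if __name__ == "__main__":
    # In-memory stand-in so the sketch runs without the real file;
    # the single "id" column is a placeholder, not the real schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gui_testing_repo_details (id INTEGER)")
    conn.executemany("INSERT INTO gui_testing_repo_details VALUES (?)", [(1,), (2,)])
    print(table_row_counts(conn))
```

Against the real database, the connection would instead be opened with `sqlite3.connect(path_to_downloaded_file)`, and the counts could be compared with the figures quoted in the description.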

To cite this article refer to this citation:

@inproceedings{di2025e2egit,
title={E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects},
author={Di Meglio, Sergio and Starace, Luigi Libero Lucio and Pontillo, Valeria and Opdebeeck, Ruben and De Roover, Coen and Di Martino, Sergio},
booktitle={2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)},
pages={10--15},
year={2025},
organization={IEEE/ACM}
}

This work has been partially supported by the Italian PNRR MUR project PE0000013-FAIR.

Countries with the most Facebook users 2024

statista.com
tokrwards.com

Cite

Stacy Jo Dixon, Countries with the most Facebook users 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

Which country has the most Facebook users?

There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country, it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: the United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.

Facebook – the most used social media

Meta, the company that was previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media platform worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.

Facebook usage by device

As of July 2021, 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is not only available through mobile browsers, as the company has published several mobile apps for users to access its products and services. As of the third quarter of 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.

data.gov.uk usage statistics - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Nov 13, 2012
Cite
ckan.publishing.service.gov.uk (2012). data.gov.uk usage statistics - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/data-gov-uk-usage-statistics
Explore at:
Dataset updated
Nov 13, 2012
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Usage data for data.gov.uk. Gives an impression of the quality and quantity of usage, the browsers used and which pages had the most interest. Data from Google Analytics. Updated daily.
Data from: E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects...
zenodo.org
bin, txt
Updated May 20, 2025
Cite
Sergio Di Meglio; Valeria Pontillo; Coen De Roover; Luigi Libero Lucio Starace; Sergio Di Martino; Ruben Opdebeeck (2025). E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects [Dataset]. http://doi.org/10.5281/zenodo.14221860
Explore at:
Available download formats: txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.14221860
Dataset updated
May 20, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sergio Di Meglio; Valeria Pontillo; Coen De Roover; Luigi Libero Lucio Starace; Sergio Di Martino; Ruben Opdebeeck
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
ABSTRACT
End-to-End (E2E) testing is a comprehensive approach to validating the functionality of a software application by testing its entire workflow from the user’s perspective, ensuring that all integrated components work together as expected. It is crucial for ensuring the quality and reliability of applications, especially in the web domain, which is often bound by Service Level Agreements (SLAs). This testing involves two key activities: Graphical User Interface (GUI) testing, which simulates user interactions through browsers, and performance testing, which evaluates system workload handling. Despite its importance, E2E testing is often neglected, and the lack of reliable datasets for Web GUI and performance testing has slowed research progress. This paper addresses these limitations by constructing E2EGit, a comprehensive dataset cataloging non-trivial open-source web projects on GITHUB that adopt GUI or performance testing.
The dataset construction process involved analyzing over 5k non-trivial web repositories based on popular programming languages (JAVA, JAVASCRIPT, TYPESCRIPT, PYTHON) to identify: 1) GUI tests based on popular browser automation frameworks (SELENIUM, PLAYWRIGHT, CYPRESS, PUPPETEER), and 2) performance tests written with the most popular open-source tools (JMETER, LOCUST). After analysis, we identified 472 repositories using web GUI testing, with over 43,000 tests, and 84 repositories using performance testing, with 410 tests.

DATASET DESCRIPTION
The dataset is provided as an SQLite database consisting of five tables, each serving a specific purpose; its structure is illustrated in Figure 3 of the paper.

The repository table contains information on 1.5 million repositories collected using the SEART tool on May 4. It includes 34 fields detailing repository characteristics. The non_trivial_repository table is a subset of the previous one, listing repositories that passed the two filtering stages described in the pipeline. For each repository, it specifies whether it is a web repository using JAVA, JAVASCRIPT, TYPESCRIPT, or PYTHON frameworks. A repository may use multiple frameworks, with corresponding fields (e.g., is web java) set to true, and the field web dependencies listing the detected web frameworks.

For Web GUI testing, the dataset includes two additional tables: gui_testing_test_details, where each row represents a test file and provides the file path, the browser automation framework used, the test engine employed, and the number of tests implemented in the file; and gui_testing_repo_details, which aggregates data from the previous table at the repository level. Each of the 472 repositories has a row summarizing the number of test files using frameworks like SELENIUM or PLAYWRIGHT, test engines like JUNIT, and the total number of tests identified.

For performance testing, the performance_testing_test_details table contains 410 rows, one for each test identified. Each row includes the file path, whether the test uses JMETER or LOCUST, and extracted details such as the number of thread groups, concurrent users, and requests. Notably, some fields may be absent, for instance if external files (e.g., CSVs defining workloads) were unavailable, or in the case of Locust tests, where parameters like duration and concurrent users are specified via the command line.
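
Because this description does not list the individual fields of each table, one way to discover them is to read the schema from the database itself. The sketch below uses SQLite's `PRAGMA table_info` through Python's standard `sqlite3` module; the function name and the columns of the stand-in table are hypothetical.

```python
import sqlite3


def describe_tables(conn):
    """Map each table name to its list of (column name, declared type)."""
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(row[1], row[2]) for row in rows]
    return schema


if __name__ == "__main__":
    # Stand-in table; these columns are placeholders, not the real schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_details (file_path TEXT, number_of_tests INTEGER)")
    for table, columns in describe_tables(conn).items():
        print(table, columns)
```

Run against the downloaded database, this would show the actual field names, including the ones that appear here with spaces (such as "is web java").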

To cite this article refer to this citation:

@inproceedings{di2025e2egit,
title={E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects},
author={Di Meglio, Sergio and Starace, Luigi Libero Lucio and Pontillo, Valeria and Opdebeeck, Ruben and De Roover, Coen and Di Martino, Sergio},
booktitle={2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)},
pages={10--15},
year={2025},
organization={IEEE/ACM}
}

This work has been partially supported by the Italian PNRR MUR project PE0000013-FAIR.
Vibrent Clothes Rental Dataset
kaggle.com
Updated Sep 6, 2024
Cite
Karl Audun Borgersen (2024). Vibrent Clothes Rental Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/9334353
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9334353
Dataset updated
Sep 6, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Karl Audun Borgersen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Vibrent Clothes Rental Dataset

For any questions about the dataset or requests for more information, please open a discussion or contact the primary author at karl.audun.borgersen@uia.no

Update: Now includes pricing data for outfits, and subscription plans

Notice regarding the images folder:

To ensure these could be uploaded to Kaggle, the images had to be heavily compressed. They can be found in their original quality at: https://console.cloud.google.com/storage/browser/clothes-rental-dataset/images

14 of the original images were corrupted. These have been replaced by a 1x1 placeholder image.

Reference

As is specified by our license, users are free to adapt and share the dataset however they prefer as long as attribution is provided. To do so please cite our accompanying paper "A Dataset for Adapting Recommender Systems to the Fashion Rental Economy" ~~ Note: While this article has been accepted to RecSys 2024 and the final doi has been provided, the article has not been made publicly available yet. Please contact the primary author if you wish to receive an advanced copy before publication.~~ The article is now live at https://dl.acm.org/doi/10.1145/3640457.3688174!

@inproceedings{vibrentClothesRental,
address = {Bari Italy},
title = {A {Dataset} for {Adapting} {Recommender} {Systems} to the {Fashion} {Rental} {Economy}},
isbn = {9798400705052},
url = {https://dl.acm.org/doi/10.1145/3640457.3688174},
doi = {10.1145/3640457.3688174},
booktitle = {18th {ACM} {Conference} on {Recommender} {Systems}},
publisher = {ACM},
author = {Borgersen, Karl Audun Kagnes and Goodwin, Morten and Grundetjern, Morten and Sharma, Jivitesh},
month = oct,
year = {2024},
pages = {945--950},
}

Addendums to descriptions

A description of each column can be seen in the dataset viewer below. This section will include some addendums to those descriptions.

General transactions

All experiments listed in the referenced paper concatenate the data from user_activity_triplets.csv and additional_tabular_data/original_orders.csv

Outfits

Outfit groups: All outfits that share the same group are the same kind of outfit. e.g. if the outfit is a red cocktail dress, then all outfits with the same groups are different copies of the same cocktail dress. These different copies often vary in outfit size.

Descriptions: While most of these are high-quality descriptions, some are written informally, missing, or in Norwegian. There are around 200 descriptions in Norwegian in total.

Third Chance

Many of the owners are referred to as "FJONG"; this was Vibrent's original name.
Data from: CottonGen: Cotton Database Resources
catalog.data.gov
datasetcatalog.nlm.nih.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). CottonGen: Cotton Database Resources [Dataset]. https://catalog.data.gov/dataset/cottongen-cotton-database-resources-151bf
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
CottonGen (https://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data to enable basic, translational and applied research in cotton. Built using the open-source Tripal database infrastructure, CottonGen supersedes CottonDB and the Cotton Marker Database, which includes sequences, genetic and physical maps, genotypic and phenotypic markers and polymorphisms, quantitative trait loci (QTLs), pathogens, germplasm collections and trait evaluations, pedigrees, and relevant bibliographic citations, with enhanced tools for easier data sharing, mining, visualization, and data retrieval of cotton research data.

CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST.

This project is funded/supported by Cotton Incorporated, the USDA-ARS Crop Germplasm Research Unit at College Station, TX, the Southern Association of Agricultural Experiment Station Directors, Bayer CropScience, Corteva/Agriscience, Dow/Phytogen, Monsanto, Washington State University, and NRSP10.

Resources in this dataset:

- Resource Title: Website Pointer for CottonGen. File Name: Web Page, url: https://www.cottongen.org/

Genomic, Genetic and Breeding Resources for Cotton Research Discovery and Crop Improvement, organized by:

- Species (Gossypium arboreum, barbadense, herbaceum, hirsutum, raimondii, others)
- Data (Contributors, Download, Submission, Community Projects, Archives, Cotton Trait Ontology, Nomenclatures, and links to Variety Testing Data and NCBI SRA Datasets)
- Search options (Colleague, Genes and Transcripts, Genotype, Germplasm, Map, Markers, Publications, QTLs, Sequences, Trait Evaluation, MegaSearch)
- Tools (BIMS, BLAST+, CottonCyc, JBrowse, Map Viewer, Primer3, Sequence Retrieval, Synteny Viewer)
- International Cotton Genome Initiative (ICGI)
- Help sources (User manual, FAQs)

Also provides Quick Start links for Major Species and Tools.
Microsoft Coco Dataset
universe.roboflow.com
zip
Updated Jul 23, 2025
Cite
Microsoft (2025). Microsoft Coco Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco/model/3
Explore at:
Available download formats: zip
Dataset updated
Jul 23, 2025
Dataset authored and provided by
Microsoft (http://microsoft.com/)
Variables measured
Object Bounding Boxes
Description
Microsoft Common Objects in Context (COCO) Dataset

The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations, including object categories, keypoints, and more. This makes it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.

While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.

The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.

The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:

An introduction to the COCO dataset

Weird images in COCO, and what that tells us about the utility and limits of COCO
Evolution of Web search engine interfaces through SERP screenshots and HTML...
rdm.inesctec.pt
Updated Jul 26, 2021
Cite
(2021). Evolution of Web search engine interfaces through SERP screenshots and HTML complete pages for 20 years - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2021-003
Explore at:
Dataset updated
Jul 26, 2021
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset was extracted for a study on the evolution of Web search engine interfaces since their appearance. The well-known list of “10 blue links” has evolved into richer interfaces, often personalized to the search query, the user, and other aspects. We used the most searched queries by year to extract a representative sample of SERP from the Internet Archive. The Internet Archive has been keeping snapshots and the respective HTML version of webpages over time, and its collection contains more than 50 billion webpages. We used Python and Selenium Webdriver, for browser automation, to visit each capture online, check if the capture is valid, save the HTML version, and generate a full screenshot. The dataset contains all the extracted captures. Each capture is represented by a screenshot, an HTML file, and a files' folder. We concatenate the initial of the search engine (G) with the capture's timestamp for file naming. The filename ends with a sequential integer "-N" if the timestamp is repeated. For example, "G20070330145203-1" identifies a second capture from Google on March 30, 2007. The first is identified by "G20070330145203". Using this dataset, we analyzed how SERP evolved in terms of content, layout, design (e.g., color scheme, text styling, graphics), navigation, and file size. We have registered the appearance of SERP features and analyzed the design patterns involved in each SERP component. We found that the number of elements in SERP has been rising over the years, demanding a more extensive interface area and larger files. This systematic analysis portrays evolution trends in search engine user interfaces and, more generally, web design. We expect this work will trigger other, more specific studies that can take advantage of the dataset we provide here. This graphic represents the diversity of captures by year and search engine (Google and Bing).
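The file naming convention described above can be decoded mechanically. The following is a minimal sketch, not the authors' code: it assumes the timestamp is a 14-digit YYYYMMDDhhmmss value (consistent with the "G20070330145203" example), and the "B" initial for Bing is an assumption, since the description only spells out "G" for Google. The names `Capture` and `parse_capture_name` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple

# "G" is given in the description; "B" for Bing is an assumption.
ENGINES = {"G": "Google", "B": "Bing"}

# One uppercase initial, a 14-digit timestamp, and an optional "-N" suffix.
_NAME = re.compile(r"([A-Z])(\d{14})(?:-(\d+))?")


class Capture(NamedTuple):
    engine: str
    timestamp: datetime
    sequence: int  # 0 for the first capture at a timestamp, N for a "-N" suffix


def parse_capture_name(name: str) -> Capture:
    """Split a capture name such as 'G20070330145203-1' into its parts."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised capture name: {name!r}")
    initial, stamp, suffix = match.groups()
    return Capture(
        engine=ENGINES.get(initial, initial),
        timestamp=datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        sequence=int(suffix) if suffix else 0,
    )
```

Calling `parse_capture_name("G20070330145203-1")` yields the Google capture of March 30, 2007 with sequence 1, that is, the second capture sharing that timestamp.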
Africa - Electricity Transmission and Distribution Grid Map - Dataset -...
energydata.info
Updated Sep 26, 2024
Cite
(2024). Africa - Electricity Transmission and Distribution Grid Map - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/africa-electricity-transmission-and-distribution-grid-map-2017
Explore at:
Dataset updated
Sep 26, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Africa
Description
Note: This dataset has been updated with transmission lines for the MENA region. This is the most complete and up-to-date open map of Africa's electricity grid network. This dataset serves as an updated and improved replacement for the Africa Infrastructure Country Diagnostic (AICD) data that was published in 2007.

Coverage

This dataset includes planned and existing grid lines for all continental African countries and Madagascar, as well as the Middle East region. The lines range in voltage from sub-kV to 700 kV EHV lines, though there is a very large variation in the completeness of data by country. An interactive tool has been created for exploring this data, the Africa Electricity Grids Explorer.

Sources

The primary sources for this dataset are as follows:
Africa Infrastructure Country Diagnostic (AICD)
OSM © OpenStreetMap contributors
For MENA: Arab Union of Electricity and country utilities
For West Africa: West African Power Pool (WAPP) GIS database
World Bank projects archive and IBRD maps

There were many additional sources for specific countries and areas. This information is contained in the files of this dataset, and can also be found by browsing the individual country datasets, which contain more extensive information.

Limitations

Some of the data, notably that from the AICD and from World Bank project archives, may be very out of date. Where possible this has been improved with data from other sources, but in many cases this wasn't possible. This varies significantly from country to country, depending on data availability. Thus, many new lines may exist which aren't shown, and planned lines may have completely changed or already been constructed. The data that comes from World Bank project archives has been digitized from PDF maps. This means that these lines should serve as an indication of extent and general location, but shouldn't be used for precisely locating grid lines.
Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event...
datarade.ai
.csv
Cite
Factori, Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-web-data-machine-learning-d-factori
Explore at:
Available download formats: .csv
Dataset authored and provided by
Factori
Area covered
Uzbekistan, Egypt, Faroe Islands, Turks and Caicos Islands, Austria, Japan, Taiwan, Cameroon, Palestine, Sweden
Description
Factori's AI & ML training data is thoroughly tested and reviewed to ensure that what you receive on your end is of the best quality.

Integrate the comprehensive AI & ML training data provided by Grepsr and develop a superior AI & ML model.

Whether you're training algorithms for natural language processing, sentiment analysis, or any other AI application, we can deliver comprehensive datasets tailored to fuel your machine learning initiatives.

Enhanced Data Quality: We have rigorous data validation processes and also conduct quality assurance checks to guarantee the integrity and reliability of the training data for you to develop the AI & ML models.

Gain a competitive edge, drive innovation, and unlock new opportunities by leveraging the power of tailored Artificial Intelligence and Machine Learning training data with Factori.

We offer web activity data of users that are browsing popular websites around the world. This data can be used to analyze web behavior across the web and build highly accurate audience segments based on web activity for targeting ads based on interest categories and search/browsing intent.

Web Data Reach: Our reach data represents the total number of data counts available within various categories and comprises attributes such as Country, Anonymous ID, IP addresses, Search Query, and so on.

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).

Data Attributes: Anonymous_id IDType Timestamp Estid Ip userAgent browserFamily deviceType Os Url_metadata_canonical_url Url_metadata_raw_query_params refDomain mappedEvent Channel searchQuery Ttd_id Adnxs_id Keywords Categories Entities Concepts
ckanext-datatablesview
catalog.civicdataecosystem.org
Updated Jun 4, 2025
Cite
(2025). ckanext-datatablesview [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-datatablesview
Explore at:
Dataset updated
Jun 4, 2025
Description
The datatablesview extension for CKAN enhances the display of tabular datasets within CKAN by integrating the DataTables JavaScript library. As a fork of a previous DataTables CKAN plugin, this extension aims to provide improved functionality and maintainability for presenting data in a user-friendly and interactive tabular format. This tool focuses on making data more accessible and easier to explore directly within the CKAN interface. Key Features: Enhanced Data Visualization: Transforms standard CKAN dataset views into interactive tables using the DataTables library, providing a more engaging user experience compared to plain HTML tables. Interactive Table Functionality: Includes features such as sorting, filtering, and pagination within the data table, allowing users to easily navigate and analyze large datasets directly in the browser. Improved Data Accessibility: Makes tabular data more accessible to a wider range of users by providing intuitive tools to explore and understand the information. Presumed Customizable Appearance: Given that it is based on DataTables, users will likely be able to customize the look and feel of the tables through DataTables configuration options (note: this is an assumption based on standard DataTables usage and may require coding). Use Cases (based on typical DataTables applications): Government Data Portals: Display complex government datasets in a format that is easy for citizens to search, filter, and understand, enhancing transparency and promoting data-driven decision-making. For example, presenting financial data, population statistics, or environmental monitoring results. Research Data Repositories: Allow researchers to quickly explore and analyze large scientific datasets directly within the CKAN interface, facilitating data discovery and collaboration. 
Corporate Data Catalogs: Enable business users to easily access and manipulate tabular data relevant to their roles, improving data literacy and enabling data-informed business strategies. Technical Integration (inferred from CKAN extension structure): The extension likely operates by leveraging CKAN's plugin architecture to override the default dataset view for tabular data. Its implementation likely uses CKAN's templating system to render datasets using DataTables' JavaScript and CSS, enhancing data-viewing experience. Benefits & Impact: By implementing the datatablesview extension, organizations can improve the user experience when accessing and exploring tabular datasets within their CKAN instances. The enhanced interactivity and data exploration features can lead to increased data utilization, improved data literacy, and more effective data-driven decision-making within organizations and communities.
Verification benchmarks for single-phase flow in three-dimensional fractured...
b2find.eudat.eu
Updated Oct 13, 2023
Cite
(2023). Verification benchmarks for single-phase flow in three-dimensional fractured porous media: DuMuX source code - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c57241f5-1ad0-5049-bb96-39edbd83aab3
Explore at:
Dataset updated
Oct 13, 2023
Description
This dataset contains the source code for simulating the benchmark cases of Berre et al. (2021) with the open-source simulator DuMuX. The benchmarks focus on flow and transport through fractured porous media, considering fracture networks of varying complexity. The code in this dataset can be used, for instance, to reproduce the results published at DaRUS in the sub-folder ustutt-mpfa/vtk. This dataset provides multi-modal data around the software. Besides the source code (berre2020.tar.gz), a Dockerfile, a docker image and computation templates for convenient reproduction of the results are contained within this dataset. For more information on how to install and use the code or docker images, see the file README.md. To trigger the execution of the computation templates on ViPLab, click on the badge below or select the ViPLab option behind the Access Dataset button. The code allows running all benchmark cases with all numerical schemes available in DuMuX; however, the computation template for case 4 does not expose the MPFA-O scheme, as this requires more computational resources than is feasible for an exploration in the browser. Use persistent identifiers from Software Heritage to cite individual files or even lines of the source code.
Ward Profiles and Atlas - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Mar 23, 2017
Cite
ckan.publishing.service.gov.uk (2017). Ward Profiles and Atlas - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/ward-profiles-and-atlas
Explore at:
Dataset updated
Mar 23, 2017
Dataset provided by
CKAN (https://ckan.org/)
Description
The ward profiles and ward atlas provide a range of demographic and related data for each ward in Greater London. They are designed to provide an overview of the population in these small areas by presenting a range of data on the population, diversity, households, life expectancy, housing, crime, benefits, land use, deprivation, and employment. Indicators included here are population by age and sex, land area, projections, population density, household composition, religion, ethnicity, birth rates (general fertility rate), death rates (standardised mortality ratio), life expectancy, average house prices, properties sold, housing by council tax band, tenure, property size (bedrooms), dwelling build period and type, mortgage and landlord home repossession, employment and economic activity, Incapacity Benefit, Housing Benefit, Household income, Income Support and JobSeekers Allowance claimant rates, dependent children receiving child-tax credits by lone parents and out-of-work families, child poverty, National Insurance Number registration rates for overseas nationals (NINo), GCSE results, A-level / Level 3 results (average point scores), pupil absence, child obesity, crime rates (by type of crime), fires, ambulance call outs, road casualties, happiness and well-being, land use, public transport accessibility (PTALs), access to public greenspace, access to nature, air emissions / quality, car use, bicycle travel, Indices of Deprivation, and election turnout. The Ward Profiles present key summary measures for the most recent year, using both Excel and InstantAtlas mapping software. This is a useful tool for displaying a large amount of data for numerous geographies, in one place. The Ward Atlas presents a more detailed version of the data including trend data and generally includes the raw numbers as opposed to percentages or rates. 
The Instant Atlas reports use HTML5 technology, which can be used in modern browsers, including on Apple machines, but will not function on older browsers. WARD ATLAS FOR 2014 BOUNDARIES In May 2014, ward boundaries changed in Hackney, Kensington and Chelsea, and Tower Hamlets. This version of the ward atlas gives data for these new wards, as well as retaining data on the unchanged wards in the rest of London for comparison purposes. Data for boroughs has also been included. Very few datasets have been published for the new ward boundaries, so the majority of data contained in this atlas have been modelled using a method of proportion of households from the old boundaries that are located in the new boundaries. Therefore, the data contained in this atlas are indicative only. OTHER SMALL AREA PROFILES Other profiles available include Borough, LSOA and MSOA atlases. Data from these profiles were used to create the Well-being scores tool. *The London boroughs are: City of London, Barking and Dagenham, Barnet, Bexley, Brent, Bromley, Camden, Croydon, Ealing, Enfield, Greenwich, Hackney, Hammersmith and Fulham, Haringey, Harrow, Havering, Hillingdon, Hounslow, Islington, Kensington and Chelsea, Kingston upon Thames, Lambeth, Lewisham, Merton, Newham, Redbridge, Richmond upon Thames, Southwark, Sutton, Tower Hamlets, Waltham Forest, Wandsworth, Westminster. These profiles were created using the most up to date information available at the time of collection (September 2015).
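The household-proportion modelling mentioned for the 2014 ward boundaries can be illustrated with a small sketch. This is not the method as implemented by the publishers, whose details are not given here; it assumes that each old-ward count is split across new wards in proportion to where that old ward's households now fall. The function name and the ward labels are hypothetical, and the approach as written suits counts rather than rates or percentages.

```python
from collections import defaultdict


def apportion_by_households(old_values, household_overlap):
    """Re-allocate old-ward counts to new wards by household share.

    old_values: {old_ward: count}
    household_overlap: {(old_ward, new_ward): households of old_ward
                        that are located in new_ward}
    """
    # Total households per old ward, used as the denominator of each share.
    old_totals = defaultdict(float)
    for (old, _new), households in household_overlap.items():
        old_totals[old] += households

    new_values = defaultdict(float)
    for (old, new), households in household_overlap.items():
        if old_totals[old] == 0:
            continue  # no households to apportion from this old ward
        share = households / old_totals[old]
        new_values[new] += old_values.get(old, 0) * share
    return dict(new_values)


# Hypothetical example: old ward A is split 3:1 between new wards X and Y,
# while old ward B falls entirely inside Y.
example = apportion_by_households(
    {"A": 100, "B": 50},
    {("A", "X"): 300, ("A", "Y"): 100, ("B", "Y"): 200},
)
```

Because every estimate depends on the assumed household shares, results produced this way are indicative only, which matches the caveat in the description.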
playwright-mcp-toolcalling
huggingface.co
Updated Jul 25, 2025
Cite
justin albrethsen (2025). playwright-mcp-toolcalling [Dataset]. https://huggingface.co/datasets/jdaddyalbs/playwright-mcp-toolcalling
Explore at:
Dataset updated
Jul 25, 2025
Authors
justin albrethsen
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Purpose

I wanted to train a small agent to use a browser effectively, but most of the smaller models I tried (<32b) struggled to call the tools correctly. I created this dataset for two main reasons:

To help with finetuning smaller models to use the browser specific tools in playwright.
To look at the security implications of giving browser access to untrusted open-weight models, see blog post.

Versions

I am ironing out the kinks, but I will leave the older versions here in… See the full description on the dataset page: https://huggingface.co/datasets/jdaddyalbs/playwright-mcp-toolcalling.
universal_dependencies
tensorflow.org
opendatalab.com
Updated Dec 6, 2022
Cite
(2022). universal_dependencies [Dataset]. https://www.tensorflow.org/datasets/catalog/universal_dependencies
Explore at:
Dataset updated
Dec 6, 2022
Description
Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 300 contributors producing more than 200 treebanks in over 100 languages. If you’re new to UD, you should start by reading the first part of the Short Introduction and then browsing the annotation guidelines.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('universal_dependencies', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
Download service NUMIS - Vdataset - LDM
service.tib.eu
Updated Feb 4, 2025
Cite
(2025). Download service NUMIS - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/govdata_0b70f6f7-5500-418a-8eee-b43242dce7ac
Explore at:
Dataset updated
Feb 4, 2025
Description
The service can be used to download spatial data sets from the division of the Lower Saxony Ministry of Environment, Energy, Building and Climate Protection. The service is implemented via Atom feeds according to the INSPIRE specification. Zip archives containing Shapefiles are provided. Here you can go directly to the service: https://numis.niedersachsen.de/daten/DE-NI-MU_Downloadservice.xml Note: Support for displaying Atom feeds has been removed from more recent versions of the common web browsers. This may cause the browsers to display hard-to-read XML or to open a download popup window. In these cases, a browser add-on must be installed to display the Atom feed. To view the data in your web browser, please open the NUMIS ATOM feed client (see below under “More References”). Explanation of the subject reference: Implementation based on the "Technical Guidance for INSPIRE Download Services 3.0 — Chapter 5. Atom Implementation of Pre-defined Dataset Download Service" from 12.06.2012.
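Since browsers no longer render such feeds, a short script is one way to read them. The sketch below parses a small inline Atom document with the Python standard library and lists each entry's title and link targets. The sample feed, its URL, and the function name `list_downloads` are invented for illustration; the structure of the actual NUMIS feed is not shown here and may differ (for example, entries may point to further feeds rather than directly to zip files).

```python
import xml.etree.ElementTree as ET

# Namespace of the Atom Syndication Format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed, used only to demonstrate the parsing step.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example download service</title>
  <entry>
    <title>Example dataset</title>
    <link rel="alternate" type="application/zip"
          href="https://example.org/data/example.zip"/>
  </entry>
</feed>"""


def list_downloads(feed_xml):
    """Return (entry title, link href) pairs for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    downloads = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        for link in entry.findall("atom:link", ATOM_NS):
            href = link.get("href")
            if href:
                downloads.append((title, href))
    return downloads


downloads = list_downloads(SAMPLE_FEED)
```

To apply this to a live service, the feed text would first have to be fetched, for instance with `urllib.request.urlopen`, and passed to the same function.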
Sociodemographics - United States of America (Public Use Microdata Area,...
carto.com
Updated Mar 29, 2021
Cite
American Community Survey (2021). Sociodemographics - United States of America (Public Use Microdata Area, 2011, 5yrs) [Dataset]. https://carto.com/spatial-data-catalog/browser/dataset/acs_sociodemogr_7c9201f0/
Explore at:
Dataset updated
Mar 29, 2021
Dataset authored and provided by
American Community Survey
Area covered
United States
Description
The American Community Survey (ACS) is an ongoing survey that provides vital information on a yearly basis about the USA and its people. This dataset contains only a subset of the variables that have been deemed most relevant. More info: https://www.census.gov/programs-surveys/acs/about.html