100+ datasets found
  1. Dataset #2: Experimental study

    • figshare.com
    docx
    Updated Jul 19, 2023
    Cite
    Adam Baimel (2023). Dataset #2: Experimental study [Dataset]. http://doi.org/10.6084/m9.figshare.23708766.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Adam Baimel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Project Title: Add title here

    Project Team: Add contact information for research project team members

    Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

    Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

    Data availability statement: If your data is not linked on figshare directly, provide links to where it is hosted (e.g., Open Science Framework, GitHub, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

    Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

    Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

    Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

    List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

    Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, the verbatim question associated with each response, response options, and details of any post-collection coding that has been done on the raw responses (and whether that coding is encoded in a separate column).

    Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14

  2. YouTube Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jan 9, 2023
    Cite
    Bright Data (2023). YouTube Datasets [Dataset]. https://brightdata.com/products/datasets/youtube
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jan 9, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    YouTube, Worldwide
    Description

    Use our YouTube profiles dataset to extract both business and non-business information from public channels and filter by channel name, views, creation date, or subscribers. Datapoints include URL, handle, banner image, profile image, name, subscribers, description, video count, creation date, views, details, and more. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases for this dataset include sentiment analysis, brand monitoring, influencer marketing, and more.

  3. Add GTFS to a Network Dataset

    • opendata.rcmrd.org
    Updated Jun 27, 2013
    Cite
    ArcGIS for Transportation Analytics (2013). Add GTFS to a Network Dataset [Dataset]. https://opendata.rcmrd.org/content/0fa52a75d9ba4abcad6b88bb6285fae1
    Explore at:
    Dataset updated
    Jun 27, 2013
    Dataset authored and provided by
    ArcGIS for Transportation Analytics
    Description

    Deprecation notice: This tool is deprecated because this functionality is now available with out-of-the-box tools in ArcGIS Pro. The tool author will no longer be making further enhancements or fixing major bugs.

    Use Add GTFS to a Network Dataset to incorporate transit data into a network dataset so you can perform schedule-aware analyses using the Network Analyst tools in ArcMap. After creating your network dataset, you can use the ArcGIS Network Analyst tools, like Service Area and OD Cost Matrix, to perform transit/pedestrian accessibility analyses, make decisions about where to locate new facilities, find populations underserved by transit or particular types of facilities, or visualize the areas reachable from your business at different times of day. You can also publish services in ArcGIS Server that use your network dataset.

    The Add GTFS to a Network Dataset tool suite consists of a toolbox to pre-process the GTFS data to prepare it for use in the network dataset and a custom GTFS transit evaluator you must install that helps the network dataset read the GTFS schedules. A user's guide is included to help you set up your network dataset and run analyses.

    Instructions:
    1. Download the tool. It will be a zip file.
    2. Unzip the file and put it in a permanent location on your machine where you won't lose it. Do not save the unzipped tool folder on a network drive, the Desktop, or any other special reserved Windows folders (like C:\Program Files) because this could cause problems later.
    3. The unzipped file contains an installer, AddGTFStoaNetworkDataset_Installer.exe. Double-click this to run it. The installation should proceed quickly, and it should say "Completed" when finished.
    4. Read the User's Guide for instructions on creating and using your network dataset.

    System requirements:
    • ArcMap 10.1 or higher with a Desktop Standard (ArcEditor) license. (You can still use it if you have a Desktop Basic license, but you will have to find an alternate method for one of the pre-processing tools.) ArcMap 10.6 or higher is recommended because you will be able to construct your network dataset much more easily using a template rather than having to do it manually step by step. This tool does not work in ArcGIS Pro. See the User's Guide for more information.
    • Network Analyst extension
    • The necessary permissions to install something on your computer.

    Data requirements:
    • Street data for the area covered by your transit system, preferably data including pedestrian attributes. If you need help preparing high-quality street data for your network, please review this tutorial.
    • A valid GTFS dataset. If your GTFS dataset has blank values for arrival_time and departure_time in stop_times.txt, you will not be able to run this tool; a quick way to check your feed is sketched after this description. You can download and use the Interpolate Blank Stop Times tool to estimate blank arrival_time and departure_time values for your dataset if you still want to use it.

    Help forum
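
    A quick way to check a feed for the blank stop times mentioned above is to scan stop_times.txt directly. A minimal sketch in Python (the tool itself is ArcMap-based and does not include this script; the file path is a placeholder for your unzipped feed):

    ```python
    # Sketch (not part of the tool): count blank arrival_time/departure_time
    # values in a GTFS stop_times.txt. The path is a placeholder.
    import csv

    def count_blank_stop_times(path="gtfs/stop_times.txt"):
        blanks = total = 0
        with open(path, newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                total += 1
                if (not row.get("arrival_time", "").strip()
                        or not row.get("departure_time", "").strip()):
                    blanks += 1
        return blanks, total

    blanks, total = count_blank_stop_times()
    print(f"{blanks} of {total} stop_times rows have blank times")
    ```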

  4. dataset-card-example

    • huggingface.co
    Updated Sep 28, 2023
    + more versions
    Cite
    Templates (2023). dataset-card-example [Dataset]. https://huggingface.co/datasets/templates/dataset-card-example
    Explore at:
    Dataset updated
    Sep 28, 2023
    Dataset authored and provided by
    Templates
    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

    Dataset Details

    Dataset Description

    - Curated by: [More Information Needed]
    - Funded by [optional]: [More Information Needed]
    - Shared by [optional]: [More Information Needed]
    - Language(s) (NLP): [More Information Needed]
    - License: [More Information Needed]

    Dataset Sources [optional]… See the full description on the dataset page: https://huggingface.co/datasets/templates/dataset-card-example.
  5. Dataset metadata of known Dataverse installations, August 2023

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 30, 2024
    + more versions
    Cite
    Julian Gautier (2024). Dataset metadata of known Dataverse installations, August 2023 [Dataset]. http://doi.org/10.7910/DVN/8FEGUV
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Julian Gautier
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the metadata of the datasets published in 85 Dataverse installations and information about each installation's metadata blocks. It also includes the lists of pre-defined licenses or terms of use that dataset depositors can apply to the datasets they publish in the 58 installations that were running versions of the Dataverse software that include that feature. The data is useful for reporting on the quality of dataset- and file-level metadata within and across Dataverse installations and for improving understanding of how certain Dataverse features and metadata fields are used. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation between August 22 and August 28, 2023 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another column named "apikey" listing my accounts' API tokens. The Python script expects the CSV file and the listed API tokens to get metadata and other information from installations that require API tokens. (A minimal sketch of such a CSV file appears after this description.)

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation)_2023.08.22-2023.08.28.csv
    │   ├── contributor(citation)_2023.08.22-2023.08.28.csv
    │   ├── data_source(citation)_2023.08.22-2023.08.28.csv
    │   ├── ...
    │   └── topic_classification(citation)_2023.08.22-2023.08.28.csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2023.08.27_12.59.59.zip
    │   ├── dataset_pids_Abacus_2023.08.27_12.59.59.csv
    │   ├── Dataverse_JSON_metadata_2023.08.27_12.59.59
    │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
    │   ├── ...
    │   ├── metadatablocks_v5.6
    │   ├── astrophysics_v5.6.json
    │   ├── biomedical_v5.6.json
    │   ├── citation_v5.6.json
    │   ├── ...
    │   ├── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2023.08.26_22.14.04.zip
    │   ├── ADA_Dataverse_2023.08.27_13.16.20.zip
    │   ├── Arca_Dados_2023.08.27_13.34.09.zip
    │   ├── ...
    │   └── World_Agroforestry_-_Research_Data_Repository_2023.08.27_19.24.15.zip
    ├── dataverse_installations_summary_2023.08.28.csv
    ├── dataset_pids_from_most_known_dataverse_installations_2023.08.csv
    ├── license_options_for_each_dataverse_installation_2023.09.05.csv
    └── metadatablocks_from_most_known_dataverse_installations_2023.09.05.csv

    This dataset contains two directories and four CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the citation metadata block and geospatial metadata block of datasets in the 85 Dataverse installations. For example, author(citation)_2023.08.22-2023.08.28.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 85 installations, with a row for author names, affiliations, identifier types, and identifiers.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 85 zipped files, one for each of the 85 Dataverse installations whose dataset metadata I was able to download. Each zip file contains a CSV file and two sub-directories:

    • The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation, as well as a column to indicate if the Python script was able to download the Dataverse JSON metadata for each dataset. It also includes the alias/identifier and category of the Dataverse collection that the dataset is in.
    • One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The Dataverse JSON export of the latest version of each dataset includes "(latest_version)" in the file name, which should help those who are interested in the metadata of only the latest version of each dataset.
    • The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I included them so that they can be used when extracting metadata from the datasets' Dataverse JSON exports.

    The dataverse_installations_summary_2023.08.28.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata...
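
    A minimal sketch of creating the two-column token file described above. The hostnames, token values, and output file name are made-up placeholders, not real accounts:

    ```python
    # Sketch of the two-column CSV ("hostname" and "apikey") the download
    # script expects. All values and the file name are illustrative.
    import csv

    rows = [
        {"hostname": "https://dataverse.example.edu",
         "apikey": "00000000-0000-0000-0000-000000000000"},
        {"hostname": "https://repository.example.org",
         "apikey": "11111111-1111-1111-1111-111111111111"},
    ]

    with open("installation_api_tokens.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["hostname", "apikey"])
        writer.writeheader()
        writer.writerows(rows)
    ```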

  6. Sports Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated May 7, 2024
    Cite
    Bright Data (2024). Sports Dataset [Dataset]. https://brightdata.com/products/datasets/sports
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    Bright Data
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    We will create a customized sports dataset tailored to your specific requirements. Data points may include player statistics, team rankings, game scores, player contracts, and other relevant metrics.

    Utilize our sports datasets for a variety of applications to boost strategic planning and performance analysis. Analyzing these datasets can help organizations understand player performance and market trends within the sports industry, allowing for more precise team management and marketing strategies. You can choose to access the complete dataset or a customized subset based on your business needs.

    Popular use cases include: enhancing player performance analysis, refining team strategies, and optimizing fan engagement efforts.

  7. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Explore at:
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic) that are represented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France, and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated, along with the change (increase) in indicators such as the profit and profitability of enterprises, their ranking position (competitiveness), asset value, and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table.

    The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics, and it can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the dataset's cells contain formulas rather than ready-made numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship.

    The dataset includes not only tabular data, but also charts that provide data visualization. It contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and following the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
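
    Since the forecasts are expressed as a normal distribution of predicted values, the scenario substitution described above can be mimicked outside Excel. A minimal sketch with made-up numbers (not values from the dataset):

    ```python
    # Sketch of the scenario logic described above: treat the forecast as a
    # normal distribution and ask how likely a given morbidity scenario is.
    # The mean and standard deviation are made-up placeholders.
    from scipy.stats import norm

    forecast_mean = 60_000  # hypothetical mean daily cases
    forecast_std = 8_000    # hypothetical standard deviation

    scenario = 75_000       # hypothetical scenario to test
    prob_exceed = 1 - norm.cdf(scenario, loc=forecast_mean, scale=forecast_std)
    print(f"P(daily cases > {scenario:,}) = {prob_exceed:.3f}")
    ```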

  8. Add-on Data for the ICOADS Value-Added Database (IVAD)

    • rda.ucar.edu
    • rda-web-prod.ucar.edu
    • +1more
    + more versions
    Cite
    Add-on Data for the ICOADS Value-Added Database (IVAD) [Dataset]. https://rda.ucar.edu/lookfordata/datasets/?nb=y&b=topic&v=Oceans
    Explore at:
    Description

    This dataset contains original community-based contributions that are added onto ICOADS records as part of the ICOADS Value-Added Database project. These contributions were first added to ICOADS beginning with Release 3.0.0 and are realized as separate attachments in the IMMA format, e.g., reanalysis feedback and QC, and bias adjustments for wind and SST. In general, these data are NOT recommended for the average data user.

  9. #PraCegoVer dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 19, 2023
    Cite
    Esther Luna Colombini (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
    Explore at:
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Esther Luna Colombini
    Sandra Avila
    Gabriel Oliveira dos Santos
    Description

    Automatically describing images using natural sentences is an essential task for the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions in other languages are scarce.

    #PraCegoVer arose on the Internet, stimulating social media users to publish images, tag them with #PraCegoVer, and add a short description of their content. Inspired by this movement, we have proposed #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    #PraCegoVer has 533,523 pairs of images and captions in Portuguese collected from more than 14 thousand different profiles. The average caption length is 39.3 words with a standard deviation of 29.7.

    Dataset Structure

    The #PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the attributes:

    user: anonymized user that made the post;

    filename: image file name;

    raw_caption: raw caption;

    caption: clean caption;

    date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory, whose filename is given by the attribute filename. Also, we provide a sample with five instances, so users can download the sample to get an overview of the dataset before downloading it completely.
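
    A minimal sketch of reading dataset.json and pairing records with their images, using the attributes listed above (the name of the unpacked image directory, images/, is an assumption):

    ```python
    # Sketch: load dataset.json and pair each record with its image file.
    # Attribute names follow the list above; "images/" is an assumption.
    import json
    import os

    with open("dataset.json", encoding="utf-8") as f:
        records = json.load(f)  # a list of JSON objects

    for rec in records[:5]:
        image_path = os.path.join("images", rec["filename"])
        print(rec["user"], rec["date"], image_path)
        print("  caption:", rec["caption"])
    ```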

    Download Instructions

    If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

    cat images.tar.gz.part* > images.tar.gz
    tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=

  10. A Dataset for Machine Learning Algorithm Development

    • catalog.data.gov
    • fisheries.noaa.gov
    Updated May 1, 2024
    Cite
    (Point of Contact, Custodian) (2024). A Dataset for Machine Learning Algorithm Development [Dataset]. https://catalog.data.gov/dataset/a-dataset-for-machine-learning-algorithm-development2
    Explore at:
    Dataset updated
    May 1, 2024
    Dataset provided by
    (Point of Contact, Custodian)
    Description

    This dataset consists of imagery, imagery footprints, associated ice seal detections, and homography files from the KAMERA test flights conducted in 2019. It was subset to include relevant data for detection algorithm development and is limited to data collected during flights 4, 5, 6, and 7 of our 2019 surveys.

  11. FOI 29916 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Jan 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). FOI 29916 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-29916
    Explore at:
    Dataset updated
    Jan 3, 2023
    Description

    CSVs with more than 1 million rows can be viewed using add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, to handle larger data sets. The Microsoft PowerPivot add-on for Excel is available using the link in the 'Related Links' section below. Once PowerPivot has been installed, to load the large files, please follow the instructions below. Note that it may take at least 20 to 30 minutes to load one monthly file.

    1. Start Excel as normal
    2. Click on the PowerPivot tab
    3. Click on the PowerPivot Window icon (top left)
    4. In the PowerPivot Window, click on the "From Other Sources" icon
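
    Outside Excel, the same files can be processed without loading them into memory at once. A minimal sketch in Python (an alternative approach, not part of the FOI response; the file name is a placeholder):

    ```python
    # Alternative sketch (not part of the FOI response): stream a CSV with
    # more than 1 million rows in chunks. The file name is a placeholder.
    import pandas as pd

    row_count = 0
    for chunk in pd.read_csv("monthly_file.csv", chunksize=100_000):
        row_count += len(chunk)  # replace with real per-chunk processing

    print(f"total rows: {row_count:,}")
    ```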

  12. PDMX

    • cseweb.ucsd.edu
    json
    + more versions
    Cite
    UCSD CSE Research Project, PDMX [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.

  13. Set Add Dataset

    • universe.roboflow.com
    zip
    Updated Feb 11, 2025
    + more versions
    Cite
    set add (2025). Set Add Dataset [Dataset]. https://universe.roboflow.com/set-add/set-add
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    set add
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Viable Bounding Boxes
    Description

    Set Add

    ## Overview
    
    Set Add is a dataset for object detection tasks - it contains Viable annotations for 723 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. NIST Statistical Reference Datasets - SRD 140

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Jul 29, 2022
    + more versions
    Cite
    National Institute of Standards and Technology (2022). NIST Statistical Reference Datasets - SRD 140 [Dataset]. https://catalog.data.gov/dataset/nist-statistical-reference-datasets-srd-140-df30c
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty.

    Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method.

    Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm; these levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
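
    Agreement with certified values is often scored as the log relative error (LRE), roughly the number of significant digits on which the computed and certified results agree. A minimal sketch (the example numbers are placeholders, not NIST certified values):

    ```python
    # Sketch: log relative error (LRE) between a computed result and a
    # certified value. Example numbers are placeholders, not NIST data.
    import math

    def log_relative_error(computed: float, certified: float) -> float:
        if computed == certified:
            return 15.0  # cap near double-precision limits
        return -math.log10(abs(computed - certified) / abs(certified))

    print(log_relative_error(2.0718765432, 2.0718765431))  # ~10 digits agree
    ```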

  15. VegeNet - Image datasets and Codes

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 27, 2022
    Cite
    Jo Yen Tan; Jo Yen Tan (2022). VegeNet - Image datasets and Codes [Dataset]. http://doi.org/10.5281/zenodo.7254508
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jo Yen Tan; Jo Yen Tan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Compilation of python codes for data preprocessing and VegeNet building, as well as image datasets (zip files).

    Image datasets:

    1. vege_original : Images of vegetables captured manually in data acquisition stage
    2. vege_cropped_renamed : Images in (1) cropped to remove background areas and image labels renamed
    3. non-vege images : Images of non-vegetable foods for CNN network to recognize other-than-vegetable foods
    4. food_image_dataset : Complete set of vege (2) and non-vege (3) images for architecture building.
    5. food_image_dataset_split : Image dataset (4) split into train and test sets
    6. process : Images created when cropping (pre-processing step) to create dataset (2).
  16. R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart

    • bridges.monash.edu
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg (2023). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication

    Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository

    This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Releases, so check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

    The raw input data consists of two files (will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to respectively across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s).

    These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame with the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocates with be going to), and (iv) will (frequency of the collocates with will); it is available in input_data_raw.txt. Then, the script 2-script-create-motion-chart-input-data.R processes input_data_raw.txt to normalise the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
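
    The normalisation step in the second script is per-million-words scaling. A quick sketch of the arithmetic (the repository's own code is in R; the counts below are made-up placeholders, not COHA figures):

    ```python
    # Sketch of per-million-words normalisation as described above. Numbers
    # are made-up placeholders; the actual sizes live in coha_size.txt.
    raw_frequency = 1_234              # hypothetical collocate count in a decade
    decade_size_in_words = 24_000_000  # hypothetical decade subcorpus size

    per_million_words = raw_frequency / decade_size_in_words * 1_000_000
    print(f"{per_million_words:.2f} occurrences per million words")
    ```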

  17. New Backpacks Dataset

    • universe.roboflow.com
    zip
    Updated Jul 20, 2022
    Cite
    Common Objects (2022). New Backpacks Dataset [Dataset]. https://universe.roboflow.com/common-objects/new-backpacks/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 20, 2022
    Dataset authored and provided by
    Common Objects
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Backpacks Bounding Boxes
    Description

    New Backpacks

    ## Overview
    
    New Backpacks is a dataset for object detection tasks - it contains Backpacks annotations for 1,047 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  18. Industrial Dataset

    • kaggle.com
    Updated May 8, 2023
    Cite
    Be Schue (2023). Industrial Dataset [Dataset]. https://www.kaggle.com/datasets/beschue/industrial-classification-data-set
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Be Schue
    Description

    The dataset includes 10 object categories from the MVTEC INDUSTRIAL 3D OBJECT DETECTION DATASET as input CAD objects. The selected objects include a diverse range of industrial products:

    S.No    Object Class
    1       adapter plate triangular
    2       bracket big
    3       clamp small
    4       engine part cooler round
    5       engine part cooler square
    6       injection pump
    7       screw
    8       star
    9       tee connector
    10      thread

    The dataset contains a total of 100,000 RGB images (10,000 per object category), divided into three sets: 70,000 for training, 20,000 for testing, and 10,000 for validation. Each image has a resolution of 224 x 224 and is in JPEG format.

    To ensure the suitability of our dataset for various computer vision tasks, we included not only the class labels but also generated bounding boxes and semantic masks for each image, which are stored in COCO annotation format. Each image contains one instance of the ten selected objects.

    Throughout the 10,000 images for each class, we randomly varied the position of the object in x-y-z direction and the object’s rotation to provide a diverse range of images. Additionally, we changed the object’s surface to a smooth metallic texture, imitating real industrial components. Lastly, we varied the lighting conditions within each image, including the position of the light sources, their energy, and emission strength.
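
    Since the bounding boxes and masks are stored in COCO annotation format, they can be read with a few lines of Python. A minimal sketch (the annotation file name is a placeholder):

    ```python
    # Sketch: read COCO-format annotations as described above. The file name
    # is a placeholder; per COCO convention, bbox is [x, y, width, height].
    import json

    with open("annotations.json", encoding="utf-8") as f:
        coco = json.load(f)

    categories = {c["id"]: c["name"] for c in coco["categories"]}
    for ann in coco["annotations"][:5]:
        x, y, w, h = ann["bbox"]
        print(categories[ann["category_id"]], f"bbox=({x}, {y}, {w}, {h})")
    ```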

    Find out more about our Data Generation Tool:

    Schuerrle, B., Sankarappan, V., & Morozov, A. (2023). SynthiCAD: Generation of Industrial Image Data Sets for Resilience Evaluation of Safety-Critical Classifiers. In Proceeding of the 33rd European Safety and Reliability Conference. 33rd European Safety and Reliability Conference. Research Publishing Services. https://doi.org/10.3850/978-981-18-8071-1_p400-cd

  19. English Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-closed-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The English Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the English language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in English. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native English speakers, with references taken from diverse sources such as books, news articles, websites, web forums, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short phrases, single sentences, and paragraphs types of answers. The answers contain text strings, numerical values, date and time formats as well. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled English Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
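
    Given the annotation fields listed above, a single record might be shaped roughly like the following sketch (key names and values are illustrative assumptions, not the vendor's actual schema):

    ```python
    # Illustrative shape of one QA record based on the fields listed above.
    # Key names and values are assumptions, not the dataset's actual schema.
    example_record = {
        "id": "qa-00001",
        "context_paragraph": "...",
        "context_reference_link": "https://example.com/source-article",
        "question": "...",
        "question_type": "multiple-choice",
        "question_complexity": "medium",
        "question_category": "science",
        "domain": "general knowledge",
        "prompt_type": "instruction",
        "answer": "...",
        "answer_type": "short phrase",
        "rich_text_presence": False,
    }
    ```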

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The English version is grammatically accurate, without spelling or grammatical errors. No toxic or harmful content was used in building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy English Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  20. arXiv-10 Dataset

    • paperswithcode.com
    Updated Nov 11, 2024
    Cite
    Ashkan Farhangi; Ning Sui; Nan Hua; Haiyan Bai; Arthur Huang; Zhishan Guo (2024). arXiv-10 Dataset [Dataset]. https://paperswithcode.com/dataset/arxiv-10
    Explore at:
    Dataset updated
    Nov 11, 2024
    Authors
    Ashkan Farhangi; Ning Sui; Nan Hua; Haiyan Bai; Arthur Huang; Zhishan Guo
    Description

    Benchmark dataset for abstracts and titles of 100,000 ArXiv scientific papers. This dataset contains 10 classes and is balanced (exactly 10,000 per class). The classes include subcategories of computer science, physics, and math.

    • Direct link: Download

    • Citation:

    @inproceedings{farhangi2022protoformer,
      title={Protoformer: Embedding Prototypes for Transformers},
      author={Farhangi, Ashkan and Sui, Ning and Hua, Nan and Bai, Haiyan and Huang, Arthur and Guo, Zhishan},
      booktitle={Advances in Knowledge Discovery and Data Mining: 26th Pacific-Asia Conference, PAKDD 2022, Chengdu, China, May 16--19, 2022, Proceedings, Part I},
      pages={447--458},
      year={2022}
    }
