100+ datasets found
  1. example-generate-preference-dataset

    • huggingface.co
    Updated Aug 23, 2024
    distilabel-internal-testing (2024). example-generate-preference-dataset [Dataset]. https://huggingface.co/datasets/distilabel-internal-testing/example-generate-preference-dataset
    Dataset updated
    Aug 23, 2024
    Dataset authored and provided by
    distilabel-internal-testing
    Description

    Dataset Card for example-preference-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml that can be used to reproduce the pipeline that generated it in distilabel, using the distilabel CLI:

        distilabel pipeline run --config "https://huggingface.co/datasets/sdiazlor/example-preference-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/distilabel-internal-testing/example-generate-preference-dataset.

  2. Dataset example

    • kaggle.com
    zip
    Updated Apr 27, 2021
    Javier Vallejos (2021). Dataset example [Dataset]. https://www.kaggle.com/javiervallejos/dataset-example
    Available download formats: zip (38,691 bytes)
    Dataset updated
    Apr 27, 2021
    Authors
    Javier Vallejos
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset was created only for building examples; every column was generated with random values. If you want to create a similar dataset, review this notebook.

    Content

    There are five columns: 'Country' (one of 'Bolivia', 'Argentina', 'Paraguay', 'Chile', 'Brazil', 'Peru'), 'Temperature', 'Humidity', 'Pm10', and 'Date'.
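    A dataset with this shape takes only a few lines of code to produce. The following is a minimal sketch, not the author's notebook (which is not reproduced here): it assumes a CSV layout, and the numeric ranges, start date, and the helper names make_rows and to_csv are hypothetical choices for illustration.

```python
import csv
import io
import random
from datetime import date, timedelta

# Country values listed in the dataset description.
COUNTRIES = ["Bolivia", "Argentina", "Paraguay", "Chile", "Brazil", "Peru"]
COLUMNS = ["Country", "Temperature", "Humidity", "Pm10", "Date"]


def make_rows(n, seed=0, start=date(2021, 1, 1)):
    """Return n rows of random values for the five documented columns.

    The numeric ranges are hypothetical; the description does not state them or their units.
    """
    rng = random.Random(seed)  # seeded so the example rows are reproducible
    rows = []
    for i in range(n):
        rows.append({
            "Country": rng.choice(COUNTRIES),
            "Temperature": round(rng.uniform(-5.0, 40.0), 1),  # hypothetical range
            "Humidity": round(rng.uniform(10.0, 100.0), 1),    # hypothetical range
            "Pm10": round(rng.uniform(0.0, 150.0), 1),         # hypothetical range
            "Date": (start + timedelta(days=i)).isoformat(),   # one row per day
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(make_rows(5)))
```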

  3. Dataset for fuzzzing data generation based on deep advisial learning

    • ieee-dataport.org
    Updated May 18, 2022
    Zhihui Li (2022). Dataset for fuzzzing data generation based on deep advisial learning [Dataset]. https://ieee-dataport.org/documents/dataset-fuzzzing-data-generation-based-deep-advisial-learning
    Dataset updated
    May 18, 2022
    Authors
    Zhihui Li
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was collected from an industrial control system running the Modbus protocol. It is used to train a deep adversarial learning model that generates fuzzing data in the same format as the real data. The data is a sequence of hexadecimal numbers. The accompanying generated data was produced by the trained model.

  4. Generate Dataset

    • universe.roboflow.com
    zip
    Updated May 5, 2025
    YoloProjectIVS (2025). Generate Dataset [Dataset]. https://universe.roboflow.com/yoloprojectivs/generate-3d288
    Available download formats: zip
    Dataset updated
    May 5, 2025
    Dataset authored and provided by
    YoloProjectIVS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects HRDh Bounding Boxes
    Description

    Generate

    ## Overview
    
    Generate is a dataset for object detection tasks - it contains Objects HRDh annotations for 1,172 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  5. The code for generating and processing the dataset for load-displacement and...

    • figshare.com
    txt
    Updated Jan 19, 2018
    Kheng Lim Goh (2018). The code for generating and processing the dataset for load-displacement and stress-strain [Dataset]. http://doi.org/10.6084/m9.figshare.5640649.v2
    Available download formats: txt
    Dataset updated
    Jan 19, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kheng Lim Goh
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The code, strainenergy_v4_1.m, was used for generating and processing the dataset for load-displacement and stress-strain. MATLAB version 6.1 was used to run the code. The specific parameter values used to generate the current dataset are as follows:

    • ip1: input file containing the load-displacement data
    • diameter: fascicle diameter
    • laststrainpt: an estimate of the strain at rupture, r
    • orderpoly: an integer from 2 to 7 giving the order of the polynomial fitted to the data from O to q
    • loadat1percent: y/n; whether to determine the value of the load (set at 1% of the maximum load) at which the specimen became taut ('y' denotes yes; 'n' denotes no)

    The file logfile.txt contains the parameters used for deriving the values of the respective mechanical properties.
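    The 1%-of-maximum-load rule and the role of the fascicle diameter can be illustrated in Python. This is a hedged sketch, not a port of strainenergy_v4_1.m (whose source is not shown here): it assumes the taut point is the first sample whose load reaches 1% of the maximum load, and that stress is load divided by a circular cross-sectional area computed from the diameter. The function names taut_index and stress_from_load are hypothetical, and stress units follow whatever load and length units are supplied.

```python
import math


def taut_index(loads, fraction=0.01):
    """Return the index of the first sample whose load reaches fraction * max(loads).

    Assumed reading of the loadat1percent='y' option: the specimen is treated as
    taut once the load first reaches 1% of the maximum load.
    """
    threshold = fraction * max(loads)
    for i, load in enumerate(loads):
        if load >= threshold:
            return i
    raise ValueError("no sample reaches the threshold")


def stress_from_load(loads, diameter):
    """Convert loads to stress, assuming a circular fascicle cross-section."""
    area = math.pi * diameter ** 2 / 4.0
    return [load / area for load in loads]


if __name__ == "__main__":
    loads = [0.0, 0.001, 0.05, 0.5, 1.0, 2.0]  # made-up load samples
    start = taut_index(loads)  # 1% of the 2.0 maximum is 0.02, so this is index 2
    print(start, stress_from_load(loads[start:], diameter=0.5))
```

    The polynomial fit controlled by orderpoly is not sketched here, because the description does not say which quantities are fitted beyond "the data from O to q".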

  6. generate-quiz-dataset

    • huggingface.co
    Updated Jul 20, 2024
    Fauzan Rizky (2024). generate-quiz-dataset [Dataset]. https://huggingface.co/datasets/fauzanrrizky/generate-quiz-dataset
    Dataset updated
    Jul 20, 2024
    Authors
    Fauzan Rizky
    Description

    fauzanrrizky/generate-quiz-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. Random Numbers

    • ieee-dataport.org
    Updated Mar 14, 2023
    Alexander Outman (2023). Random Numbers [Dataset]. https://ieee-dataport.org/documents/random-numbers
    Dataset updated
    Mar 14, 2023
    Authors
    Alexander Outman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes random numbers generated through various methods.

    Method 1: shuf (https://www.mankier.com/1/shuf). Commands used to generate the dataset files:

        $ shuf -i 1-1000000000 -n1000000 -o random-shuf.txt
        $ shuf -i 1-1000000000000 -n1000000 -o random-shuf-1-1000000000000.txt
        $ jot -r 1000000 1 1000000000000 > random-jot-1-1000000000000.txt
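    The two tools differ in one respect that matters for a random-number dataset: shuf -i LO-HI -n COUNT emits COUNT distinct values (a random permutation of the range, truncated), while jot -r draws each value independently, so repeats are possible. A rough Python analogue, offered only as an illustration; the function names are hypothetical, and Python's generator does not reproduce either tool's actual output:

```python
import random


def shuf_like(lo, hi, count, rng):
    """Like `shuf -i lo-hi -n count`: distinct integers, sampled without replacement."""
    # random.sample accepts a range directly, so the full range is never materialized.
    return rng.sample(range(lo, hi + 1), count)


def jot_like(count, lo, hi, rng):
    """Like `jot -r count lo hi`: independent draws from [lo, hi], so repeats may occur."""
    return [rng.randint(lo, hi) for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(2023)
    print(shuf_like(1, 10**9, 5, rng))
    print(jot_like(5, 1, 10**12, rng))
```

    For large ranges such as 1 to 10^12 the difference is rarely visible in practice, but for a narrow range the independent draws will quickly produce duplicates, while the sampled values never repeat.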

  8. my-dataset-generate

    • huggingface.co
    Updated Jan 12, 2025
    Bipul Sharma (2025). my-dataset-generate [Dataset]. https://huggingface.co/datasets/Bipul8765/my-dataset-generate
    Dataset updated
    Jan 12, 2025
    Authors
    Bipul Sharma
    Description

    Dataset Card for my-dataset-generate

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml that can be used to reproduce the pipeline that generated it in distilabel, using the distilabel CLI:

        distilabel pipeline run --config "https://huggingface.co/datasets/Bipul8765/my-dataset-generate/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/Bipul8765/my-dataset-generate.

  9. Invoices Dataset

    • kaggle.com
    Updated Jan 18, 2022
    Cankat Saraç (2022). Invoices Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/invoices/discussion
    Dataset updated
    Jan 18, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Cankat Saraç
    License

    Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The invoice dataset provided is a mock dataset generated using the Python Faker library. It has been designed to mimic the format of data collected from an online store. The dataset contains various fields, including first name, last name, email, product ID, quantity, amount, invoice date, address, city, and stock code. All of the data in the dataset is randomly generated and does not represent actual individuals or products. The dataset can be used for various purposes, including testing algorithms or models related to invoice management, e-commerce, or customer behavior analysis. The data in this dataset can be used to identify trends, patterns, or anomalies in online shopping behavior, which can help businesses to optimize their online sales strategies.
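    As a rough illustration of how such mock records are built, the sketch below generates invoices with the fields listed above. It deliberately uses only Python's standard library instead of Faker (Faker supplies realistic values through calls such as Faker().first_name()), so the name and city pools, price range, and snake_case field names are all hypothetical.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical value pools; Faker would draw from much larger, locale-aware lists.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Emil"]
LAST_NAMES = ["Novak", "Okafor", "Ortiz", "Smith", "Tanaka"]
CITIES = ["Lakeside", "Riverton", "Springfield"]

# Hypothetical field names for the columns listed in the description.
FIELDS = ["first_name", "last_name", "email", "product_id", "quantity",
          "amount", "invoice_date", "address", "city", "stock_code"]


def mock_invoice(rng, year=2021):
    """Return one randomly generated invoice record as a dict keyed by FIELDS."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    quantity = rng.randint(1, 10)
    unit_price = round(rng.uniform(1.0, 100.0), 2)
    return {
        "first_name": first,
        "last_name": last,
        "email": f"{first}.{last}@example.com".lower(),
        "product_id": rng.randint(100, 999),
        "quantity": quantity,
        "amount": round(quantity * unit_price, 2),  # amount derived from quantity
        "invoice_date": (date(year, 1, 1) + timedelta(days=rng.randint(0, 364))).isoformat(),
        "address": f"{rng.randint(1, 9999)} Main St",
        "city": rng.choice(CITIES),
        "stock_code": "".join(rng.choices(string.ascii_uppercase + string.digits, k=6)),
    }


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the mock data is reproducible
    for _ in range(3):
        print(mock_invoice(rng))
```

    Deriving amount from quantity keeps the mock rows internally consistent, which matters if the data is used to test invoice-validation logic.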

  10. CREATE: Multimodal Dataset for Unsupervised Learning and Generative Modeling...

    • ieee-dataport.org
    Updated Jun 17, 2025
    Simon Brodeur (2025). CREATE: Multimodal Dataset for Unsupervised Learning and Generative Modeling of Sensory Data from a Mobile Robot [Dataset]. https://ieee-dataport.org/open-access/create-multimodal-dataset-unsupervised-learning-and-generative-modeling-sensory-data
    Dataset updated
    Jun 17, 2025
    Authors
    Simon Brodeur
    Description

    The CREATE database is composed of 14 hours of multimodal recordings from a mobile robotic platform based on the iRobot Create.

  11. Search strings used to generate citation counts for three data sets in WoS,...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 26, 2014
    Belter, Christopher W. (2014). Search strings used to generate citation counts for three data sets in WoS, publishers' full text websites, and Google Scholar. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001239723
    Dataset updated
    Mar 26, 2014
    Authors
    Belter, Christopher W.
    Description

    Search strings used to generate citation counts for three data sets in WoS, publishers' full text websites, and Google Scholar.

  12. The big model fine-tuning data set of five key elements of tourism resources...

    • scidb.cn
    Updated Oct 17, 2024
    lu bao qing; Chen Min; Wan Fucheng; Yu Hongzhi (2024). The big model fine-tuning data set of five key elements of tourism resources in the five northwestern provinces in 2024 [Dataset]. http://doi.org/10.57760/sciencedb.j00001.01088
    Dataset updated
    Oct 17, 2024
    Dataset provided by
    Science Data Bank
    Authors
    lu bao qing; Chen Min; Wan Fucheng; Yu Hongzhi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    With the wide application of large models in various fields, the tourism industry increasingly needs high-quality datasets to improve models' ability to understand and generate tourism information. This dataset focuses on textual data in the tourism domain and is designed to support fine-tuning of tourism-oriented large models. Because the diversity and quality of the dataset are critical to model performance, this study combines web scraping and manual annotation with data cleaning, denoising, and stopword removal to ensure high data quality and accuracy. Automated annotation tools are also used to generate instructions and perform consistency checks on the texts. The LLM-Tourism dataset draws primarily on data from Ctrip and Baidu Baike, covers five northwestern Chinese provinces (Gansu, Ningxia, Qinghai, Shaanxi, and Xinjiang), and contains 53,280 pairs of structured data in JSON format. This dataset will not only improve the generation accuracy of tourism large models but also contribute to the sharing and application of tourism-related datasets in the field of large models.

  13. Rule-based Synthetic Data for Japanese GEC

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    tsv
    Updated Oct 28, 2023
    (2023). Rule-based Synthetic Data for Japanese GEC [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7679
    Available download formats: tsv
    Dataset updated
    Oct 28, 2023
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title: Rule-based Synthetic Data for Japanese GEC.

    Dataset contents: This dataset contains two parallel corpora intended for training and evaluating models for the NLP (natural language processing) subtask of Japanese GEC (grammatical error correction). These are as follows:

    Synthetic Corpus - synthesized_data.tsv. This corpus file contains 2,179,130 parallel sentence pairs synthesized using the process described in [1]. Each line of the file consists of two sentences delimited by a tab: the first is the erroneous sentence, and the second is the corresponding correction. These paired sentences are derived from data scraped from the keyword-lookup site
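    Because each line holds exactly one erroneous/corrected pair separated by a tab, the corpus can be read with plain string splitting. A minimal sketch, assuming UTF-8 encoding and that any line without exactly two fields should be skipped; parse_gec_line and read_gec_pairs are hypothetical helper names:

```python
def parse_gec_line(line):
    """Split one corpus line into (erroneous, corrected), or return None if malformed."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 2:
        return None
    return fields[0], fields[1]


def read_gec_pairs(lines):
    """Yield (erroneous, corrected) pairs from an iterable of lines, skipping malformed ones."""
    for line in lines:
        pair = parse_gec_line(line)
        if pair is not None:
            yield pair


# Usage (assumes synthesized_data.tsv is in the working directory):
#     with open("synthesized_data.tsv", encoding="utf-8") as f:
#         pairs = list(read_gec_pairs(f))
```

    Splitting on the tab directly, rather than using a CSV reader, avoids any quote handling that could alter sentences containing quotation marks.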

  14. Appendix A. Parameter values used to generate expected value data sets.

    • datasetcatalog.nlm.nih.gov
    • wiley.figshare.com
    Updated Aug 9, 2016
    Bailey, Larissa L.; Kendall, William L.; Converse, Sarah J. (2016). Appendix A. Parameter values used to generate expected value data sets. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001582396
    Dataset updated
    Aug 9, 2016
    Authors
    Bailey, Larissa L.; Kendall, William L.; Converse, Sarah J.
    Description

    Parameter values used to generate expected value data sets.

  15. Connector Generate Dataset Dataset

    • universe.roboflow.com
    zip
    Updated Feb 23, 2023
    yang junzhi (2023). Connector Generate Dataset Dataset [Dataset]. https://universe.roboflow.com/yang-junzhi/connector-generate-dataset/dataset/4
    Available download formats: zip
    Dataset updated
    Feb 23, 2023
    Dataset authored and provided by
    yang junzhi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Defect Bounding Boxes
    Description

    Connector Generate Dataset

    ## Overview
    
    Connector Generate Dataset is a dataset for object detection tasks - it contains Defect annotations for 255 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  16. Dataset for: Simulation and data-generation for random-effects network...

    • wiley.figshare.com
    txt
    Updated Jun 1, 2023
    Svenja Elisabeth Seide; Katrin Jensen; Meinhard Kieser (2023). Dataset for: Simulation and data-generation for random-effects network meta-analysis of binary outcome [Dataset]. http://doi.org/10.6084/m9.figshare.8001863.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley
    Authors
    Svenja Elisabeth Seide; Katrin Jensen; Meinhard Kieser
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The performance of statistical methods is frequently evaluated by means of simulation studies. For network meta-analysis of binary data, however, available data-generating models are restricted either to two-armed trials or to the fixed-effect model. Based on data generation in the pairwise case, we propose a framework for simulating random-effects network meta-analyses that include multi-arm trials with a binary outcome. The only common data-generating model that is directly applicable to a random-effects network setting relies on strongly restrictive assumptions. To overcome these limitations, we modify this approach and derive a related simulation procedure that uses odds ratios as the effect measure. The performance of this procedure is evaluated with synthetic data and in an empirical example.
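    For intuition about what such a data-generating model involves, the sketch below simulates event counts for a single multi-arm trial under a generic random-effects model on the log odds ratio scale. It is not the procedure proposed by the authors: it uses the common assumption that trial-specific log odds ratios have variance tau² and pairwise covariance tau²/2, it holds the baseline-arm risk fixed rather than drawing it per trial, and the function and parameter names are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_trial(p_baseline, log_ors, tau, n_per_arm, rng):
    """Simulate event counts for one multi-arm trial with a binary outcome.

    log_ors: mean log odds ratios of each non-baseline arm versus the baseline arm.
    tau: between-trial standard deviation of the log odds ratios.
    Each trial-specific log OR gets variance tau**2, and any two arms share
    covariance tau**2 / 2, built from one shared normal draw plus one
    independent draw per arm, each scaled by tau / sqrt(2).
    """
    component_sd = tau / math.sqrt(2.0)
    shared = rng.gauss(0.0, 1.0)
    base_logit = math.log(p_baseline / (1.0 - p_baseline))
    probs = [p_baseline]
    for d in log_ors:
        delta = d + component_sd * (shared + rng.gauss(0.0, 1.0))
        probs.append(expit(base_logit + delta))
    # Binomial draws as sums of Bernoulli trials, to stay within the standard library.
    return [sum(rng.random() < p for _ in range(n_per_arm)) for p in probs]


if __name__ == "__main__":
    rng = random.Random(42)
    # Three-arm trial: baseline risk 0.3, two active arms, tau = 0.2, 100 patients per arm.
    print(simulate_trial(0.3, [0.0, 0.5], 0.2, 100, rng))
```

    Repeating this per trial, with arm sets chosen to form a network, gives the kind of synthetic data on which such simulation studies evaluate estimators.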

  17. Ai Generate Detection Dataset

    • universe.roboflow.com
    zip
    Updated Feb 19, 2025
    Blaze Warriors (2025). Ai Generate Detection Dataset [Dataset]. https://universe.roboflow.com/blaze-warriors/ai-generate-detection/dataset/1
    Available download formats: zip
    Dataset updated
    Feb 19, 2025
    Dataset authored and provided by
    Blaze Warriors
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    AI Human
    Description

    AI Generate Detection

    ## Overview
    
    AI Generate Detection is a dataset for classification tasks - it contains AI Human annotations for 9,900 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  18. Dataset Dog Tail After Generate 1 Class Tail Dataset

    • universe.roboflow.com
    zip
    Updated Apr 27, 2025
    25425 (2025). Dataset Dog Tail After Generate 1 Class Tail Dataset [Dataset]. https://universe.roboflow.com/25425/dataset-dog-tail-after-generate-1-class-tail
    Available download formats: zip
    Dataset updated
    Apr 27, 2025
    Dataset authored and provided by
    25425
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Dogs 6dBV Bounding Boxes
    Description

    Dataset Dog Tail After Generate 1 Class Tail

    ## Overview
    
    Dataset Dog Tail After Generate 1 Class Tail is a dataset for object detection tasks - it contains Dogs 6dBV annotations for 2,252 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  19. Data used by EPA researchers to generate illustrative figures for overview...

    • datasets.ai
    • s.cnmilf.com
    57
    Updated Sep 11, 2024
    U.S. Environmental Protection Agency (2024). Data used by EPA researchers to generate illustrative figures for overview article "Multiscale Modeling of Background Ozone: Research Needs to Inform and Improve Air Quality Management" [Dataset]. https://datasets.ai/datasets/data-used-by-epa-researchers-to-generate-illustrative-figures-for-overview-article-multisc
    Available download formats: 57
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Authors
    U.S. Environmental Protection Agency
    Description

    Data sets used to prepare illustrative figures for the overview article “Multiscale Modeling of Background Ozone”

    Overview

    The CMAQ model output datasets used to create illustrative figures for this overview article were generated by scientists in EPA/ORD/CEMM and EPA/OAR/OAQPS.

    The EPA/ORD/CEMM-generated dataset consisted of hourly CMAQ output from two simulations. The first simulation was performed for July 1 – 31 over a 12 km modeling domain covering the Western U.S. The simulation was configured with the Integrated Source Apportionment Method (ISAM) to estimate the contributions from 9 source categories to modeled ozone. ISAM source contributions for July 17 – 31 averaged over all grid cells located in Colorado were used to generate the illustrative pie chart in the overview article. The second simulation was performed for October 1, 2013 – August 31, 2014 over a 108 km modeling domain covering the northern hemisphere. This simulation was also configured with ISAM to estimate the contributions from non-US anthropogenic sources, natural sources, stratospheric ozone, and other sources on ozone concentrations. Ozone ISAM results from this simulation were extracted along a boundary curtain of the 12 km modeling domain specified over the Western U.S. for the time period January 1, 2014 – July 31, 2014 and used to generate the illustrative time-height cross-sections in the overview article.
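    The pie-chart step reduces to averaging each source's contribution over the selected hours and grid cells, then normalizing the averages into percentage shares. A minimal sketch, assuming the ISAM contributions have already been extracted into plain lists per source; the data layout, source names, and the function name source_shares are hypothetical, and reading the ioapi/netCDF files is not shown:

```python
def source_shares(contributions):
    """Percentage share of each source in the summed mean contribution.

    contributions maps a source name to its modeled ozone contributions
    (one value per hour and grid cell) over the averaging window.
    """
    means = {src: sum(vals) / len(vals) for src, vals in contributions.items()}
    total = sum(means.values())
    return {src: 100.0 * m / total for src, m in means.items()}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = {"stratosphere": [12.0, 14.0], "anthropogenic": [20.0, 22.0], "natural": [8.0, 6.0]}
    print(source_shares(example))
```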

    The EPA/OAR/OAQPS-generated dataset consisted of hourly gridded CMAQ output for surface ozone concentrations for the year 2016. The CMAQ simulations were performed over the northern hemisphere at a horizontal resolution of 108 km. NO2 and O3 data for July 2016 were extracted from these simulations to generate the vertically integrated column densities shown in the illustrative comparison with satellite-derived column densities.
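    In general, a vertically integrated column density is the sum over model layers of the trace-gas mixing ratio times the air number density times the layer thickness. The sketch below shows that arithmetic under an ideal-gas assumption for air density; the inputs (per-layer mixing ratio as a mole fraction, pressure in Pa, temperature in K, thickness in m) and the function name column_density are assumptions, not the exact variables or method used by EPA/OAR/OAQPS.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def column_density(mixing_ratios, pressures_pa, temps_k, thicknesses_m):
    """Return the vertically integrated column density in molecules per cm^2.

    mixing_ratios are mole fractions (e.g. 40 ppb -> 40e-9), one value per layer.
    """
    if not (len(mixing_ratios) == len(pressures_pa) == len(temps_k) == len(thicknesses_m)):
        raise ValueError("all per-layer inputs must have the same length")
    total_per_m2 = 0.0
    for x, p, t, dz in zip(mixing_ratios, pressures_pa, temps_k, thicknesses_m):
        n_air = p / (K_B * t)           # air molecules per m^3 (ideal gas law)
        total_per_m2 += x * n_air * dz  # trace-gas molecules per m^2 in this layer
    return total_per_m2 / 1.0e4         # convert m^-2 to cm^-2
```

    The result is returned per cm² because satellite column products are commonly reported in molecules/cm², which keeps the comparison in the same units.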

    CMAQ Model Data

    The data from the CMAQ model simulations used in this research effort are very large (several terabytes) and cannot be uploaded to ScienceHub due to size restrictions. The model simulations are stored on the /asm archival system, accessible through the atmos high-performance computing (HPC) system. Under data management policies, files on /asm are subject to expiry depending on the template of the project, and files not requested for extension after the expiry date are permanently deleted from the system. The format of the files used in this analysis and listed below is ioapi/netCDF. Documentation of this format, including definitions of the geographical projection attributes contained in the file headers, is available at https://www.cmascenter.org/ioapi/

    Documentation on the CMAQ model, including a description of the output file format and output model species can be found in the CMAQ documentation on the CMAQ GitHub site at https://github.com/USEPA/CMAQ.

    This dataset is associated with the following publication: Hogrefe, C., B. Henderson, G. Tonnesen, R. Mathur, and R. Matichuk. Multiscale Modeling of Background Ozone: Research Needs to Inform and Improve Air Quality Management. EM Magazine. Air and Waste Management Association, Pittsburgh, PA, USA, 1-6, (2020).

  20. Generate Ray Dataset

    • universe.roboflow.com
    zip
    Updated Dec 22, 2024
    test (2024). Generate Ray Dataset [Dataset]. https://universe.roboflow.com/test-szbyx/generate-ray/dataset/2
    Available download formats: zip
    Dataset updated
    Dec 22, 2024
    Dataset authored and provided by
    test
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    0 1 2 3 4 KH9O Bounding Boxes
    Description

    Generate Ray

    ## Overview
    
    Generate Ray is a dataset for object detection tasks - it contains 0 1 2 3 4 KH9O annotations for 279 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    