100+ datasets found
  1. h

    Wikipedia-example-data

    • huggingface.co
    Updated Jul 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TopicNavi (2024). Wikipedia-example-data [Dataset]. https://huggingface.co/datasets/TopicNavi/Wikipedia-example-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 6, 2024
    Dataset authored and provided by
    TopicNavi
    Description

    TopicNavi/Wikipedia-example-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Training images

    • redivis.com
    Updated Oct 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Redivis Demo Organization (2025). Training images [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
    Explore at:
    Dataset updated
    Oct 20, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Aug 8, 2022
    Description

    This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_kd.

  3. h

    dataset-card-example

    • huggingface.co
    Updated Sep 28, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Templates (2023). dataset-card-example [Dataset]. https://huggingface.co/datasets/templates/dataset-card-example
    Explore at:
    Dataset updated
    Sep 28, 2023
    Dataset authored and provided by
    Templates
    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

      Dataset Details
    
    
    
    
    
      Dataset Description
    

    Curated by: [More Information Needed] Funded by [optional]: [More Information Needed] Shared by [optional]: [More Information Needed] Language(s) (NLP): [More Information Needed] License: [More Information Needed]

      Dataset Sources [optional]
    

    Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/templates/dataset-card-example.

  4. B

    Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  5. u

    Example (synthetic) electronic health record data

    • rdr.ucl.ac.uk
    application/csv
    Updated Apr 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Steve Harris; Wai Shing Lai (2024). Example (synthetic) electronic health record data [Dataset]. http://doi.org/10.5522/04/25676298.v1
    Explore at:
    application/csvAvailable download formats
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    University College London
    Authors
    Steve Harris; Wai Shing Lai
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These data are modelled using the OMOP Common Data Model v5.3.Correlated Data SourceNG tube vocabulariesGeneration RulesThe patient’s age should be between 18 and 100 at the moment of the visit.Ethnicity data is using 2021 census data in England and Wales (Census in England and Wales 2021) .Gender is equally distributed between Male and Female (50% each).Every person in the record has a link in procedure_occurrence with the concept “Checking the position of nasogastric tube using X-ray”2% of person records have a link in procedure_occurrence with the concept of “Plain chest X-ray”60% of visit_occurrence has visit concept “Inpatient Visit”, while 40% have “Emergency Room Visit”NotesVersion 0Generated by man-made rule/story generatorStructural correct, all tables linked with the relationshipWe used national ethnicity data to generate a realistic distribution (see below)2011 Race Census figure in England and WalesEthnic Group : Population(%)Asian or Asian British: Bangladeshi - 1.1Asian or Asian British: Chinese - 0.7Asian or Asian British: Indian - 3.1Asian or Asian British: Pakistani - 2.7Asian or Asian British: any other Asian background -1.6Black or African or Caribbean or Black British: African - 2.5Black or African or Caribbean or Black British: Caribbean - 1Black or African or Caribbean or Black British: other Black or African or Caribbean background - 0.5Mixed multiple ethnic groups: White and Asian - 0.8Mixed multiple ethnic groups: White and Black African - 0.4Mixed multiple ethnic groups: White and Black Caribbean - 0.9Mixed multiple ethnic groups: any other Mixed or multiple ethnic background - 0.8White: English or Welsh or Scottish or Northern Irish or British - 74.4White: Irish - 0.9White: Gypsy or Irish Traveller - 0.1White: any other White background - 6.4Other ethnic group: any other ethnic group - 1.6Other ethnic group: Arab - 0.6

  6. H

    Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 28, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Jamie Monogan
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  7. Sample Leads Dataset

    • kaggle.com
    zip
    Updated Jun 24, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ThatSean (2022). Sample Leads Dataset [Dataset]. https://www.kaggle.com/datasets/thatsean/sample-leads-dataset
    Explore at:
    zip(22640 bytes)Available download formats
    Dataset updated
    Jun 24, 2022
    Authors
    ThatSean
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I had modified this dataset to support an upcoming Towards Data Science article walking through the process. Link to be shared once published.

  8. h

    Data from: example-dataset

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Raj Kumar, example-dataset [Dataset]. https://huggingface.co/datasets/rajkstats/example-dataset
    Explore at:
    Authors
    Raj Kumar
    Description

    Dataset Card for example-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/rajkstats/example-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/rajkstats/example-dataset.

  9. d

    Example data file for TRUEMET Version 2.2

    • catalog.data.gov
    • datasets.ai
    Updated Oct 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Fish and Wildlife Service (2025). Example data file for TRUEMET Version 2.2 [Dataset]. https://catalog.data.gov/dataset/example-data-file-for-truemet-version-2-2
    Explore at:
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service
    Description

    This file is an example data set from the Central Valley of California from a drought study corresponding to “recent non-drought conditions” (Scenario 1 in Petrie et al., in review). In 2014, following an 8-year period with 7 below-normal to critically-dry water years, the bioenergetic model TRUEMET was used to assess the impacts of drought on wintering waterfowl habitat and bioenergetics in the Central Valley of California. The goal of the study was to assess whether available foraging habitats could provide enough food to support waterfowl populations (ducks and geese) under a variety of climate and population level scenarios. This information could then be used by managers to adapt their waterfowl habitat management plans to drought conditions. The study area spanned the Central Valley and included the Sacramento Valley in the north, the San Joaquin Valley in the south, and Suisun Marsh and Sacramento-San Joaquin River Delta (Delta) east of San Francisco Bay. The data set consists of two foraging guilds (ducks and geese/swans) and five forage types: harvested corn, rice (flooded), rice (unflooded), wetland invertebrates and wetland moist soil seeds. For more background on the data set, see Petrie et al. in review.

  10. Dataset #1: Cross-sectional survey data

    • figshare.com
    txt
    Updated Jul 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Adam Baimel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    N.B. This is not real data. Only here for an example for project templates.

    Project Title: Add title here

    Project Team: Add contact information for research project team members

    Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

    Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

    Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

    Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

    Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

    Research Project Materials: What materials are necessary to fully reproduce your the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

    List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

    Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

    Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14

  11. s

    Snowplow Modeled Customer Data Sample

    • snowplow.io
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Snowplow Analytics, Snowplow Modeled Customer Data Sample [Dataset]. https://snowplow.io/explore-snowplow-data-part-2
    Explore at:
    Dataset authored and provided by
    Snowplow Analytics
    Time period covered
    Apr 1, 2020 - Apr 3, 2020
    Variables measured
    user_id, mkt_source, page_views, session_id, conversions, geo_country, device_class, mkt_campaign, session_length, time_engaged_in_s
    Description

    Example of modeled customer behavioral data showing user sessions, engagement metrics, and conversion data across multiple platforms and devices

  12. w

    Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  13. Streaming Service Data

    • kaggle.com
    Updated Dec 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chad Wambles (2024). Streaming Service Data [Dataset]. https://www.kaggle.com/datasets/chadwambles/streaming-service-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Chad Wambles
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A dataset I generated to showcase a sample set of user data for a fictional streaming service. This data is great for practicing SQL, Excel, Tableau, or Power BI.

    1000 rows and 25 columns of connected data.

    See below for column descriptions.

    Enjoy :)

  14. h

    example-generate-preference-dataset

    • huggingface.co
    Updated Aug 23, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    distilabel-internal-testing (2024). example-generate-preference-dataset [Dataset]. https://huggingface.co/datasets/distilabel-internal-testing/example-generate-preference-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 23, 2024
    Dataset authored and provided by
    distilabel-internal-testing
    Description

    Dataset Card for example-preference-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/sdiazlor/example-preference-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/distilabel-internal-testing/example-generate-preference-dataset.

  15. Sample Sales Data (5 million transactions)

    • kaggle.com
    zip
    Updated Jul 8, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chris Chua (2021). Sample Sales Data (5 million transactions) [Dataset]. https://www.kaggle.com/datasets/weitat/sample-sales
    Explore at:
    zip(201186399 bytes)Available download formats
    Dataset updated
    Jul 8, 2021
    Authors
    Chris Chua
    Description

    Dataset

    This dataset was created by Chris Chua

    Contents

  16. A

    Example of a Public Data Set

    • data.atlanticsalmontrust.org
    csv
    Updated Sep 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Atlantic Salmon Trust (2025). Example of a Public Data Set [Dataset]. https://data.atlanticsalmontrust.org/dataset/example-of-a-public-data-set
    Explore at:
    csv(89183)Available download formats
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    The Atlantic Salmon Trust
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an example of a public dataset on the AST Data Repository

  17. cases study1 example for google data analytics

    • kaggle.com
    zip
    Updated Apr 22, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mohammed hatem (2023). cases study1 example for google data analytics [Dataset]. https://www.kaggle.com/datasets/mohammedhatem/cases-study1-example-for-google-data-analytics
    Explore at:
    zip(25278847 bytes)Available download formats
    Dataset updated
    Apr 22, 2023
    Authors
    mohammed hatem
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    In the way of my journey to earn the google data analytics certificate I will practice real world example by following the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Picking the Bellabeat example.

  18. h

    tdce-example-simple-dataset

    • huggingface.co
    Updated Jun 8, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Theethawat Savastham (2025). tdce-example-simple-dataset [Dataset]. https://huggingface.co/datasets/theethawats98/tdce-example-simple-dataset
    Explore at:
    Dataset updated
    Jun 8, 2025
    Authors
    Theethawat Savastham
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Example Dataset For Time-Driven Cost Estimation Learning Model

    This dataset is the inspired-simulated data (the actual data is removed). This data is related to the Time-Driven Activity-Based Costing (TDABC) Principle.

      Simple Dataset
    

    It include the data with low variation and low dimension. It includes 4 files that bring from the manufacturing management system, which can be listed as.

    Process Data (generated_process_data) it contains the manufacturing process data… See the full description on the dataset page: https://huggingface.co/datasets/theethawats98/tdce-example-simple-dataset.

  19. d

    Data Management Plan Examples Database

    • search.dataone.org
    • borealisdata.ca
    Updated Sep 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak (2024). Data Management Plan Examples Database [Dataset]. http://doi.org/10.5683/SP3/SDITUG
    Explore at:
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Borealis
    Authors
    Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak
    Time period covered
    Jan 1, 2011 - Jan 1, 2023
    Description

    This dataset is comprised of a collection of example DMPs from a wide array of fields; obtained from a number of different sources outlined below. Data included/extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data-type, description of project, link to the DMP, and where possible external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.

  20. Aerospace Example - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Aerospace Example - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/aerospace-example
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    This is a textbook, created example for illustration purposes. The System takes inputs of Pt, Ps, and Alt, and calculates the Mach number using the Rayleigh Pitot Tube equation if the plane is flying supersonically. (See Anderson.) The unit calculates Cd given the Ma and Alt. For more details, see the NASA TM, also on this website.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
TopicNavi (2024). Wikipedia-example-data [Dataset]. https://huggingface.co/datasets/TopicNavi/Wikipedia-example-data

Wikipedia-example-data

TopicNavi/Wikipedia-example-data

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 6, 2024
Dataset authored and provided by
TopicNavi
Description

TopicNavi/Wikipedia-example-data dataset hosted on Hugging Face and contributed by the HF Datasets community

Search
Clear search
Close search
Google apps
Main menu