Facebook
TwitterTopicNavi/Wikipedia-example-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterThis is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_kd.
Facebook
TwitterDataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Curated by: [More Information Needed] Funded by [optional]: [More Information Needed] Shared by [optional]: [More Information Needed] Language(s) (NLP): [More Information Needed] License: [More Information Needed]
Dataset Sources [optional]
Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/templates/dataset-card-example.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These data are modelled using the OMOP Common Data Model v5.3.Correlated Data SourceNG tube vocabulariesGeneration RulesThe patient’s age should be between 18 and 100 at the moment of the visit.Ethnicity data is using 2021 census data in England and Wales (Census in England and Wales 2021) .Gender is equally distributed between Male and Female (50% each).Every person in the record has a link in procedure_occurrence with the concept “Checking the position of nasogastric tube using X-ray”2% of person records have a link in procedure_occurrence with the concept of “Plain chest X-ray”60% of visit_occurrence has visit concept “Inpatient Visit”, while 40% have “Emergency Room Visit”NotesVersion 0Generated by man-made rule/story generatorStructural correct, all tables linked with the relationshipWe used national ethnicity data to generate a realistic distribution (see below)2011 Race Census figure in England and WalesEthnic Group : Population(%)Asian or Asian British: Bangladeshi - 1.1Asian or Asian British: Chinese - 0.7Asian or Asian British: Indian - 3.1Asian or Asian British: Pakistani - 2.7Asian or Asian British: any other Asian background -1.6Black or African or Caribbean or Black British: African - 2.5Black or African or Caribbean or Black British: Caribbean - 1Black or African or Caribbean or Black British: other Black or African or Caribbean background - 0.5Mixed multiple ethnic groups: White and Asian - 0.8Mixed multiple ethnic groups: White and Black African - 0.4Mixed multiple ethnic groups: White and Black Caribbean - 0.9Mixed multiple ethnic groups: any other Mixed or multiple ethnic background - 0.8White: English or Welsh or Scottish or Northern Irish or British - 74.4White: Irish - 0.9White: Gypsy or Irish Traveller - 0.1White: any other White background - 6.4Other ethnic group: any other ethnic group - 1.6Other ethnic group: Arab - 0.6
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I had modified this dataset to support an upcoming Towards Data Science article walking through the process. Link to be shared once published.
Facebook
TwitterDataset Card for example-dataset
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/rajkstats/example-dataset/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/rajkstats/example-dataset.
Facebook
TwitterThis file is an example data set from the Central Valley of California from a drought study corresponding to “recent non-drought conditions” (Scenario 1 in Petrie et al., in review). In 2014, following an 8-year period with 7 below-normal to critically-dry water years, the bioenergetic model TRUEMET was used to assess the impacts of drought on wintering waterfowl habitat and bioenergetics in the Central Valley of California. The goal of the study was to assess whether available foraging habitats could provide enough food to support waterfowl populations (ducks and geese) under a variety of climate and population level scenarios. This information could then be used by managers to adapt their waterfowl habitat management plans to drought conditions. The study area spanned the Central Valley and included the Sacramento Valley in the north, the San Joaquin Valley in the south, and Suisun Marsh and Sacramento-San Joaquin River Delta (Delta) east of San Francisco Bay. The data set consists of two foraging guilds (ducks and geese/swans) and five forage types: harvested corn, rice (flooded), rice (unflooded), wetland invertebrates and wetland moist soil seeds. For more background on the data set, see Petrie et al. in review.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
N.B. This is not real data. Only here for an example for project templates.
Project Title: Add title here
Project Team: Add contact information for research project team members
Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.
Relevant publications/outputs: When available, add links to the related publications/outputs from this data.
Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.
Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?
Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.
Research Project Materials: What materials are necessary to fully reproduce your the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.
List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.
Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).
Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
Facebook
TwitterExample of modeled customer behavioral data showing user sessions, engagement metrics, and conversion data across multiple platforms and devices
Facebook
TwitterThe dataset is a relational dataset of 8,000 households households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
A dataset I generated to showcase a sample set of user data for a fictional streaming service. This data is great for practicing SQL, Excel, Tableau, or Power BI.
1000 rows and 25 columns of connected data.
See below for column descriptions.
Enjoy :)
Facebook
TwitterDataset Card for example-preference-dataset
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/sdiazlor/example-preference-dataset/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/distilabel-internal-testing/example-generate-preference-dataset.
Facebook
TwitterThis dataset was created by Chris Chua
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an example of a public dataset on the AST Data Repository
Facebook
Twitterhttp://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
In the way of my journey to earn the google data analytics certificate I will practice real world example by following the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Picking the Bellabeat example.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Example Dataset For Time-Driven Cost Estimation Learning Model
This dataset is the inspired-simulated data (the actual data is removed). This data is related to the Time-Driven Activity-Based Costing (TDABC) Principle.
Simple Dataset
It include the data with low variation and low dimension. It includes 4 files that bring from the manufacturing management system, which can be listed as.
Process Data (generated_process_data) it contains the manufacturing process data… See the full description on the dataset page: https://huggingface.co/datasets/theethawats98/tdce-example-simple-dataset.
Facebook
TwitterThis dataset is comprised of a collection of example DMPs from a wide array of fields; obtained from a number of different sources outlined below. Data included/extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data-type, description of project, link to the DMP, and where possible external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Facebook
TwitterThis is a textbook, created example for illustration purposes. The System takes inputs of Pt, Ps, and Alt, and calculates the Mach number using the Rayleigh Pitot Tube equation if the plane is flying supersonically. (See Anderson.) The unit calculates Cd given the Ma and Alt. For more details, see the NASA TM, also on this website.
Facebook
TwitterTopicNavi/Wikipedia-example-data dataset hosted on Hugging Face and contributed by the HF Datasets community