100+ datasets found
  1. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  2. Sample Data 1

    • kaggle.com
    zip
    Updated Jul 7, 2021
    Cite
    Aizat Mokhtar (2021). Sample Data 1 [Dataset]. https://www.kaggle.com/datasets/aizatmokhtar/sample-data-1/data
    Explore at:
    zip (309 bytes)
    Dataset updated
    Jul 7, 2021
    Authors
    Aizat Mokhtar
    Description

    Dataset

    This dataset was created by Aizat Mokhtar


  3. RedPajama-Data-1T-Sample

    • huggingface.co
    + more versions
    Cite
    Together, RedPajama-Data-1T-Sample [Dataset]. https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset authored and provided by
    Together
    Description

    RedPajama is a clean-room, fully open-source implementation of the LLaMA training dataset. This is a 1B-token sample of the full dataset.

  4. Genomics examples

    • redivis.com
    Updated Feb 8, 2025
    Cite
    Redivis Demo Organization (2025). Genomics examples [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
    Explore at:
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 30, 2025
    Description

    This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.

  5. UCI and OpenML Data Sets for Ordinal Quantification

    • zenodo.org
    zip
    Updated Jul 25, 2023
    Cite
    Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
    Explore at:
    zip
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.
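    The "distribution of labels" that quantification targets can be illustrated with a short sketch (illustrative Python, not taken from the dataset's own scripts):

```python
import numpy as np

# Illustrative only: quantification estimates the prevalence vector of a
# sample, i.e., the fraction of items in each ordinal class, rather than
# predicting the label of each individual item.
def prevalence(labels, n_classes):
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

p = prevalence(np.array([0, 1, 1, 2, 2, 2]), n_classes=3)
# p is the label distribution [1/6, 2/6, 3/6]
```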

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.
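    Each such row can be used to reconstruct a sample by plain index selection. A toy sketch (the in-memory string below stands in for a real index file such as app_val_indices.csv; the exact layout should be verified against the extracted files):

```python
import io
import numpy as np

# Toy stand-in for an index file: each row lists the indices of the data
# items that make up one evaluation sample.
index_csv = io.StringIO("0,2,4\n1,3,5\n")
sample_indices = [np.array(row.split(","), dtype=int)
                  for row in index_csv.read().splitlines()]

# Made-up "class_label" column of the extracted data set.
class_labels = np.array([0, 0, 1, 1, 2, 2])
first_sample = class_labels[sample_indices[0]]  # labels of sample 0
```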

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  6. Data from: Database for the U.S. Geological Survey Woods Hole Science...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Database for the U.S. Geological Survey Woods Hole Science Center's marine sediment samples, including locations, sample data and collection information (SED_ARCHIVE) [Dataset]. https://catalog.data.gov/dataset/database-for-the-u-s-geological-survey-woods-hole-science-centers-marine-sediment-samples-
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Woods Hole
    Description

    The U.S. Geological Survey (USGS), Woods Hole Science Center (WHSC) has been an active member of the Woods Hole research community for over 40 years. In that time there have been many sediment collection projects conducted by USGS scientists and technicians for the research and study of seabed environments and processes. These samples are collected at sea or near shore and then brought back to the WHSC for study. While at the Center, samples are stored in ambient-temperature, cold, or freezing conditions, depending on the best mode of preparation for the study being conducted or the duration of storage planned for the samples. Recently, storage methods and available storage space have become a major concern at the WHSC. The shapefile sed_archive.shp gives a geographical view of the samples in the WHSC's collections and where they were collected, along with images and hyperlinks to useful resources.

  7. Data sample

    • kaggle.com
    zip
    Updated Sep 23, 2017
    + more versions
    Cite
    HungDo (2017). Data sample [Dataset]. https://www.kaggle.com/datasets/hungdo1291/hung-data-full/suggestions
    Explore at:
    zip (2642308 bytes)
    Dataset updated
    Sep 23, 2017
    Authors
    HungDo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by HungDo

    Released under CC0: Public Domain


  8. Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 28, 2020
    Cite
    Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ARKOTI
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Jamie Monogan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  9. Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS:...

    • frontiersin.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Florian Loffing (2023). Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A Syntax Collection and Tutorial.ZIP [Dataset]. http://doi.org/10.3389/fpsyg.2022.808469.s001
    Explore at:
    zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Florian Loffing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.

  10. Data from: Sample DataSet.

    • datadiscoverystudio.org
    csv, json, rdf, xml
    Updated Feb 4, 2018
    + more versions
    Cite
    (2018). Sample DataSet. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/34b1d00ff032454bb8f6be1ff2acaa5c/html
    Explore at:
    xml, rdf, csv, json
    Dataset updated
    Feb 4, 2018
    Description

    description: For testing purposes only; abstract: For testing purposes only

  11. Historic HPMS Data (Sample) - 1992

    • data.virginia.gov
    • data.transportation.gov
    • +1more
    csv, json, rdf, xsl
    Updated May 8, 2024
    + more versions
    Cite
    U.S Department of Transportation (2024). Historic HPMS Data (Sample) - 1992 [Dataset]. https://data.virginia.gov/dataset/historic-hpms-data-sample-1992
    Explore at:
    xsl, rdf, json, csv
    Dataset updated
    May 8, 2024
    Dataset provided by
    Federal Highway Administration (https://highways.dot.gov/)
    Authors
    U.S Department of Transportation
    Description

    Historic Highway Performance Monitoring System (HPMS) sample data for the year 1992.

  12. SQLite SpatiaLite sample databases for testing spatial analysis function and...

    • osf.io
    Updated Feb 24, 2020
    Cite
    Mateusz Ilba (2020). SQLite SpatiaLite sample databases for testing spatial analysis function and performance of database [Dataset]. http://doi.org/10.17605/OSF.IO/2YM95
    Explore at:
    Dataset updated
    Feb 24, 2020
    Dataset provided by
    Center For Open Science
    Authors
    Mateusz Ilba
    Description

    Databases (for SQLite SpatiaLite) were created from publicly available OpenStreetMap data for Poland (https://www.openstreetmap.org/copyright). The db_small database comprises data for the area of the city of Kraków in the Małopolskie Province. The db_medium database comprises data from the entire Małopolskie Province. The db_large database, in addition to the Małopolskie Province, covers the Podkarpackie and Dolnośląskie Provinces. The db_v_large database covers the entire country.

  13. Data from: Evaluating Supplemental Samples in Longitudinal Research:...

    • tandf.figshare.com
    txt
    Updated Feb 9, 2024
    Cite
    Laura K. Taylor; Xin Tong; Scott E. Maxwell (2024). Evaluating Supplemental Samples in Longitudinal Research: Replacement and Refreshment Approaches [Dataset]. http://doi.org/10.6084/m9.figshare.12162072.v1
    Explore at:
    txt
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Laura K. Taylor; Xin Tong; Scott E. Maxwell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.

  14. Living Standards Survey III 1991-1992 - World Bank SHIP Harmonized Dataset -...

    • datacatalog.ihsn.org
    • dev.ihsn.org
    • +2more
    Updated Mar 29, 2019
    + more versions
    Cite
    Ghana Statistical Service (GSS) (2019). Living Standards Survey III 1991-1992 - World Bank SHIP Harmonized Dataset - Ghana [Dataset]. https://datacatalog.ihsn.org/catalog/2358
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Ghana Statistical Services
    Authors
    Ghana Statistical Service (GSS)
    Time period covered
    1991 - 1992
    Area covered
    Ghana
    Description

    Abstract

    Survey-based Harmonized Indicators (SHIP) files are harmonized data files from household surveys conducted by countries in Africa. To ensure the quality and transparency of the data, it is critical to document the procedures for compiling consumption aggregates and other indicators so that the results can be duplicated with ease. This process enables the consistency and continuity that make temporal and cross-country comparisons more reliable.

    Four harmonized data files are prepared for each survey to generate a set of harmonized variables that share the same variable names. Invariably, in each survey, questions are asked in slightly different ways, which poses challenges for a consistent definition of harmonized variables. The harmonized household survey data present the best available variables with harmonized definitions, but not identical variables. The four harmonized data files are:

    a) Individual level file (labor force indicators in a separate file): basic characteristics of individuals such as age and sex, literacy, education, health, anthropometry and child survival.
    b) Labor force file: information on the labor force, including employment/unemployment, earnings, sectors of employment, etc.
    c) Household level file: household expenditure, household head characteristics (age and sex, level of education, employment), housing amenities, assets, and access to infrastructure and services.
    d) Household Expenditure file: consumption/expenditure aggregates by consumption group according to the UN Classification of Individual Consumption According to Purpose (COICOP).

    Geographic coverage

    National

    Analysis unit

    • Individual level for datasets with suffix _I and _L
    • Household level for datasets with suffix _H and _E
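    As an illustration of how the files relate, the individual-level (_I) and household-level (_H) files can be joined on a household identifier. A hypothetical pandas sketch (the key name "hhid" is an assumption; consult the survey codebook for the actual identifier):

```python
import pandas as pd

# Hypothetical sketch: join individual-level (_I) and household-level (_H)
# harmonized files. "hhid" is an assumed key name, not from the codebook.
ind = pd.DataFrame({"hhid": [1, 1, 2], "age": [34, 6, 51]})
hh = pd.DataFrame({"hhid": [1, 2], "expenditure": [1200.0, 800.0]})
merged = ind.merge(hh, on="hhid", how="left")  # one row per individual
```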

    Universe

    The survey covered all de jure household members (usual residents).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A multi-stage sampling technique was used in selecting the GLSS sample. Initially, 4565 households were selected for GLSS3, spread around the country in 407 small clusters; in general, 15 households were taken in an urban cluster and 10 households in a rural cluster. The actual achieved sample was 4552 households. Because of the sample design used, and the very high response rate achieved, the sample can be considered self-weighting, though in the case of expenditure data, weighting of the expenditure values is required.

    Mode of data collection

    Face-to-face [f2f]

  15. TELL sample output data

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 10, 2022
    Cite
    Casey D Burleyson; Casey D Burleyson; Casey McGrath; Casey McGrath (2022). TELL sample output data [Dataset]. http://doi.org/10.5281/zenodo.6338472
    Explore at:
    zip
    Dataset updated
    Mar 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Casey D Burleyson; Casey D Burleyson; Casey McGrath; Casey McGrath
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains sample output data for TELL. The sample dataset includes four years of sample future data (2039, 2059, 2079, and 2099) that come from IM3's future WRF runs under the RCP 8.5 climate scenario with SSP5 population forcing. Note that the GCAM-USA output used in this simulation is sample data only; as such, the quantitative results from this set of sample output should not be considered valid.

  16. 7+ Million Company Dataset

    • kaggle.com
    zip
    Updated May 10, 2019
    Cite
    People Data Labs (2019). 7+ Million Company Dataset [Dataset]. https://www.kaggle.com/datasets/peopledatalabssf/free-7-million-company-dataset
    Explore at:
    zip (291957415 bytes)
    Dataset updated
    May 10, 2019
    Authors
    People Data Labs
    Description

    Dataset

    This dataset was created by People Data Labs


  17. Dataset for: Analyzing discrete competing risks data with partially...

    • wiley.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Minjung Lee; Eric J Feuer; Zhuoqiao Wang; Hyunsoon Cho; Joe Zou; Benjamin Hankey; Angela B. Mariotto; Jason P Fine (2023). Dataset for: Analyzing discrete competing risks data with partially overlapping or independent data sources and non-standard sampling schemes, with application to cancer registries [Dataset]. http://doi.org/10.6084/m9.figshare.9795572.v1
    Explore at:
    txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley
    Authors
    Minjung Lee; Eric J Feuer; Zhuoqiao Wang; Hyunsoon Cho; Joe Zou; Benjamin Hankey; Angela B. Mariotto; Jason P Fine
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This paper demonstrates the flexibility of a general approach for the analysis of discrete time competing risks data that can accommodate complex data structures, different time scales for different causes, and nonstandard sampling schemes. The data may involve a single data source where all individuals contribute to analyses of both cause-specific hazard functions, overlapping datasets where some individuals contribute to the analysis of the cause-specific hazard function of only one cause while other individuals contribute to analyses of both cause-specific hazard functions, or separate data sources where each individual contributes to the analysis of the cause-specific hazard function of only a single cause. The approach is modularized into estimation and prediction. For the estimation step, the parameters and the variance-covariance matrix can be estimated using widely available software. The prediction step utilizes a generic program with plug-in estimates from the estimation step. The approach is illustrated with three prognostic models for stage IV male oral cancer using different data structures. The first model uses only men with stage IV oral cancer from population-based registry data. The second model strategically extends the cohort to improve the efficiency of the estimates. The third model improves the accuracy for those with a lower risk of other causes of death, by bringing in an independent data source collected under a complex sampling design with additional other-cause covariates. These analyses represent novel extensions of existing methodology, broadly applicable for the development of prognostic models capturing both the cancer and non-cancer aspects of a patient's health.

  18. Netflow data with sampling for training

    • open.scayle.es
    Updated Oct 1, 2020
    + more versions
    Cite
    (2020). Netflow data with sampling for training - Datasets - open.scayle.es [Dataset]. https://open.scayle.es/dataset/netflow-data-with-sampling-for-training
    Explore at:
    Dataset updated
    Oct 1, 2020
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Netflow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device. The NetFlow flows here were captured with packet-level sampling: 1 out of every X packets is selected for flow construction, while the remaining packets are ignored. In constructing the datasets, different proportions of attack flows and normal-traffic flows were used. These datasets have been used to train machine learning models.
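    The 1-in-X selection described above can be sketched as follows (illustrative Python, not DOROTHEA's actual implementation):

```python
# Illustrative deterministic 1-in-X packet sampling: only every X-th packet
# is kept for flow construction; the remaining packets are ignored.
def sample_packets(packets, x):
    return [p for i, p in enumerate(packets) if i % x == 0]

kept = sample_packets(list(range(10)), x=4)  # keeps packets 0, 4, 8
```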

  19. Water Quality Sampling Data

    • catalog.data.gov
    • data.austintexas.gov
    • +1more
    Updated Mar 25, 2025
    + more versions
    Cite
    data.austintexas.gov (2025). Water Quality Sampling Data [Dataset]. https://catalog.data.gov/dataset/water-quality-sampling-data
    Explore at:
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    data.austintexas.gov
    Description

    Data collected to assess water quality conditions in the natural creeks, aquifers and lakes in the Austin area. This is raw data, provided directly from our Water Resources Monitoring database (WRM) and should be considered provisional. Data may or may not have been reviewed by project staff. A map of site locations can be found by searching for LOCATION.WRM_SAMPLE_SITES; you may then use those WRM_SITE_IDs to filter in this dataset using the field SAMPLE_SITE_NO.
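    Filtering the export by site, as suggested above, might look like the following pandas sketch (the column names come from the dataset description; the rows are invented):

```python
import pandas as pd

# Hypothetical sketch: restrict the raw WRM export to one sampling site.
# SAMPLE_SITE_NO is named in the dataset description; values are made up.
wrm = pd.DataFrame({
    "SAMPLE_SITE_NO": [101, 101, 205],
    "PARAMETER": ["PH", "DISSOLVED OXYGEN", "PH"],
    "RESULT": [7.1, 8.4, 6.9],
})
site_101 = wrm[wrm["SAMPLE_SITE_NO"] == 101]
```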

  20. HuTime Sample Dataset

    • data.depositar.io
    gtm, png, xml
    Updated Nov 30, 2021
    Cite
    ANGIS Data Portal (2021). HuTime Sample Dataset [Dataset]. https://data.depositar.io/dataset/hutime-sample-dataset
    Explore at:
    xml(404471), xml(404620), png(17258), png(14736), png(16654), gtm(922), xml(4423), xml(10971), png(18462), gtm(3842), png(18461), xml(1580), gtm(910), xml(12018), xml(33534), gtm(888), gtm, gtm(909), gtm(919), png(16959), xml(1752), gtm(3808), png(17058), png(18189), png(25698), xml(22664), gtm(889), png(25284), xml(36523), gtm(4358), gtm(927), png(15002), xml(3759), gtm(914), gtm(4344), png(18968), xml(11367), xml(11783), png(18636)
    Dataset updated
    Nov 30, 2021
    Dataset provided by
    ANGIS Data Portal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HuTime is a time information system developed by Dr. Tatsuki Sekino. The HuTime website is http://www.hutime.org.
