100+ datasets found
  1. Solution #4 for Predicting Molecular Properties Kaggle Competition

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Stojanovic, Luka (2020). Solution #4 for Predicting Molecular Properties Kaggle Competition [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3406153
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Stojanovic, Luka
    Rakocevic, Goran
    Popovic, Milos
    Tijanic, Nebojsa
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Code and additional data for solution #4 in Predicting Molecular Properties competition, described in #4 Solution [Hyperspatial Engineers].

  2. PlaygroundS4E06|OriginalData

    • kaggle.com
    Updated Jun 1, 2024
    Cite
    Ravi Ramakrishnan (2024). PlaygroundS4E06|OriginalData [Dataset]. https://www.kaggle.com/datasets/ravi20076/playgrounds4e06originaldata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ravi Ramakrishnan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This data was downloaded from the link shared on the data page of the PlaygroundS4E06 episode. An id column was added to keep it consistent with the competition data.
    Please feel free to use this dataset as part of your pipeline.

    Key links:
    1. Competition - https://www.kaggle.com/competitions/playground-series-s4e6
    2. Data page - https://www.kaggle.com/competitions/playground-series-s4e6/data
    3. Original dataset link - https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success

    This is a .csv file. Please use pandas.read_csv() or polars.scan_csv() to read in the file
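    As a minimal sketch of the pandas route mentioned above: the snippet reads a CSV and uses the added id column as the index. The in-memory sample and its column names other than id are invented for illustration; for the real file, pass its path instead of the StringIO object.

```python
import io

import pandas as pd


def load_with_id(source):
    """Read a CSV and use its `id` column as the row index."""
    return pd.read_csv(source, index_col="id")


# Invented two-row sample standing in for the real file.
sample = io.StringIO(
    "id,Age at enrollment,Target\n"
    "0,20,Graduate\n"
    "1,19,Dropout\n"
)
df = load_with_id(sample)
print(df)
```

    Using id as the index keeps it out of the feature columns, which is usually what a modelling pipeline wants.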

    Best regards!

  3. Meta_Kaggle_Competitions_cleaned_dataset

    • kaggle.com
    Updated Jul 17, 2025
    Cite
    Sarvpreet Kaur (2025). Meta_Kaggle_Competitions_cleaned_dataset [Dataset]. https://www.kaggle.com/datasets/sarvpreetkaur22/meta-kaggle-competitions-cleaned-dataset/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sarvpreet Kaur
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📝 Description:

    A cleaned version of Competitions.csv focused on timeline analysis.

    ✅ Includes: CompetitionId, Title, Deadline, EnabledDate, HostSegmentTitle
    ✅ Helps understand growth over time and regional hosting focus
    ✅ Can be joined with teams_clean.csv and user_achievements_clean.csv

  4. Eedi-competition-kaggle-prompt-formats-Phi

    • huggingface.co
    Updated Sep 29, 2024
    + more versions
    Cite
    EVANGELOS PAPAMITSOS (2024). Eedi-competition-kaggle-prompt-formats-Phi [Dataset]. https://huggingface.co/datasets/VaggP/Eedi-competition-kaggle-prompt-formats-Phi
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 29, 2024
    Authors
    EVANGELOS PAPAMITSOS
    Description

    VaggP/Eedi-competition-kaggle-prompt-formats-Phi dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Kaggle-LLM-Science-Exam

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Sangeetha Venkatesan (2023). Kaggle-LLM-Science-Exam [Dataset]. https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 8, 2023
    Authors
    Sangeetha Venkatesan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for [LLM Science Exam Kaggle Competition]

      Dataset Summary
    

    https://www.kaggle.com/competitions/kaggle-llm-science-exam/data

      Languages
    

    [en, de, tl, it, es, fr, pt, id, pl, ro, so, ca, da, sw, hu, no, nl, et, af, hr, lv, sl]

      Dataset Structure
    

    Columns
    prompt - the text of the question being asked
    A - option A; if this option is correct, then answer will be A
    B - option B; if this option is correct, then answer will be B
    C - option C; if this
    See the full description on the dataset page: https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam.

  6. JANE STREET PREPROCESSED

    • kaggle.com
    Updated Dec 22, 2020
    Cite
    Saurabh Shahane (2020). JANE STREET PREPROCESSED [Dataset]. https://www.kaggle.com/datasets/saurabhshahane/jane-street-preprocessed-train
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 22, 2020
    Dataset provided by
    Kaggle
    Authors
    Saurabh Shahane
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Saurabh Shahane

    Released under CC0: Public Domain

    Contents

  7. ‘Kaggle Competitions Ranking’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Kaggle Competitions Ranking’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-kaggle-competitions-ranking-f15f/7682e95e/?iid=003-169&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Kaggle Competitions Ranking’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vivovinco/kaggle-competitions-ranking on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains Kaggle ranking of competitions.

    Content

    5000 rows and 8 columns. Column descriptions are listed below.

    • Rank : Rank of the user
    • Tier : Grandmaster, Master or Expert
    • Username : Name of the user
    • Join Date : Year of join
    • Gold Medals : Number of gold medals
    • Silver Medals : Number of silver medals
    • Bronze Medals : Number of bronze medals
    • Points : Total points

    Acknowledgements

    Data from Kaggle. Image from Olympics.

    If you're reading this, please upvote.

    --- Original source retains full ownership of the source dataset ---

  8. Code4ML 2.0

    • zenodo.org
    csv, txt
    Updated May 19, 2025
    Cite
    Anonimous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737
    Explore at:
    csv, txt (available download formats)
    Dataset updated
    May 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonimous authors
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

    The original dataset is organized into multiple CSV files, each containing structured data on different entities:

    • code_blocks.csv: Contains raw code snippets extracted from Kaggle.
    • kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
    • competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
    • markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
    • vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

    Table 1. code_blocks.csv structure

    Column             Description
    code_blocks_index  Global index linking code blocks to markup_data.csv.
    kernel_id          Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
    code_block_id      Position of the code block within the notebook.
    code_block         The actual machine learning code snippet.

    Table 2. kernels_meta.csv structure

    Column           Description
    kernel_id        Identifier for the Kaggle Jupyter notebook.
    kaggle_score     Performance metric of the notebook.
    kaggle_comments  Number of comments on the notebook.
    kaggle_upvotes   Number of upvotes the notebook received.
    kernel_link      URL to the notebook.
    comp_name        Name of the associated Kaggle competition.

    Table 3. competitions_meta.csv structure

    Column                           Description
    comp_name                        Name of the Kaggle competition.
    description                      Overview of the competition task.
    data_type                        Type of data used in the competition.
    comp_type                        Classification of the competition.
    subtitle                         Short description of the task.
    EvaluationAlgorithmAbbreviation  Metric used for assessing competition submissions.
    data_sources                     Links to datasets used.
    metric type                      Class label for the assessment metric.

    Table 4. markup_data.csv structure

    Column           Description
    code_block       Machine learning code block.
    too_long         Flag indicating whether the block spans multiple semantic types.
    marks            Confidence level of the annotation.
    graph_vertex_id  ID of the semantic type.

    The dataset allows mapping between these tables. For example:

    • code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
    • kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.

    In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
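    The links between the tables can be sketched with the standard library alone. In this illustration only the column names (kernel_id, comp_name, code_block, kaggle_score, data_type) follow the tables above; every row value is invented, and the real files would be read from disk rather than from strings.

```python
import csv
import io

# Invented stand-ins for code_blocks.csv, kernels_meta.csv and competitions_meta.csv.
CODE_BLOCKS = """kernel_id,code_block
1,import pandas as pd
1,df = load_train()
2,model.fit(X)
3,print(score)
"""
KERNELS_META = """kernel_id,kaggle_score,comp_name
1,0.81,titanic
2,0.77,house-prices
"""
COMPETITIONS_META = """comp_name,data_type
titanic,tabular
house-prices,tabular
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by(rows, key):
    """Build a lookup table from the value of `key` to its row."""
    return {row[key]: row for row in rows}


def join_blocks(code_blocks, kernels, competitions):
    """Attach kernel and competition metadata to each code block."""
    kernels_by_id = index_by(kernels, "kernel_id")
    comps_by_name = index_by(competitions, "comp_name")
    joined = []
    for block in code_blocks:
        kernel = kernels_by_id.get(block["kernel_id"])
        if kernel is None:
            # No metadata row: kernels_meta.csv keeps only scored notebooks.
            continue
        comp = comps_by_name.get(kernel["comp_name"], {})
        joined.append({**block, **kernel, **comp})
    return joined


rows = join_blocks(
    read_rows(CODE_BLOCKS), read_rows(KERNELS_META), read_rows(COMPETITIONS_META)
)
for row in rows:
    print(row["kernel_id"], row["comp_name"], row["data_type"], row["code_block"])
```

    With pandas, two DataFrame.merge calls on kernel_id and comp_name would express the same join more compactly.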

    Code4ML 2.0 Enhancements

    The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the aid of an LLM.

    Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

    competitions_meta_2.csv is enriched with data_cards, describing the data used in the competitions.

    Applications

    The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

    • Code generation
    • Code understanding
    • Natural language processing of code-related tasks
  9. Predict Future Sales (translated to English)

    • kaggle.com
    Updated Nov 24, 2020
    Cite
    YWenLin (2020). Predict Future Sales (translated to English) [Dataset]. https://www.kaggle.com/datasets/ywhenlyn/predict-future-sales-translated-to-english/versions/2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 24, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    YWenLin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Original data from the Predict Future Sales Kaggle competition, with item_categories.csv, shops.csv, and items.csv translated from Russian to English for easier feature engineering and reference.

    File Information

    Item descriptions and shop names are translated from Russian to English.

    • items.csv - supplemental information about the items/products.
    • item_categories.csv - supplemental information about the item categories.
    • shops.csv - supplemental information about the shops.

    Column Description

    • ID - an Id that represents a (Shop, Item) tuple within the test set
    • shop_id - unique identifier of a shop
    • item_id - unique identifier of a product
    • item_name - name of item
    • shop_name - name of shop
    • item_category_name - name of item category
  10. BirdCLEF-Challenge2023-Kaggle

    • huggingface.co
    Cite
    Bernardo Cecchetto, BirdCLEF-Challenge2023-Kaggle [Dataset]. https://huggingface.co/datasets/bernardocecchetto/BirdCLEF-Challenge2023-Kaggle
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Bernardo Cecchetto
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains audio recordings of 264 species of birds singing, all of which were processed as follows:

    • Stereo to mono
    • Resampled to 16kHz
    • High-pass filter (1500Hz and filter order of 16)
    • Normalized

    The raw dataset was provided by the BirdCLEF 2023 challenge from Kaggle. You can access it at https://www.kaggle.com/competitions/birdclef-2023/data
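    The four processing steps can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the author's pipeline: the description gives only the cutoff and order, so the Butterworth filter design, the polyphase resampler, and peak normalization are all choices made here.

```python
import math

import numpy as np
from scipy import signal

TARGET_SR = 16_000  # target sample rate named in the description


def preprocess(audio, sr):
    """Downmix, resample, high-pass filter and normalize one recording.

    audio: float array shaped (n_samples,) or (n_samples, n_channels).
    sr: original sample rate in Hz.
    """
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # stereo to mono by averaging channels
    if sr != TARGET_SR:
        g = math.gcd(sr, TARGET_SR)
        audio = signal.resample_poly(audio, TARGET_SR // g, sr // g)
    # Assumption: Butterworth design; order 16 and 1500 Hz come from the text.
    sos = signal.butter(16, 1500, btype="highpass", fs=TARGET_SR, output="sos")
    audio = signal.sosfilt(sos, audio)
    # Assumption: peak normalization to the range [-1, 1].
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio


# One second of a synthetic 3 kHz stereo tone at 44.1 kHz as a stand-in input.
t = np.arange(44_100) / 44_100
tone = np.sin(2 * np.pi * 3000 * t)
stereo = np.stack([tone, tone], axis=1)
out = preprocess(stereo, 44_100)
print(out.shape)
```

    Second-order sections (output="sos") are used because a 16th-order filter in transfer-function form is prone to numerical instability.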

  11. wit_kaggle

    • tensorflow.org
    Updated Dec 22, 2022
    Cite
    (2022). wit_kaggle [Dataset]. https://www.tensorflow.org/datasets/catalog/wit_kaggle
    Explore at:
    Dataset updated
    Dec 22, 2022
    Description

    Wikipedia - Image/Caption Matching Kaggle Competition.

    This competition is organized by the Research team at the Wikimedia Foundation in collaboration with Google Research and a few external collaborators. This competition is based on the WIT dataset published by Google Research as detailed in this SIGIR paper.

    In this competition, you’ll build a model that automatically retrieves the text closest to an image. Specifically, you'll train your model to associate given images with article titles or complex captions, in multiple languages. The best models will account for the semantic granularity of Wikipedia images. If successful, you'll be contributing to the accessibility of the largest online encyclopedia. The millions of Wikipedia readers and editors will be able to more easily understand, search, and describe media at scale. As a result, you’ll contribute to an open model to improve learning for all.

    To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('wit_kaggle', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/wit_kaggle-train_with_extended_features-1.0.2.png

  12. home data for ml course

    • kaggle.com
    zip
    Updated Aug 27, 2019
    Cite
    Juliån Pérez Pesce (2019). home data for ml course [Dataset]. https://www.kaggle.com/datasets/estrotococo/home-data-for-ml-course
    Explore at:
    zip (199207 bytes; available download formats)
    Dataset updated
    Aug 27, 2019
    Authors
    Juliån Pérez Pesce
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Exercise: Machine Learning Competitions

    When you click Run / All, the notebook gives an error: "Files doesn't exist". This dataset fixes that. It is the same as the one from DanB. Please UPVOTE!

    Enjoy!

  13. FSDKaggle2018

    • zenodo.org
    • opendatalab.com
    • +2more
    zip
    Updated Jan 24, 2020
    + more versions
    Cite
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Xavier Serra (2020). FSDKaggle2018 [Dataset]. http://doi.org/10.5281/zenodo.2552860
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Xavier Serra
    Description

    FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.

    Citation

    If you use the FSDKaggle2018 dataset or part of it, please cite our DCASE 2018 paper:

    Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". Proceedings of the DCASE 2018 Workshop (2018)

    You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2018.

    Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

    Contact

    You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

    About this dataset

    Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology [1]. FSDKaggle2018 has been used for the Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. Please visit the DCASE2018 Challenge Task 2 website for more information. This Task was hosted on the Kaggle platform as a competition titled Freesound General-Purpose Audio Tagging Challenge. It was organized by researchers from the Music Technology Group of Universitat Pompeu Fabra, and from Google Research’s Machine Perception Team.

    The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.

    All audio samples in this dataset are gathered from Freesound [2] and are provided here as uncompressed PCM 16 bit, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.

    The ground truth data provided in this dataset has been obtained after a data labeling process which is described below in the Data labeling process section. FSDKaggle2018 clips are unequally distributed in the following 41 categories of the AudioSet Ontology:

    "Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".

    Some other relevant characteristics of FSDKaggle2018:

    • The dataset is split into a train set and a test set.

    • The train set is meant to be for system development and includes ~9.5k samples unequally distributed among 41 categories. The minimum number of audio samples per category in the train set is 94, and the maximum 300. The duration of the audio samples ranges from 300ms to 30s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18h.

    • Out of the ~9.5k samples from the train set, ~3.7k have manually-verified ground truth annotations and ~5.8k have non-verified annotations. The non-verified annotations of the train set have a quality estimate of at least 65-70% in each category. Check out the Data labeling process section below for more information about this aspect.

    • Non-verified annotations in the train set are properly flagged in train.csv so that participants can opt to use this information during the development of their systems.

    • The test set is composed of 1.6k samples with manually-verified annotations and with a category distribution similar to that of the train set. The total duration of the test set is roughly 2h.

    • All audio samples in this dataset have a single label (i.e. are only annotated with one label). Check out the Data labeling process section below for more information about this aspect. A single label should be predicted for each file in the test set.
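    Because non-verified annotations are flagged in train.csv, the verified and non-verified clips can be separated before training. The sketch below assumes the flag lives in a column named manually_verified holding 1 or 0; that column name, the fname and label columns, and all sample rows are assumptions for illustration and are not stated in the description above.

```python
import csv
import io

# Invented sample rows; column names are assumptions about train.csv.
SAMPLE_TRAIN_CSV = """fname,label,manually_verified
0001.wav,Bark,1
0002.wav,Cello,0
0003.wav,Bark,0
0004.wav,Flute,1
"""


def split_by_verification(csv_text, flag_column="manually_verified"):
    """Return (verified_rows, unverified_rows) based on the flag column."""
    verified, unverified = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = verified if row[flag_column] == "1" else unverified
        target.append(row)
    return verified, unverified


verified, unverified = split_by_verification(SAMPLE_TRAIN_CSV)
print(len(verified), "verified,", len(unverified), "non-verified")
```

    A system could, for example, train on all clips but validate only on the verified subset.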

    Data labeling process

    The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or labels), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were automatically annotated with labels from the AudioSet Ontology. These annotations can be understood as weak labels since they express the presence of a sound category in an audio sample.

    Then, a data validation process was carried out in which a number of participants listened to the annotated sounds and manually assessed the presence/absence of an automatically assigned sound category, according to the AudioSet category description.

    Audio samples in FSDKaggle2018 are only annotated with a single ground truth label (see train.csv). A total of 3,710 annotations included in the train set of FSDKaggle2018 are annotations that have been manually validated as present and predominant (some with inter-annotator agreement but not all of them). This means that in most cases there is no additional acoustic material other than the labeled category. In a few cases there may be some additional sound events, but these additional events won't belong to any of the 41 categories of FSDKaggle2018.

    The rest of the annotations have not been manually validated and therefore some of them could be inaccurate. Nonetheless, we have estimated that at least 65-70% of the non-verified annotations per category in the train set are indeed correct. It can happen that some of these non-verified audio samples present several sound sources even though only one label is provided as ground truth. These additional sources are typically out of the set of the 41 categories, but in a few cases they could be within.

    More details about the data labeling process can be found in [3].

    License

    FSDKaggle2018 has licenses at two different levels, as explained next.

    All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. For attribution purposes and to facilitate attribution of these files to third parties, we include a list of the audio clips included in FSDKaggle2018 and their corresponding licenses. The licenses are specified in the files train_post_competition.csv and test_post_competition_scoring_clips.csv.

    In addition, FSDKaggle2018 as a whole is the result of a curation process and it has an additional license. FSDKaggle2018 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2018.doc zip file.

    Files

    FSDKaggle2018 can be downloaded as a series of zip files with the following directory structure:

    root
    │
    └───FSDKaggle2018.audio_train/                   Audio clips in the train set
    │
    └───FSDKaggle2018.audio_test/                    Audio clips in the test set
    │
    └───FSDKaggle2018.meta/                          Files for evaluation setup
    │   │
    │   └───train_post_competition.csv               Data split and ground truth for the train set
    │   │
    │   └───test_post_competition_scoring_clips.csv  Ground truth for the test set
    │
    └───FSDKaggle2018.doc/
        │
        └───README.md                                The dataset description file you are reading
        │
        └───LICENSE-DATASET

  14. Kaggle Restaurant Reviews Dataset - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    (2024). Kaggle Restaurant Reviews Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/kaggle-restaurant-reviews-dataset
    Explore at:
    Dataset updated
    Nov 25, 2024
    Description

    The Kaggle sentiment analysis competition dataset contains unlabeled restaurant reviews used to supplement the labeled SemEval dataset for improved performance in sentiment analysis.

  15. Kaggle Wikipedia Web Traffic Daily Dataset (with Missing Values)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 1, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Webb, Geoff (2021). Kaggle Wikipedia Web Traffic Daily Dataset (with Missing Values) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3892813
    Explore at:
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Webb, Geoff
    Bergmeir, Christoph
    Godahewa, Rakshitha
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 daily time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.
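    The date range above fixes the length of each daily series. A short worked calculation, assuming both end dates are included in the range:

```python
from datetime import date


def series_length(start, end):
    """Number of daily observations from start to end, both inclusive."""
    return (end - start).days + 1


n_days = series_length(date(2015, 7, 1), date(2017, 9, 10))
n_series = 145063  # number of time series stated in the description

print(n_days)  # 803
# Upper bound on the number of data points, before accounting for missing values.
print(n_days * n_series)
```

    The range spans a leap day (29 February 2016), which the datetime arithmetic accounts for automatically.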

  16. EEG Motor Imagery BCICIV_2a

    • kaggle.com
    Updated May 7, 2023
    Cite
    aymanmostafa11 (2023). EEG Motor Imagery BCICIV_2a [Dataset]. https://www.kaggle.com/datasets/aymanmostafa11/eeg-motor-imagery-bciciv-2a
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    aymanmostafa11
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset is a modified CSV version of the BCI Competition IV 2a dataset, intended to be easier for beginners to use.

    Description

    The data can be used in two ways:
    1. Each patient separately: a CSV file is provided for each patient, for subject-dependent tasks.
    2. All patients: the file with "all_patients" in its name contains the data for all patients, with a column specifying the patient number.
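    The second approach can be turned back into the first by grouping the combined file on its patient column. In this sketch the column names patient, label and EEG-Fz, as well as every value, are hypothetical; check the header of the real "all_patients" file before reusing it.

```python
import csv
import io
from collections import defaultdict

# Invented rows standing in for the combined all-patients file.
SAMPLE_ALL_PATIENTS = """patient,label,EEG-Fz
1,left,0.12
1,right,-0.40
2,foot,0.05
2,tongue,0.31
2,left,-0.08
"""


def split_by_patient(csv_text, patient_column="patient"):
    """Group rows of the combined file by patient number."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[patient_column]].append(row)
    return dict(groups)


by_patient = split_by_patient(SAMPLE_ALL_PATIENTS)
for patient, rows in sorted(by_patient.items()):
    print(patient, len(rows))
```

    Grouping this way supports leave-one-subject-out evaluation without loading a separate file per patient.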

    The events considered in the data are only the 4 target classes (left, right, foot, tongue). Other events mentioned in the paper have been discarded for simplicity.

    Acknowledgements

    • BCI Competition IV
    • This introductory YouTube video

  17. Data characteristics for the Kaggle.com seizure forecasting contest.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Francisco Javier Muñoz-Almaraz; Francisco Zamora-Martínez; Paloma Botella-Rocamora; Juan Pardo (2023). Data characteristics for the Kaggle.com seizure forecasting contest. [Dataset]. http://doi.org/10.1371/journal.pone.0178808.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Francisco Javier Muñoz-Almaraz; Francisco Zamora-Martínez; Paloma Botella-Rocamora; Juan Pardo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Source: [9].

  18. Abalone Dataset

    • kaggle.com
    Updated Apr 3, 2024
    Cite
    Ali Cenk Baytop (2024). Abalone Dataset [Dataset]. https://www.kaggle.com/datasets/alicenkbaytop/abalone-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ali Cenk Baytop
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    A mixed abalone dataset based on the original dataset and the Kaggle competition dataset. It has two files: a train file with the original dataset added to it, and a test file.

    • Sex: M, F, and I (infant)
    • Length: Longest shell measurement
    • Diameter: Perpendicular to length
    • Height: With meat in shell
    • Whole_weight: Whole abalone
    • Shucked_weight: Weight of meat
    • Viscera_weight: Gut weight (after bleeding)
    • Shell_weight: After being dried
    • Rings: +1.5 gives the age in years
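    The Rings entry doubles as a formula: age in years is the ring count plus 1.5. A tiny worked example, with ring counts chosen arbitrarily for illustration:

```python
def age_from_rings(rings):
    """Estimated abalone age in years: ring count plus 1.5."""
    return rings + 1.5


ages = [age_from_rings(r) for r in (7, 10, 15)]
print(ages)  # [8.5, 11.5, 16.5]
```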

    Sources: - https://www.kaggle.com/competitions/playground-series-s4e4/data - https://archive.ics.uci.edu/dataset/1/abalone

  19. Tox

    • huggingface.co
    Updated Aug 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Victor Luz (2023). Tox [Dataset]. https://huggingface.co/datasets/vluz/Tox
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 18, 2023
    Authors
    Victor Luz
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    A cleaned-up version of the train dataset from the Kaggle Toxic Comment Classification Challenge.

    https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge
    https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data?select=train.csv.zip

    The alt_format directory contains an alternate format intended for a tutorial.

    What was done:

    • Removed extra spaces and new lines
    • Removed non-printing characters
    • Removed punctuation except apostrophe
 See the full description on the dataset page: https://huggingface.co/datasets/vluz/Tox.
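    The three cleaning steps listed above can be approximated in a few lines of standard-library Python. This is a sketch, not the dataset author's script: the order of the steps, the use of ASCII punctuation only, and deleting punctuation outright rather than replacing it with a space are assumptions.

```python
import re
import string

# All ASCII punctuation except the apostrophe.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation.replace("'", ""))


def clean_comment(text):
    """Drop non-printing characters and punctuation, then collapse whitespace."""
    # Keep printable characters and whitespace (newlines and tabs are handled below).
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Remove punctuation except the apostrophe.
    text = text.translate(_PUNCT_TABLE)
    # Collapse runs of spaces, tabs and new lines into single spaces.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_comment("Hello,\n\n  world!  Don't\tpanic\x00.")
print(cleaned)  # Hello world Don't panic
```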

  20. BirdCLEF_audio_info

    • kaggle.com
    Updated Mar 21, 2023
    Cite
    Krishnendu Dey (2023). BirdCLEF_audio_info [Dataset]. https://www.kaggle.com/datasets/krishnendudey/birdclef-audio-info
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Krishnendu Dey
    Description

    Dataset

    This dataset was created by Krishnendu Dey

    Contents
