100+ datasets found
  1. 2018 Kaggle Machine Learning Challenge dataset

    • kaggle.com
    zip
    Updated Nov 28, 2021
    + more versions
    Cite
    Sreenanda Sai Dasari (2021). 2018 Kaggle Machine Learning Challenge dataset [Dataset]. https://www.kaggle.com/datasets/sreenandasaidasari/2021-kaggle-machine-learning-challenge
    Explore at:
    Available download formats: zip (4127154 bytes)
    Dataset updated
    Nov 28, 2021
    Authors
    Sreenanda Sai Dasari
    Description

    Dataset

    This dataset was created by Sreenanda Sai Dasari


  2. Data from: Dataset from the ATLAS Higgs Boson Machine Learning Challenge 2014

    • opendata.cern.ch
    Updated 2014
    Cite
    ATLAS collaboration (2014). Dataset from the ATLAS Higgs Boson Machine Learning Challenge 2014 [Dataset]. http://doi.org/10.7483/OPENDATA.ATLAS.ZBP2.M5T8
    Explore at:
    Dataset updated
    2014
    Dataset provided by
    CERN Open Data Portal
    Authors
    ATLAS collaboration
    Description

    The dataset has been built from official ATLAS full-detector simulation, with "Higgs to tautau" events mixed with different backgrounds. The simulator has two parts. In the first, random proton-proton collisions are simulated based on the knowledge that we have accumulated on particle physics. It reproduces the random microscopic explosions resulting from the proton-proton collisions. In the second part, the resulting particles are tracked through a virtual model of the detector. The process yields simulated events with properties that mimic the statistical properties of the real events with additional information on what has happened during the collision, before particles are measured in the detector.

    The signal sample contains events in which Higgs bosons (with a fixed mass of 125 GeV) were produced. The background sample was generated by other known processes that can produce events with at least one electron or muon and a hadronic tau, mimicking the signal. For the sake of simplicity, only three background processes were retained for the Challenge. The first comes from the decay of the Z boson (with a mass of 91.2 GeV) into two taus. This decay produces events with a topology very similar to that produced by the decay of a Higgs. The second set contains events with a pair of top quarks, which can have a lepton and a hadronic tau among their decay products. The third set involves the decay of the W boson, where one electron or muon and a hadronic tau can appear simultaneously only through imperfections of the particle identification procedure.

    Due to the complexity of the simulation process, each simulated event has a weight that is proportional to the conditional density divided by the instrumental density used by the simulator (an importance-sampling flavour), and normalised for integrated luminosity such that, in any region, the sum of the weights of events falling in the region is an unbiased estimate of the expected number of events falling in the same region during a given fixed time interval. In our case, the weights correspond to the quantity of real data taken during the year 2012. The weights are an artifact of the way the simulation works and so they are not part of the input to the classifier. For the Challenge, weights have been provided in the training set so the AMS can be properly evaluated. Weights were not provided in the qualifying set since the weight distribution of the signal and background sets are very different and so they would give away the label immediately. However, in the opendata.cern.ch dataset, weights and labels have been provided for the complete dataset.

    The evaluation metric is the approximate median significance (AMS):

    \[ \text{AMS} = \sqrt{2\left((s+b+b_r) \log \left(1 + \frac{s}{b + b_r}\right)-s\right)}\]

    where

    • $s, b$: unnormalised true positive and false positive rates, respectively,
    • $b_r =10$ is the constant regularisation term,
    • $\log$ is the natural log.

    More precisely, let $(y_1, \ldots, y_n) \in \{\text{b},\text{s}\}^n$ be the vector of true test labels, let $(\hat{y}_1, \ldots, \hat{y}_n) \in \{\text{b},\text{s}\}^n$ be the vector of predicted (submitted) test labels, and let $(w_1, \ldots, w_n) \in {\mathbb{R}^+}^n$ be the vector of weights. Then

    \[ s = \sum_{i=1}^n w_i\mathbb{1}\{y_i = \text{s}\} \mathbb{1}\{\hat{y}_i = \text{s}\} \]

    and

    \[ b = \sum_{i=1}^n w_i\mathbb{1}\{y_i = \text{b}\} \mathbb{1}\{\hat{y}_i = \text{s}\}, \]

    where the indicator function $\mathbb{1}\{A\}$ is 1 if its argument $A$ is true and 0 otherwise.
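
    In code, the weighted counts and the AMS follow directly from these definitions (a minimal sketch; the function names are ours, and labels are encoded as the strings "s" and "b" as above):

```python
import math

def weighted_s_b(y_true, y_pred, weights):
    """Unnormalised weighted true positive (s) and false positive (b) counts."""
    s = sum(w for y, yh, w in zip(y_true, y_pred, weights)
            if y == "s" and yh == "s")
    b = sum(w for y, yh, w in zip(y_true, y_pred, weights)
            if y == "b" and yh == "s")
    return s, b

def ams(s, b, b_reg=10.0):
    """Approximate median significance with regularisation term b_r = b_reg."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))
```

    Note that when s = 0 the AMS is 0, and the b_r term keeps the metric finite even when b is very small.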

    For more information on the statistical model and the derivation of the metric, see the documentation.

  3. HackerEath Machine Learning Challenge

    • kaggle.com
    zip
    Updated May 23, 2020
    Cite
    Krishna Chaitanya (2020). HackerEath Machine Learning Challenge [Dataset]. https://www.kaggle.com/datasets/krsna540/hackereath-machine-learning-challenge
    Explore at:
    Available download formats: zip (519199 bytes)
    Dataset updated
    May 23, 2020
    Authors
    Krishna Chaitanya
    Description

    You work in an event management company. On Mother's Day, your company has organized an event where they want to cast positive Mother's Day related tweets in a presentation. Data engineers have already collected the data related to Mother's Day that must be categorized into positive, negative, and neutral tweets.

    You are appointed as a Machine Learning Engineer for this project. Your task is to build a model that classifies the sentiment of these tweets as positive, negative, or neutral.

    Data description

    This dataset consists of six columns:

    id: ID of the tweet
    original_text: text of the tweet
    lang: language of the tweet
    retweet_count: number of times retweeted
    original_author: Twitter handle of the author
    sentiment_class: sentiment of the tweet (target)

    The data folder consists of two .csv files:

    train.csv: 3235 rows x 6 columns
    test.csv: 1387 rows x 5 columns
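
    A simple baseline for this task could be a bag-of-words classifier over the original_text column (a hedged sketch assuming scikit-learn; the example tweets below are invented for illustration, not taken from the dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the original_text / sentiment_class columns.
texts = [
    "happy mothers day to the best mom ever",
    "love you mom, thank you for everything",
    "worst mothers day ever, totally ruined",
    "so sad and lonely this mothers day",
    "mothers day sale starts today",
    "stores open regular hours on mothers day",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# TF-IDF features fed into multinomial logistic regression: a common baseline.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(texts, labels)
pred = model.predict(["thank you mom, happy mothers day"])
```

    With the real train.csv one would fit on all 3235 rows and predict the sentiment_class for the 1387 test rows.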

  4. Yahoo-Learning-to-Rank-Challenge

    • huggingface.co
    Updated Dec 15, 2024
    Cite
    Yahoo-Research (2024). Yahoo-Learning-to-Rank-Challenge [Dataset]. https://huggingface.co/datasets/YahooResearch/Yahoo-Learning-to-Rank-Challenge
    Explore at:
    Dataset updated
    Dec 15, 2024
    Dataset provided by
    Yahoo! Research
    Authors
    Yahoo-Research
    License

    https://choosealicense.com/licenses/other/

    Description

    Yahoo! Learning to Rank Challenge, version 1.0

    Machine learning has been successfully applied to web search ranking, and the goal of this dataset is to benchmark such machine learning algorithms. The dataset consists of features extracted from (query, url) pairs along with relevance judgments. The queries, urls and feature descriptions are not given; only the feature values are. There are two datasets in this distribution: a large one and a small one. Each dataset is divided in 3 sets:… See the full description on the dataset page: https://huggingface.co/datasets/YahooResearch/Yahoo-Learning-to-Rank-Challenge.

  5. Amazon Machine Learning Challenge 2025 Dataset

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    Aayush Suthar (2025). Amazon Machine Learning Challenge 2025 Dataset [Dataset]. https://www.kaggle.com/datasets/aayushsuthar02/amazon-machine-learning-challenge-2025-dataset
    Explore at:
    Available download formats: zip (49977798 bytes)
    Dataset updated
    Nov 11, 2025
    Authors
    Aayush Suthar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Aayush Suthar

    Released under Database: Open Database, Contents: Database Contents


  6. Open Cities AI Challenge Dataset

    • access.earthdata.nasa.gov
    • cmr.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). Open Cities AI Challenge Dataset [Dataset]. http://doi.org/10.34911/rdnt.f94cxb
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This dataset was developed as part of a challenge to segment building footprints from aerial imagery. The goal of the challenge was to accelerate the development of more accurate, relevant, and usable open-source AI models to support mapping for disaster risk management in African cities. The data consists of drone imagery from 10 different cities and regions across Africa.

  7. General machine learning challenges, as reported in the literature.

    • datasetcatalog.nlm.nih.gov
    Updated Jan 31, 2023
    Cite
    Blasimme, Alessandro; Puhan, Milo Alan; Amann, Julia; Vayena, Effy; Gille, Felix; Hubbs, Shannon; Landers, Constantin; Daniore, Paola; Nittas, Vasileios (2023). General machine learning challenges, as reported in the literature. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001021304
    Explore at:
    Dataset updated
    Jan 31, 2023
    Authors
    Blasimme, Alessandro; Puhan, Milo Alan; Amann, Julia; Vayena, Effy; Gille, Felix; Hubbs, Shannon; Landers, Constantin; Daniore, Paola; Nittas, Vasileios
    Description

    General machine learning challenges, as reported in the literature.

  8. Machine Learning Problem For Practice Yolo Dataset

    • universe.roboflow.com
    zip
    Updated Jun 6, 2025
    Cite
    Andrea Nicole Rigay (2025). Machine Learning Problem For Practice Yolo Dataset [Dataset]. https://universe.roboflow.com/andrea-nicole-rigay/machine-learning-problem-for-practice-yolo
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Andrea Nicole Rigay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects HPQz 4HI5 ZO4x Bounding Boxes
    Description

    Machine Learning Problem FOR PRACTICE Yolo

    ## Overview
    
    Machine Learning Problem FOR PRACTICE Yolo is a dataset for object detection tasks - it contains Objects HPQz 4HI5 ZO4x annotations for 253 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. Trojan Detection Software Challenge - image-classification-jun2020-train

    • data.nist.gov
    • nist.gov
    • +1 more
    Updated Mar 31, 2020
    + more versions
    Cite
    Michael Paul Majurski (2020). Trojan Detection Software Challenge - image-classification-jun2020-train [Dataset]. http://doi.org/10.18434/M32195
    Explore at:
    Dataset updated
    Mar 31, 2020
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Michael Paul Majurski
    License

    https://www.nist.gov/open/license

    Description

    Round 1 Training Dataset

    The data being generated and disseminated is the training data used to construct trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.

    This dataset consists of 1000 trained, human-level, image classification AI models using the following architectures: Inception-v3, DenseNet-121, and ResNet50. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.

    Errata: this dataset had a software bug in the trigger-embedding code that caused 4 models to have a ground-truth value of 'poisoned' even though they do not contain any embedded triggers. These models (id-00000184, id-00000599, id-00000858, id-00001088) should not be used.

    Google Drive mirror: https://drive.google.com/open?id=1uwVt3UCRL2fCX9Xvi2tLoz_z-DwbU6Ce

  10. Machine Learning Challenge: Exhibit Art

    • kaggle.com
    zip
    Updated Feb 19, 2021
    Cite
    Shashank Rajput (2021). Machine Learning Challenge: Exhibit Art [Dataset]. https://www.kaggle.com/datasets/shashankrajput9/machine-learning-challenge-exhibit-art
    Explore at:
    Available download formats: zip (524443 bytes)
    Dataset updated
    Feb 19, 2021
    Authors
    Shashank Rajput
    Description

    Dataset

    This dataset was created by Shashank Rajput


  11. Machine Learning Problem 5 Dataset

    • universe.roboflow.com
    zip
    Updated Jun 6, 2025
    + more versions
    Cite
    Andrea Nicole Rigay (2025). Machine Learning Problem 5 Dataset [Dataset]. https://universe.roboflow.com/andrea-nicole-rigay/machine-learning-problem-5/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Andrea Nicole Rigay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    Machine Learning Problem 5

    ## Overview
    
    Machine Learning Problem 5 is a dataset for object detection tasks - it contains Objects annotations for 253 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  12. Dataset for the Systematic Review: "Machine Learning and Generative AI in Learning Analytics for Higher Education: A Systematic Review of Models, Trends, and Challenges"

    • portalcientifico.uvigo.gal
    Updated 2025
    Cite
    Rodríguez-Ortiz, Miguel A.; Anido-Rifón, Luis E.; Santana-Mancilla, Pedro C. (2025). Dataset for the Systematic Review: "Machine Learning and Generative AI in Learning Analytics for Higher Education: A Systematic Review of Models, Trends, and Challenges" [Dataset]. https://portalcientifico.uvigo.gal/documentos/6813ec09e6f3433a4136e607
    Explore at:
    Dataset updated
    2025
    Authors
    Rodríguez-Ortiz, Miguel A.; Anido-Rifón, Luis E.; Santana-Mancilla, Pedro C.
    Description

    This dataset contains the structured data used in the systematic review titled "Machine Learning and Generative AI in Learning Analytics for Higher Education: A Systematic Review of Models, Trends, and Challenges". The dataset includes metadata extracted from 101 studies published between 2018 and 2025, covering variables such as year, country, educational context, AI models, application types, techniques, and methodological categories. It was used for descriptive, thematic, and cluster-based analyses reported in the article. The dataset is shared to support transparency, reproducibility, and further research in the field of Learning Analytics and Artificial Intelligence.

  13. Kaggle Display Advertising Challenge dataset

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Dec 24, 2017
    Cite
    Jiang, Zilong (2017). Kaggle Display Advertising Challenge dataset [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001796020
    Explore at:
    Dataset updated
    Dec 24, 2017
    Authors
    Jiang, Zilong
    Description

    Criteo Display Advertising Challenge dataset, which is provided by the Criteo company on the well-known machine learning website Kaggle for advertising CTR prediction.

  14. Cadenza Challenge (CAD1): databases for the First Cadenza Challenge - Task1

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Aug 9, 2024
    Cite
    Gerardo Roa Dabike; Trevor John Cox (2024). Cadenza Challenge (CAD1): databases for the First Cadenza Challenge - Task1 [Dataset]. http://doi.org/10.5281/zenodo.13285384
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gerardo Roa Dabike; Trevor John Cox
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cadenza

    This is the training, validation and evaluation data for the First Cadenza Challenge - Task 1.

    The Cadenza Challenges are improving music production and processing for people with a hearing loss. According to The World Health Organization, 430 million people worldwide have a disabling hearing loss. Studies show that not being able to understand lyrics is an important problem to tackle for those with hearing loss. Consequently, this task is about improving the intelligibility of lyrics when listening to pop/rock over headphones. But this needs to be done without losing too much audio quality - you can't improve intelligibility just by turning off the rest of the band! We will be using one metric for intelligibility and another metric for audio quality, and giving you different targets to explore the balance between these metrics.

    Please see the Cadenza website for a full description of the data

  15. Image-related challenges of machine learning, as reported in the literature.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 31, 2023
    Cite
    Nittas, Vasileios; Daniore, Paola; Vayena, Effy; Puhan, Milo Alan; Gille, Felix; Amann, Julia; Hubbs, Shannon; Landers, Constantin; Blasimme, Alessandro (2023). Image-related challenges of machine learning, as reported in the literature. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001021290
    Explore at:
    Dataset updated
    Jan 31, 2023
    Authors
    Nittas, Vasileios; Daniore, Paola; Vayena, Effy; Puhan, Milo Alan; Gille, Felix; Amann, Julia; Hubbs, Shannon; Landers, Constantin; Blasimme, Alessandro
    Description

    Image-related challenges of machine learning, as reported in the literature.

  16. Ariel-Data-Challenge-NeurIPS-2022

    • huggingface.co
    Updated Sep 25, 2022
    Cite
    Anubhav De (2022). Ariel-Data-Challenge-NeurIPS-2022 [Dataset]. https://huggingface.co/datasets/n1ghtf4l1/Ariel-Data-Challenge-NeurIPS-2022
    Explore at:
    Dataset updated
    Sep 25, 2022
    Authors
    Anubhav De
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Ariel Data Challenge NeurIPS 2022

    Dataset is part of the Ariel Machine Learning Data Challenge. The Ariel Space mission is a European Space Agency mission to be launched in 2029. Ariel will observe the atmospheres of 1000 extrasolar planets - planets around other stars - to determine how they are made, how they evolve, and how to put our own Solar System in the galactic context.

      Understanding worlds in our Milky Way
    

    Today we know of roughly 5000 exoplanets in our… See the full description on the dataset page: https://huggingface.co/datasets/n1ghtf4l1/Ariel-Data-Challenge-NeurIPS-2022.

  17. Trojan Detection Software Challenge - Round 2 Training Dataset

    • data.nist.gov
    • nist.gov
    • +1 more
    Updated Aug 5, 2020
    Cite
    Michael Paul Majurski (2020). Trojan Detection Software Challenge - Round 2 Training Dataset [Dataset]. http://doi.org/10.18434/M32285
    Explore at:
    Dataset updated
    Aug 5, 2020
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Michael Paul Majurski
    License

    https://www.nist.gov/open/license

    Description

    The data being generated and disseminated is the training data used to construct trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform image classification. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.

    This dataset consists of 1104 trained, human-level, image classification AI models using a variety of model architectures. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.

  18. TrackML Throughput Phase

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated May 10, 2021
    Cite
    Salzburger, Andreas; Innocente, Vincenzo; vlimant, jean-roch; rousseau, david; Gligorov, Vladimir; Basara, Laurent; Estrade, Victor; Calafiura, Paolo; Farell, Steven; Gray, Heather; Golling, Tobias; Kiehn, Moritz; Amrouche, Sabrina; Hushchyn, Mikhail; Ustyuzhanin, Andrey; Moyse, Edward; Germain, Cecile; Guyon, Isabelle (2021). TrackML Throughput Phase [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4730156
    Explore at:
    Dataset updated
    May 10, 2021
    Dataset provided by
    University of Geneva
    University of Massachusetts
    CERN
    School of Data Analysis
    California Institute of Technology
    CNRS
    LBNL
    INRIA
    Authors
    Salzburger, Andreas; Innocente, Vincenzo; vlimant, jean-roch; rousseau, david; Gligorov, Vladimir; Basara, Laurent; Estrade, Victor; Calafiura, Paolo; Farell, Steven; Gray, Heather; Golling, Tobias; Kiehn, Moritz; Amrouche, Sabrina; Hushchyn, Mikhail; Ustyuzhanin, Andrey; Moyse, Edward; Germain, Cecile; Guyon, Isabelle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Original source from Codalab : https://competitions.codalab.org/competitions/20112

    The dataset comprises multiple independent events, where each event contains simulated measurements (essentially 3D points) of particles generated in a collision between proton bunches at the Large Hadron Collider at CERN. The goal of the tracking machine learning challenge is to group the recorded measurements or hits for each event into tracks, sets of hits that belong to the same initial particle. A solution must uniquely associate each hit to one track. The training dataset contains the recorded hits, their ground truth counterpart and their association to particles, and the initial parameters of those particles. The test dataset contains only the recorded hits.

    Once unzipped, the dataset is provided as a set of plain .csv files. Each event has four associated files that contain hits, hit cells, particles, and the ground truth association between them. The common prefix, e.g. event000000010, is always event followed by 9 digits.

    event000000000-hits.csv

    event000000000-cells.csv

    event000000000-particles.csv

    event000000000-truth.csv

    event000000001-hits.csv

    event000000001-cells.csv

    event000000001-particles.csv

    event000000001-truth.csv
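
    These per-event files can be read with Python's standard csv module (a minimal sketch; the helper name load_event is ours, and participants commonly use pandas instead):

```python
import csv
from pathlib import Path

def load_event(prefix):
    """Read the four CSV files for one event, e.g. prefix='event000000000'.

    Returns a dict mapping 'hits'/'cells'/'particles'/'truth' to lists of
    row dicts keyed by the CSV column names.
    """
    tables = {}
    for kind in ("hits", "cells", "particles", "truth"):
        with Path(f"{prefix}-{kind}.csv").open(newline="") as f:
            tables[kind] = list(csv.DictReader(f))
    return tables
```

    The hit_id column ties the hits, cells, and truth tables together, and particle_id ties truth rows back to the particles table.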

    Event hits

    The hits file contains the following values for each hit/entry:

    hit_id: numerical identifier of the hit inside the event.

    x, y, z: measured x, y, z position (in millimeter) of the hit in global coordinates.

    volume_id: numerical identifier of the detector group.

    layer_id: numerical identifier of the detector layer inside the group.

    module_id: numerical identifier of the detector module inside the layer.

    The volume/layer/module id could in principle be deduced from x, y, z. They are given here to simplify detector-specific data handling.

    Event truth

    The truth file contains the mapping between hits and generating particles and the true particle state at each measured hit. Each entry maps one hit to one particle.

    hit_id: numerical identifier of the hit as defined in the hits file.

    particle_id: numerical identifier of the generating particle as defined in the particles file. A value of 0 means that the hit did not originate from a reconstructible particle, but e.g. from detector noise.

    tx, ty, tz: true intersection point in global coordinates (in millimeters) between the particle trajectory and the sensitive surface.

    tpx, tpy, tpz: true particle momentum (in GeV/c) in the global coordinate system at the intersection point. The corresponding vector is tangent to the particle trajectory at the intersection point.

    weight: per-hit weight used for the scoring metric; the total sum of weights within one event equals one.

    Event particles

    The particles file contains the following values for each particle/entry:

    particle_id: numerical identifier of the particle inside the event.

    vx, vy, vz: initial position or vertex (in millimeters) in global coordinates.

    px, py, pz: initial momentum (in GeV/c) along each global axis.

    q: particle charge (as multiple of the absolute electron charge).

    nhits: number of hits generated by this particle.

    All entries contain the generated information or ground truth.

    Event hit cells

    The cells file contains the constituent active detector cells that comprise each hit. The cells can be used to refine the hit-to-track association. A cell is the smallest granularity inside each detector module, much like a pixel on a screen, except that depending on the volume_id a cell can be a square or a long rectangle. It is identified by two channel identifiers that are unique within each detector module and encode the position, much like column/row numbers of a matrix. A cell can provide signal information that the detector module has recorded in addition to the position. Depending on the detector type, only one of the channel identifiers may be valid (e.g. for the strip detectors), and the value might have a different resolution.

    hit_id: numerical identifier of the hit as defined in the hits file.

    ch0, ch1: channel identifier/coordinates unique within one module.

    value: signal value information, e.g. how much charge a particle has deposited.

    Additional detector geometry information

    The detector is built from silicon slabs (or modules, rectangular or trapezoidal), arranged in cylinders and disks, which measure the position (or hits) of the particles that cross them. The detector modules are organized into detector groups or volumes identified by a volume id. Inside a volume they are further grouped into layers identified by a layer id. Each layer can contain an arbitrary number of detector modules, the smallest geometrically distinct detector objects, each identified by a module_id. Within each group, detector modules are of the same type and have, e.g., the same granularity. All simulated detector modules are so-called semiconductor sensors built from thin silicon sensor chips. Each module can be represented by a two-dimensional, planar, bounded sensitive surface. These sensitive surfaces are subdivided into regular grids that define the detector cells, the smallest granularity within the detector.

    Each module has a different position and orientation described in the detectors file. A local, right-handed coordinate system is defined on each sensitive surface such that the first two coordinates u and v are on the sensitive surface and the third coordinate w is normal to the surface. The orientation and position are defined by the following transformation

    pos_xyz = rotation_matrix * pos_uvw + translation

    that transforms a position described in local coordinates u,v,w into the equivalent position x,y,z in global coordinates using a rotation matrix and a translation vector (cx,cy,cz).

    volume_id: numerical identifier of the detector group.

    layer_id: numerical identifier of the detector layer inside the group.

    module_id: numerical identifier of the detector module inside the layer.

    cx, cy, cz: position of the local origin in the global coordinate system (in millimeter).

    rot_xu, rot_xv, rot_xw, rot_yu, ...: components of the rotation matrix to rotate from local u,v,w to global x,y,z coordinates.

    module_t: half thickness of the detector module (in millimeter).

    module_minhu, module_maxhu: the minimum/maximum half-length of the module boundary along the local u direction (in millimeter).

    module_hv: the half-length of the module boundary along the local v direction (in millimeter).

    pitch_u, pitch_v: the size of detector cells along the local u and v direction (in millimeter).

    There are two different module shapes in the detector, rectangular and trapezoidal. The pixel detector ( with volume_id = 7,8,9) is fully built from rectangular modules, and so are the cylindrical barrels in volume_id=13,17. The remaining layers are made out disks that need trapezoidal shapes to cover the full disk.
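
    The local-to-global transformation pos_xyz = rotation_matrix * pos_uvw + translation can be sketched in plain Python, with rot built from the rot_xu ... rot_zw components and centre taken from (cx, cy, cz) (a minimal illustration; the function name is ours):

```python
def local_to_global(pos_uvw, rot, centre):
    """Map a local (u, v, w) position to global (x, y, z).

    rot is the 3x3 rotation matrix [[rot_xu, rot_xv, rot_xw],
    [rot_yu, rot_yv, rot_yw], [rot_zu, rot_zv, rot_zw]] and centre is
    the module origin (cx, cy, cz) in global coordinates (millimeters).
    """
    return tuple(
        sum(rot[i][j] * pos_uvw[j] for j in range(3)) + centre[i]
        for i in range(3)
    )
```

    For example, with the identity rotation the transformation reduces to a pure translation by the module centre.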

  19. Machine Learning Problem For Practice 2 Dataset

    • universe.roboflow.com
    zip
    Updated Jun 6, 2025
    Cite
    Andrea Nicole Rigay (2025). Machine Learning Problem For Practice 2 Dataset [Dataset]. https://universe.roboflow.com/andrea-nicole-rigay/machine-learning-problem-for-practice-2/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Andrea Nicole Rigay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects HPQz 4HI5 8gLn Bounding Boxes
    Description

    Machine Learning Problem FOR PRACTICE 2

    ## Overview
    
    Machine Learning Problem FOR PRACTICE 2 is a dataset for object detection tasks - it contains Objects HPQz 4HI5 8gLn annotations for 253 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. Predict the cost to ship the sculptures

    • kaggle.com
    zip
    Updated Mar 8, 2021
    Cite
    Mayank Tripathi (2021). Predict the cost to ship the sculptures [Dataset]. https://www.kaggle.com/dskagglemt/predict-the-cost-to-ship-the-sculptures
    Explore at:
    Available download formats: zip (524491 bytes)
    Dataset updated
    Mar 8, 2021
    Authors
    Mayank Tripathi
    Description

    This is an ML Challenge from HackerEarth. HackerEarth Machine Learning Challenge: Exhibit Art. https://www.hackerearth.com/challenges/competitive/hackerearth-machine-learning-challenge-predict-shipping-cost/

    Problem statement

    An art exhibitor is soon to launch an online portal for enthusiasts worldwide to start collecting art with only a click of a button. However, navigating the logistics of selling and distributing art is not a straightforward task: the exhibitor must acquire art effectively and ship these artifacts to their respective destinations post-purchase.

    Task

    The exhibitor has hired you as a Machine Learning Engineer for this project. You are required to build an advanced model that predicts the cost of shipping paintings, antiques, sculptures, and other collectibles to customers based on the information provided in the dataset.

    Dataset

    The dataset consists of parameters such as the artist’s name and reputation, dimensions, material, and price of the collectible, shipping details such as the customer information, scheduled dispatch, delivery dates, and so on.

    The benefits of practicing this problem using Machine Learning techniques are as follows:

    • This challenge encourages you to apply your Machine Learning skills to build a model that predicts a sculpture's shipping price from the given parameter values.
    • This challenge will help you enhance your knowledge of regression, one of the basic building blocks of Machine Learning.
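
    As a minimal illustration of the regression baseline this challenge calls for, here is closed-form ordinary least squares on a single synthetic feature (the feature and cost values below are invented, not the actual dataset schema):

```python
# Ordinary least squares on one synthetic feature (e.g. sculpture weight)
# against shipping cost, using the closed-form slope/intercept formulas.
weights_kg = [2.0, 5.0, 9.0, 14.0, 20.0]
costs = [30.0, 45.0, 65.0, 90.0, 120.0]  # synthetic: cost = 5 * weight + 20

n = len(weights_kg)
mean_x = sum(weights_kg) / n
mean_y = sum(costs) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(weights_kg, costs))
         / sum((x - mean_x) ** 2 for x in weights_kg))
intercept = mean_y - slope * mean_x
```

    With the real data one would use all the listed parameters (dimensions, material, price, shipping details) and a library model, but the closed-form solution shows the core of what a regression fits.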
