100+ datasets found
  1. LLM: 7 prompt training dataset

    • kaggle.com
    Updated Nov 15, 2023
    Cite
    Carl McBride Ellis (2023). LLM: 7 prompt training dataset [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/llm-7-prompt-training-dataset
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Carl McBride Ellis
    License

    https://cdla.io/sharing-1-0/

    Description
    • Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
      File: train_essays_RDizzl3_seven_v2.csv
      Human texts: 14247 LLM texts: 3004

      See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts



    • Version 3: "The RDizzl3 Seven"
      File: train_essays_RDizzl3_seven_v1.csv

    • "Car-free cities"

    • "Does the electoral college work?"

    • "Exploring Venus"

    • "The Face on Mars"

    • "Facial action coding system"

    • "A Cowboy Who Rode the Waves"

    • "Driverless cars"

    How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

    • Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

    Namely:

    • "Car-free cities"
    • "Does the electoral college work?"
    • "Exploring Venus"
    • "The Face on Mars"
    • "Facial action coding system"
    • "Seeking multiple opinions"
    • "Phones and driving"

    This dataset is a derivative of the datasets

    as well as the original competition training dataset

    • Version 1: This dataset is composed of 13,712 human texts and 1165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
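
    The counts quoted in the version notes imply a noticeably imbalanced label distribution. As a rough sketch, the share of LLM texts per version can be computed as below. The names `VERSION_COUNTS` and `llm_fraction` are hypothetical; the numbers simply restate the description, and Version 3 is left out because no counts are given for it.

```python
# Human / LLM text counts as quoted in the version notes of the description.
# Version 3 is omitted: the description gives no counts for it.
VERSION_COUNTS = {
    "v1": {"human": 13712, "llm": 1165},
    "v2": {"human": 13712, "llm": 1638},
    "v4": {"human": 14247, "llm": 3004},
}


def llm_fraction(counts):
    """Return the fraction of texts in a version that are LLM-generated."""
    total = counts["human"] + counts["llm"]
    return counts["llm"] / total


for name, counts in VERSION_COUNTS.items():
    print(f"{name}: {llm_fraction(counts):.1%} LLM texts")
```

    Even in Version 4, fewer than one in five texts is LLM-generated, so a classifier trained on this data may benefit from class weighting or resampling.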
  2. ‘Kaggle Competitions Top 100’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Kaggle Competitions Top 100’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-kaggle-competitions-top-100-961d/latest
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Kaggle Competitions Top 100’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vivovinco/kaggle-competitions-top-100 on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains the top 100 of the Kaggle competitions ranking. The dataset will be updated every month.

    Content

    100 rows and 13 columns. Column descriptions are listed below.

    • User : Name of the user
    • Tier : Grandmaster, Master or Expert
    • Company/School : Company/School info of the user if mentioned
    • Country : Country info of the user if mentioned
    • Competitions_Num : Number of competitions joined
    • Competitions_Gold : Number of competition gold medals won
    • Competitions_Silver : Number of competition silver medals won
    • Competitions_Bronze : Number of competition bronze medals won
    • Datasets_Num : Number of public datasets
    • Notebooks_Num : Number of public notebooks
    • Discussions_Num : Number of topics/comments posted
    • Points : Total points
    • Profile : Link of Kaggle profile

    Acknowledgements

    Data from Kaggle. Image from Smartcat.

    If you're reading this, please upvote.

    --- Original source retains full ownership of the source dataset ---

  3. Agentic_AI_Applications_2025

    • kaggle.com
    Updated May 10, 2025
    Cite
    Hajra Amir (2025). Agentic_AI_Applications_2025 [Dataset]. https://www.kaggle.com/datasets/hajraamir21/agentic-ai-applications-2025
    Dataset updated
    May 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hajra Amir
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comprehensive overview of various Agentic AI (autonomous AI) applications across multiple industries in 2025. It contains detailed records of how AI is being utilized to automate complex tasks, improve efficiency, and generate measurable outcomes. The dataset is designed to help researchers, data scientists, and businesses understand the current state and potential of Agentic AI in different sectors.

    Dataset Features:

    Industry: The sector where Agentic AI is applied (e.g., Healthcare, Finance, Manufacturing).

    Application Area: The specific task or function performed by the AI agent (e.g., Fraud Detection, Predictive Maintenance).

    AI Agent Name: The name of the AI system or agent deployed (e.g., HealthAI Monitor, FinSecure Agent).

    Task Description: A brief description of the AI's function or role.

    Technology Stack: The technologies powering the AI (e.g., Machine Learning, NLP, Computer Vision).

    Outcome Metrics: The measurable impact of the AI deployment (e.g., 30% reduction in ER visits).

    Deployment Year: The year the AI system was deployed (ranging from 2023 to 2025).

    Geographical Region: The region where the AI application is implemented (e.g., North America, Asia, Europe).

  4. scnu-ai-challenge-dataset-with-pred_support_facts

    • kaggle.com
    Updated Apr 24, 2024
    Cite
    czy111 (2024). scnu-ai-challenge-dataset-with-pred_support_facts [Dataset]. https://www.kaggle.com/datasets/czy111/scnu-ai-challenge-dataset-with-pred-support-facts/code
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    czy111
    Description

    Dataset

    This dataset was created by czy111

    Contents

  5. ‘Kaggle Competitions Ranking’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Kaggle Competitions Ranking’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-kaggle-competitions-ranking-f15f/7682e95e/?iid=003-169&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Kaggle Competitions Ranking’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vivovinco/kaggle-competitions-ranking on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains Kaggle ranking of competitions.

    Content

    5000 rows and 8 columns. Column descriptions are listed below.

    • Rank : Rank of the user
    • Tier : Grandmaster, Master or Expert
    • Username : Name of the user
    • Join Date : Year of join
    • Gold Medals : Number of gold medals
    • Silver Medals : Number of silver medals
    • Bronze Medals : Number of bronze medals
    • Points : Total points

    Acknowledgements

    Data from Kaggle. Image from Olympics.

    If you're reading this, please upvote.

    --- Original source retains full ownership of the source dataset ---

  6. ‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-top-1000-kaggle-datasets-658b/b992f64b/?iid=004-457&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Top 1000 Kaggle Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/notkrishna/top-1000-kaggle-datasets on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    From wiki

    Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

    Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

    Source: Kaggle

    --- Original source retains full ownership of the source dataset ---

  7. LLM - Detect AI Generated Text Dataset

    • kaggle.com
    Updated Nov 8, 2023
    Cite
    sunil thite (2023). LLM - Detect AI Generated Text Dataset [Dataset]. https://www.kaggle.com/datasets/sunilthite/llm-detect-ai-generated-text-dataset
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    sunil thite
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains both AI-generated and human-written essays for training purposes. The challenge is to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM. The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.

    The dataset contains more than 28,000 essays, both student-written and AI-generated.

    Features:
    1. text: the essay text
    2. generated: the target label (0 = human-written essay, 1 = AI-generated essay)
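
    As an illustration of the label convention, the sketch below counts essays per class using only the two documented columns (`text`, `generated`). The inline `SAMPLE_CSV` rows and the helper name `count_labels` are hypothetical stand-ins, since the actual file name and contents are not given here; the code also assumes labels are stored as `0`/`1` (or `0.0`/`1.0`).

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the real CSV file; only the two documented
# columns (text, generated) are assumed to exist.
SAMPLE_CSV = """text,generated
"Cars should be limited in cities, because ...",0
"In conclusion, the electoral college ...",1
"I think Venus is worth exploring.",0
"""

# Label convention from the description: 0 = human-written, 1 = AI-generated.
LABEL_NAMES = {0: "human", 1: "ai"}


def count_labels(csv_file):
    """Count essays per class from a file-like object holding the CSV."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # int(float(...)) tolerates labels written as "0" or "0.0".
        label = int(float(row["generated"]))
        counts[LABEL_NAMES[label]] += 1
    return counts


counts = count_labels(io.StringIO(SAMPLE_CSV))
print(dict(counts))
```

    For the real data, the `io.StringIO` object would be replaced by an opened file handle, e.g. `open(path, newline="", encoding="utf-8")`.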

  8. AI Training Dataset Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 30, 2025
    Cite
    Data Insights Market (2025). AI Training Dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-dataset-1501897
    Available download formats: pdf, doc, ppt
    Dataset updated
    Apr 30, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the urgent need for high-quality data to train sophisticated AI models capable of handling complex tasks. Key application areas, such as autonomous vehicles in the automotive industry, advanced medical diagnosis in healthcare, and personalized experiences in retail and e-commerce, are significantly contributing to this market's upward trajectory. The prevalence of text, image/video, and audio data types further diversifies the market, offering opportunities for specialized dataset providers. While the market faces challenges like data privacy concerns and the high cost of data annotation, the overall trajectory remains positive, with a projected Compound Annual Growth Rate (CAGR) exceeding 20% for the forecast period (2025-2033). This growth is further supported by advancements in deep learning techniques that demand increasingly larger and more diverse datasets for optimal performance. Leading companies like Google, Amazon, and Microsoft are actively investing in this space, expanding their dataset offerings and fostering competition within the market. Furthermore, the emergence of specialized data annotation providers caters to the specific needs of various industries, ensuring accurate and reliable data for AI model development. The geographic distribution of the market reveals strong presence in North America and Europe, driven by early adoption of AI technologies and the presence of major technology players. However, Asia Pacific is projected to witness significant growth in the coming years, propelled by increasing digitalization and a burgeoning AI ecosystem in countries like China and India. Government initiatives promoting AI development in various regions are also expected to stimulate demand for high-quality training datasets. 
While challenges related to data security and ethical considerations remain, the long-term outlook for the AI training dataset market is exceptionally promising, fueled by the continued evolution of artificial intelligence and its increasing integration into various aspects of modern life. The market segmentation by application and data type allows for granular analysis and targeted investments for businesses operating in this rapidly expanding sector.

  9. mlcourse.ai - Dota 2 - winner prediction Dataset

    • kaggle.com
    zip
    Updated Sep 8, 2019
    Cite
    Sushma Biswas (2019). mlcourse.ai - Dota 2 - winner prediction Dataset [Dataset]. https://www.kaggle.com/datasets/sushmabiswas/mlcourseai-dota-2-winner-prediction-dataset
    Available download formats: zip (759868828 bytes)
    Dataset updated
    Sep 8, 2019
    Authors
    Sushma Biswas
    Description

    Context

    Hello! I am currently taking the mlcourse.ai course, and this dataset was required as part of one of its in-class Kaggle competitions. The data is originally hosted on git, but I like to have my data right here on Kaggle. That's why this dataset exists.

    If you find this dataset useful, do upvote. Thank you and happy learning!

    Content

    This dataset contains 6 files in total:
    1. Sample_submission.csv
    2. Train_features.csv
    3. Test_features.csv
    4. Train_targets.csv
    5. Train_matches.jsonl
    6. Test_matches.jsonl

    Acknowledgements

    All of the data in this dataset is originally hosted on git and the same can also be found on the in-class competition's 'data' page here.

    Inspiration

    • to be updated.
  10. Community-Driven Model Service Platform Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 4, 2025
    Cite
    Data Insights Market (2025). Community-Driven Model Service Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/community-driven-model-service-platform-507803
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jun 4, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The community-driven model service platform market is experiencing robust growth, projected to reach $35.14 billion in 2025 and expanding at a compound annual growth rate (CAGR) of 10.1% from 2025 to 2033. This surge is driven by several key factors. The increasing accessibility of machine learning models, fueled by platforms like Kaggle, GitHub, and Hugging Face, is lowering the barrier to entry for developers and researchers. The collaborative nature of these platforms fosters innovation and accelerates model development, leading to a wider adoption of AI solutions across various industries. Furthermore, the growing demand for specialized and customized AI models is pushing businesses to leverage community-driven platforms, where they can find pre-trained models or collaborate on developing tailored solutions, thereby reducing development time and costs. The trend towards open-source models and the rise of model zoos contribute significantly to this market expansion. While challenges exist, such as ensuring model quality, security, and addressing potential biases, the overall market trajectory remains strongly positive. The market's segmentation likely includes various model types (e.g., image recognition, natural language processing, time series analysis), deployment options (cloud-based, on-premise), and target industries (healthcare, finance, retail). Leading players, such as Kaggle, GitHub, Hugging Face, TensorFlow Hub, Model Zoo, DrivenData, and Cortex, are actively shaping the market landscape through continuous innovation and community engagement. The geographical distribution of the market is likely to reflect the global concentration of AI expertise and technological infrastructure, with regions like North America and Europe holding significant market shares initially, followed by rapid expansion in Asia and other developing regions as digital infrastructure improves. 
Future growth will hinge on continued technological advancements, further integration with cloud platforms, and the development of robust governance frameworks to address ethical concerns surrounding AI model development and deployment.

  11. ‘HR data, Predict changing jobs (competition form)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘HR data, Predict changing jobs (competition form)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-hr-data-predict-changing-jobs-competition-form-1d9b/a230c863/?iid=013-955&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘HR data, Predict changing jobs (competition form)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kukuroo3/hr-data-predict-change-jobscompetition-form on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset was taken from link and separated into competition format. The label for the test data is provided in the form of a function.

    --- Original source retains full ownership of the source dataset ---

  12. ‘Covid-19 Prevent secondary transmission’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Covid-19 Prevent secondary transmission’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-prevent-secondary-transmission-f6b3/14be25d0/?iid=001-812&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Covid-19 Prevent secondary transmission’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mpwolke/cusersmarildownloadssecondcsv on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Covid-19 data collected in the CORD 19 Challenge https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge?select=metadata.csv

    Content

    Studies subject: Secondary Transmission of Covid-19

    Acknowledgements

    Allen Institute for AI: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge?select=metadata.csv

    David Mezzetti: https://www.kaggle.com/davidmezzetti/cord-19-task-csv-exports/data

    Inspiration

    Covid-19 Pandemic.

    --- Original source retains full ownership of the source dataset ---

  13. olympiad-math-contest-llama3-20k

    • huggingface.co
    Updated Jun 1, 2024
    Cite
    Kevin Amiri (2024). olympiad-math-contest-llama3-20k [Dataset]. https://huggingface.co/datasets/kevin009/olympiad-math-contest-llama3-20k
    Dataset updated
    Jun 1, 2024
    Authors
    Kevin Amiri
    Description

    AMC/AIME Mathematics Problem and Solution Dataset

    Dataset Details

    Dataset Name: AMC/AIME Mathematics Problem and Solution Dataset
    Version: 1.0
    Release Date: 2024-06-1
    Authors: Kevin Amiri

    Intended Use

    Primary Use: The dataset is created and intended for research and an AI Mathematical Olympiad Kaggle competition.
    Intended Users: Researchers in AI & mathematics or science.

    Dataset Composition

    Number of Examples: 20,300 problems and solution sets… See the full description on the dataset page: https://huggingface.co/datasets/kevin009/olympiad-math-contest-llama3-20k.

  14. Align Key Points Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 27, 2025
    Cite
    Data Insights Market (2025). Align Key Points Report [Dataset]. https://www.datainsightsmarket.com/reports/align-key-points-531365
    Available download formats: ppt, doc, pdf
    Dataset updated
    May 27, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global market for [Insert Market Name Here – e.g., AI-powered computer vision] is experiencing robust growth, projected to reach $[Estimate Market Size in 2025, e.g., 15 Billion] in value by 2025. A Compound Annual Growth Rate (CAGR) of [Estimate CAGR, e.g., 25%] from 2025 to 2033 indicates a substantial expansion to an estimated $[Estimate Market Size in 2033, e.g., 75 Billion] by the end of the forecast period. Key drivers include the increasing adoption of AI across diverse industries like automotive, healthcare, and security, fueled by advancements in deep learning and improved data processing capabilities. Emerging trends, such as the rise of edge computing and the development of more sophisticated image recognition algorithms, are further propelling market expansion. However, challenges remain. High implementation costs associated with AI technologies and the need for substantial data sets for effective model training could hinder widespread adoption. Furthermore, concerns around data privacy and security, particularly regarding the ethical implications of facial recognition technologies, represent significant restraints. Market segmentation reveals a strong presence of players like ULUCU, Roboflow, Oosto, MathWorks, GitHub, Qualcomm Developer Network, Coursera, IFSEC Insider, Kaggle, and Thales, indicating a competitive landscape. These companies cater to different segments based on their offerings and target applications, contributing to the diverse growth patterns observed across the market. Regional analysis (data assumed to be available but unspecified in the prompt; regional distributions will vary but a logical breakdown needs to be presented) would reveal varied growth trajectories depending upon technological adoption rates and regulatory landscapes.

  15. Car Damages Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Feb 16, 2025
    Cite
    AI Proyect (2025). Car Damages Kaggle Dataset [Dataset]. https://universe.roboflow.com/ai-proyect/car-damages-kaggle
    Available download formats: zip
    Dataset updated
    Feb 16, 2025
    Dataset authored and provided by
    AI Proyect
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Car Damages Polygons
    Description

    Car Damages Kaggle

    Overview

    Car Damages Kaggle is a dataset for instance segmentation tasks; it contains Car Damages annotations for 814 images.

    Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    License

    This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
    
  16. Explainable AI (XAI) Drilling Dataset

    • kaggle.com
    Updated Aug 24, 2023
    Cite
    Raphael Wallsberger (2023). Explainable AI (XAI) Drilling Dataset [Dataset]. https://www.kaggle.com/datasets/raphaelwallsberger/xai-drilling-dataset
    Dataset updated
    Aug 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Raphael Wallsberger
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032

    This is the original XAI Drilling dataset, optimized for XAI purposes; it can be used to evaluate explanations of such algorithms. The dataset comprises 20,000 data points, i.e., drilling operations, stored as rows, 10 features, one binary main failure label, and 4 binary subgroup failure modes, stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:

    • ID: Every data point in the dataset is uniquely identifiable, thanks to the ID feature. This ensures traceability and easy referencing, especially when analyzing specific drilling scenarios or anomalies.
    • Cutting speed vc (m/min): The cutting speed is a pivotal parameter in drilling, influencing the efficiency and quality of the drilling process. It represents the speed at which the drill bit's cutting edge moves through the material.
    • Spindle speed n (1/min): This feature captures the rotational speed of the spindle or drill bit, respectively.
    • Feed f (mm/rev): Feed denotes the depth the drill bit penetrates into the material with each revolution. There is a balance between speed and precision, with higher feeds leading to faster drilling but potentially compromising hole quality.
    • Feed rate vf (mm/min): The feed rate is a measure of how quickly the material is fed to the drill bit. It is a determinant of the overall drilling time and influences the heat generated during the process.
    • Power Pc (kW): The power consumption during drilling can be indicative of the efficiency of the process and the wear state of the drill bit.
    • Cooling (%): Effective cooling is paramount in drilling, preventing overheating and reducing wear. This ordinal feature captures the cooling level applied, with four distinct states representing no cooling (0%), partial cooling (25% and 50%), and high to full cooling (75% and 100%).
    • Material: The type of material being drilled can significantly influence the drilling parameters and outcomes. This dataset encompasses three primary materials: C45K hot-rolled heat-treatable steel (EN 1.0503), cast iron GJL (EN GJL-250), and aluminum-silicon (AlSi) alloy (EN AC-42000), each presenting its unique challenges and considerations. The three materials are represented as “P (Steel)” for C45K, “K (Cast Iron)” for cast iron GJL and “N (Non-ferrous metal)” for AlSi alloy.
    • Drill Bit Type: Different materials often require specialized drill bits. This feature categorizes the type of drill bit used, ensuring compatibility with the material and optimizing the drilling process. It consists of three categories, which are based on the DIN 1836: “N” for C45K, “H” for cast iron and “W” for AlSi alloy [5].
    • Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.

    • Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.

    Subgroup failures:
    • Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence.
    • Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from the factors high feed rate, inadequate cooling and using an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips.
    • Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation.
    • Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.
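
    The description states that the main failure label is 1 whenever any of the subgroup failure modes is 1. A minimal sketch of that consistency check follows; the short keys (`bef`, `ccf`, `fwf`, `wdf`, `main_failure`) are hypothetical and may not match the dataset's actual column headers, and the example rows are invented for illustration.

```python
# Hypothetical keys for the four subgroup failure modes
# (build-up edge, compression chips, flank wear, wrong drill bit).
SUBGROUP_KEYS = ("bef", "ccf", "fwf", "wdf")


def derive_main_failure(row):
    """Main failure is 1 if any subgroup failure mode is 1, else 0."""
    return int(any(row[key] == 1 for key in SUBGROUP_KEYS))


def is_consistent(row):
    """Compare a row's stored main-failure label with its subgroup flags."""
    return row["main_failure"] == derive_main_failure(row)


# Invented example rows, not taken from the dataset.
rows = [
    {"main_failure": 0, "bef": 0, "ccf": 0, "fwf": 0, "wdf": 0},
    {"main_failure": 1, "bef": 0, "ccf": 1, "fwf": 0, "wdf": 1},
    {"main_failure": 0, "bef": 1, "ccf": 0, "fwf": 0, "wdf": 0},  # mislabeled on purpose
]
print([is_consistent(r) for r in rows])
```

    Running such a check over all rows is a cheap sanity test before using the main failure label as a training target.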

  17. ‘Gufhtugu Publications Dataset Challenge’ analyzed by Analyst-2

    • analyst-2.ai
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Gufhtugu Publications Dataset Challenge’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-gufhtugu-publications-dataset-challenge-0764/0bd8674f/?iid=006-565&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Gufhtugu Publications Dataset Challenge’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/zusmani/gufhtugu-publications-dataset-challenge on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This is a one-of-a-kind book sales dataset from Pakistan. It contains 20,000 book orders from January 2019 to January 2021. The data was collected from the merchant (Gufhtugu Publications, www.Gufhtugu.com), who is a partner in this research study. There is a dire need for such datasets to learn about Pakistan’s emerging e-commerce potential, and I hope this will help many startups in many ways.

    Content

    Geography: Pakistan

    Time period: 01/2019 – 01/2021

    Unit of analysis: E-Commerce Orders

    Dataset: The dataset contains detailed information on 200,000 online book orders in Pakistan from January 2019 to January 2021. It contains the order number, order status (completed, cancelled, returned), order date and time, book name, and city address. This is the most detailed dataset about e-commerce orders in Pakistan that you can find in the public domain.

    Variables: The dataset contains order number, order status, book name, order date, order time and city of the customer.

    Size: 1.5 MB

    File Type: CSV

    Acknowledgements

    I would like to thank all the startups who are trying to make their mark in Pakistan despite the unavailability of research data. Thanks to Gufhtugu Publications (www.Gufhtugu.com) for allowing me to run this challenge.

    Inspiration

    I’d like to invite my fellow Kagglers to use machine learning and data science to help me explore these ideas:

    • What is the best-selling book?
    • Visualize order status frequency
    • Find a correlation between date and time with order status
    • Find a correlation between city and order status
    • Find any hidden patterns that are counter-intuitive for a layman
    • Can we predict number of orders, or book names in advance?
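As a starting point for the first two ideas above, a minimal Python sketch is shown here. It uses a handful of made-up rows rather than the real file, and the field names `book_name` and `order_status` are assumptions that may not match the CSV headers.

```python
from collections import Counter

# Made-up sample rows; field names are assumptions, not the real CSV headers.
orders = [
    {"book_name": "Book A", "order_status": "completed"},
    {"book_name": "Book B", "order_status": "returned"},
    {"book_name": "Book A", "order_status": "completed"},
    {"book_name": "Book C", "order_status": "cancelled"},
]

# Frequency of each order status.
status_counts = Counter(row["order_status"] for row in orders)

# Best-selling book, counting completed orders only.
completed = Counter(
    row["book_name"] for row in orders if row["order_status"] == "completed"
)
best_seller, best_count = completed.most_common(1)[0]

print(status_counts)
print(best_seller, best_count)
```

Counting only completed orders is one possible definition of "best-selling"; counting all orders regardless of status would be an equally defensible choice.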

    --- Original source retains full ownership of the source dataset ---

  18. arena-human-preference-55k

    • huggingface.co
    Updated Jun 29, 2025
    LMArena (2025). arena-human-preference-55k [Dataset]. https://huggingface.co/datasets/lmarena-ai/arena-human-preference-55k
    Dataset updated
    Jun 29, 2025
    Dataset authored and provided by
    LMArena
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset for the Kaggle competition on predicting human preference in Chatbot Arena battles. The training dataset includes over 55,000 real-world conversations between users and LLMs, with user preferences across over 70 state-of-the-art LLMs, such as GPT-4, Claude 2, Llama 2, Gemini, and Mistral models. Each sample represents a battle in which two LLMs answer the same question, with a user label of prefer model A, prefer model B, tie, or tie (both bad).
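The four-way label structure described above can be illustrated with a small Python sketch that tallies wins per model. The record layout and label strings here are hypothetical and chosen for readability; the actual dataset may encode the outcome differently.

```python
from collections import defaultdict

# Hypothetical battle records; keys and label strings are assumptions.
battles = [
    {"model_a": "m1", "model_b": "m2", "label": "model_a"},
    {"model_a": "m2", "model_b": "m3", "label": "tie"},
    {"model_a": "m1", "model_b": "m3", "label": "model_b"},
    {"model_a": "m3", "model_b": "m1", "label": "tie (both bad)"},
]


def win_counts(records):
    """Count outright wins per model; both kinds of tie award no win."""
    wins = defaultdict(int)
    for record in records:
        label = record["label"]
        if label in ("model_a", "model_b"):
            # The label names the side that won; look up that side's model.
            wins[record[label]] += 1
    return dict(wins)


print(win_counts(battles))
```

Ignoring ties is a simplification; a fuller analysis might treat "tie" and "tie (both bad)" as separate outcomes.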

      Citation
    

    Please cite the… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/arena-human-preference-55k.

  19. ‘StockX Sneaker Data Contest’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 13, 2021
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘StockX Sneaker Data Contest’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-stockx-sneaker-data-contest-ae17/5fc3e134/?iid=010-160&v=presentation
    Dataset updated
    Nov 13, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘StockX Sneaker Data Contest’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hudsonstuck/stockx-data-contest on 29 August 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset is from the StockX 2019 Data Contest.

    Content

    Currently the dataset consists of a single file of sales provided by StockX: roughly 10,000 shoe sales across 50 different models (Nike x Off-White and Yeezy).

    In the coming weeks more data will be added, including the estimated number of pairs released for each model and other information that might be useful for making predictions. Additionally, some of the data types will be modified to make numerical analysis easier.

    Inspiration

    • What shoes are most popular?
    • Which shoes have the best/worst profit margins?
    • What factors affect profit margin?
    • Is it possible to predict the sale price of a shoe at a given time? (i.e. when should I sell?)

    --- Original source retains full ownership of the source dataset ---

  20. AI vs. Human-Generated Images

    • kaggle.com
    Updated Jan 22, 2025
    Alessandra Sala (2025). AI vs. Human-Generated Images [Dataset]. https://www.kaggle.com/datasets/alessandrasala79/ai-vs-human-generated-dataset
    Dataset updated
    Jan 22, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alessandra Sala
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Official dataset for the 2025 Women in AI Kaggle Competition: https://www.kaggle.com/competitions/detect-ai-vs-human-generated-images

    The dataset consists of authentic images sampled from the Shutterstock platform across various categories, including a balanced selection where one-third of the images feature humans. These authentic images are paired with their equivalents generated using state-of-the-art generative models. This structured pairing enables a direct comparison between real and AI-generated content, providing a robust foundation for developing and evaluating image authenticity detection systems.

Carl McBride Ellis (2023). LLM: 7 prompt training dataset [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/llm-7-prompt-training-dataset

LLM: 7 prompt training dataset

(for use in the LLM - Detect AI Generated Text competition)

Dataset updated
Nov 15, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Carl McBride Ellis
License

https://cdla.io/sharing-1-0/

Description
  • Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
    File: train_essays_RDizzl3_seven_v2.csv
    Human texts: 14247 LLM texts: 3004

    See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts



  • Version 3: "**The RDizzl3 Seven**"
    File: train_essays_RDizzl3_seven_v1.csv

  • "Car-free cities"

  • "Does the electoral college work?"

  • "Exploring Venus"

  • "The Face on Mars"

  • "Facial action coding system"

  • "A Cowboy Who Rode the Waves"

  • "Driverless cars"

How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

  • Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

Namely:

  • "Car-free cities"
  • "Does the electoral college work?"
  • "Exploring Venus"
  • "The Face on Mars"
  • "Facial action coding system"
  • "Seeking multiple opinions"
  • "Phones and driving"

This dataset is a derivative of the datasets

as well as the original competition training dataset

  • Version 1: This dataset is composed of 13,712 human texts and 1,165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
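
The human and LLM text counts quoted for Versions 1, 2 and 4 imply a class imbalance that shrinks across versions. The following Python sketch computes the share of LLM texts from the counts stated in the description; the function and variable names are illustrative only.

```python
# (human texts, LLM texts) per version, taken from the description above.
VERSION_COUNTS = {
    "v1": (13712, 1165),
    "v2": (13712, 1638),
    "v4": (14247, 3004),
}


def llm_share(human: int, llm: int) -> float:
    """Fraction of all texts in a version that are LLM-generated."""
    return llm / (human + llm)


for version, (human, llm) in VERSION_COUNTS.items():
    print(f"{version}: {llm_share(human, llm):.1%} LLM texts")
```

Since LLM texts remain a minority in every version, a classifier trained on this data may benefit from class weighting or resampling, though whether that helps depends on the model and metric.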