100+ datasets found
  1. AI Vs Human Text

    • kaggle.com
    Updated Jan 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shayan Gerami (2024). AI Vs Human Text [Dataset]. https://www.kaggle.com/datasets/shanegerami/ai-vs-human-text
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 10, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Shayan Gerami
    Description

    Around 500K essays are available in this dataset, both created by AI and written by Human.

    I have gathered the data from multiple sources, added them together and removed the duplicates

  2. LLM - Detect AI Generated Text Dataset

    • kaggle.com
    Updated Nov 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    sunil thite (2023). LLM - Detect AI Generated Text Dataset [Dataset]. https://www.kaggle.com/datasets/sunilthite/llm-detect-ai-generated-text-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    sunil thite
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    In this Dataset contains both AI Generated Essay and Human Written Essay for Training Purpose This dataset challenge is to to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM. The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.

    Dataset contains more than 28,000 essay written by student and AI generated.

    Features : 1. text : Which contains essay text 2. generated : This is target label . 0 - Human Written Essay , 1 - AI Generated Essay

  3. LLM: 7 prompt training dataset

    • kaggle.com
    Updated Nov 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carl McBride Ellis (2023). LLM: 7 prompt training dataset [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/llm-7-prompt-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Carl McBride Ellis
    License

    https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/

    Description
    • Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
      File: train_essays_RDizzl3_seven_v2.csv
      Human texts: 14247 LLM texts: 3004

      See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts



    • Version 3: "**The RDizzl3 Seven**"
      File: train_essays_RDizzl3_seven_v1.csv

    • "Car-free cities"

    • "Does the electoral college work?"

    • "Exploring Venus"

    • "The Face on Mars"

    • "Facial action coding system"

    • "A Cowboy Who Rode the Waves"

    • "Driverless cars"

    How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

    • Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

    Namely:

    • "Car-free cities"
    • "Does the electoral college work?"
    • "Exploring Venus"
    • "The Face on Mars"
    • "Facial action coding system"
    • "Seeking multiple opinions"
    • "Phones and driving"

    This dataset is a derivative of the datasets

    as well as the original competition training dataset

    • Version 1:This dataset is composed of 13,712 human texts and 1165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
  4. A

    ‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-top-1000-kaggle-datasets-658b/b992f64b/?iid=004-457&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Top 1000 Kaggle Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/notkrishna/top-1000-kaggle-datasets on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    From wiki

    Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

    Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

    Source: Kaggle

    --- Original source retains full ownership of the source dataset ---

  5. AI medical chatbot

    • kaggle.com
    Updated Aug 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Yousef Saeedian
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description:

    This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

    Key Features:

    • Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.
    • Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.
    • Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.
    • Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

    Potential Use Cases:

    • NLP Model Training: Train models to understand and generate medical dialogues.
    • Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.
    • Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.
    • Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

    This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.

  6. A

    ‘Kaggle Datasets Ranking’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Kaggle Datasets Ranking’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-kaggle-datasets-ranking-2744/64eafea2/?iid=003-771&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Kaggle Datasets Ranking’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vivovinco/kaggle-datasets-ranking on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains Kaggle ranking of datasets.

    Content

    +800 rows and 8 columns. Columns' description are listed below.

    • Rank : Rank of the user
    • Tier : Grandmaster, Master or Expert
    • Username : Name of the user
    • Join Date : Year of join
    • Gold Medals : Number of gold medals
    • Silver Medals : Number of silver medals
    • Bronze Medals : Number of bronze medals
    • Points : Total points

    Acknowledgements

    Data from Kaggle. Image from The Guardian.

    If you're reading this, please upvote.

    --- Original source retains full ownership of the source dataset ---

  7. R

    Car Damages Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Feb 16, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    AI Proyect (2025). Car Damages Kaggle Dataset [Dataset]. https://universe.roboflow.com/ai-proyect/car-damages-kaggle
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 16, 2025
    Dataset authored and provided by
    AI Proyect
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Car Damages Polygons
    Description

    Car Damages Kaggle

    ## Overview
    
    Car Damages Kaggle is a dataset for instance segmentation tasks - it contains Car Damages annotations for 814 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  8. A

    ‘Kaggle Competitions Top 100’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Kaggle Competitions Top 100’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-kaggle-competitions-top-100-961d/latest
    Explore at:
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Kaggle Competitions Top 100’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vivovinco/kaggle-competitions-top-100 on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains top 100 of Kaggle competitions ranking. The dataset will be updated every month.

    Content

    100 rows and 13 columns. Columns' description are listed below.

    • User : Name of the user
    • Tier : Grandmaster, Master or Expert
    • Company/School : Company/School info of the user if mentioned
    • Country : Country info of the user if mentioned
    • Competitions_Num : Number of competitions joined
    • Competitions_Gold : Number of competitions gold medals won
    • Competitions_Silver : Number of competitions silver medals won
    • Competitions_Bronze : Number of competitions bronze medals won
    • Datasets_Num : Number of public datasets
    • Notebooks_Num : Number of public notebooks
    • Discussions_Num : Number of topics/comments posted
    • Points : Total points
    • Profile : Link of Kaggle profile

    Acknowledgements

    Data from Kaggle. Image from Smartcat.

    If you're reading this, please upvote.

    --- Original source retains full ownership of the source dataset ---

  9. Agentic_AI_Applications_2025

    • kaggle.com
    Updated May 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hajra Amir (2025). Agentic_AI_Applications_2025 [Dataset]. https://www.kaggle.com/datasets/hajraamir21/agentic-ai-applications-2025
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 10, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Hajra Amir
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comprehensive overview of various Agentic AI (autonomous AI) applications across multiple industries in 2025. It contains detailed records of how AI is being utilized to automate complex tasks, improve efficiency, and generate measurable outcomes. The dataset is designed to help researchers, data scientists, and businesses understand the current state and potential of Agentic AI in different sectors. Dataset Features: Industry: The sector where Agentic AI is applied (e.g., Healthcare, Finance, Manufacturing).

    Application Area: The specific task or function performed by the AI agent (e.g., Fraud Detection, Predictive Maintenance).

    AI Agent Name: The name of the AI system or agent deployed (e.g., HealthAI Monitor, FinSecure Agent).

    Task Description: A brief description of the AI's function or role.

    Technology Stack: The technologies powering the AI (e.g., Machine Learning, NLP, Computer Vision).

    Outcome Metrics:The measurable impact of the AI deployment (e.g., 30% reduction in ER visits).

    Deployment Year: The year the AI system was deployed (ranging from 2023 to 2025).

    Geographical Region: The region where the AI application is implemented (e.g., North America, Asia, Europe).

  10. Explainable AI (XAI) Drilling Dataset

    • kaggle.com
    Updated Aug 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Raphael Wallsberger (2023). Explainable AI (XAI) Drilling Dataset [Dataset]. https://www.kaggle.com/datasets/raphaelwallsberger/xai-drilling-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 24, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Raphael Wallsberger
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032

    This is the original XAI Drilling dataset optimized for XAI purposes and it can be used to evaluate explanations of such algortihms. The dataset comprises 20,000 data points, i.e., drilling operations, stored as rows, 10 features, one binary main failure label, and 4 binary subgroup failure modes, stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:

    • ID: Every data point in the dataset is uniquely identifiable, thanks to the ID feature. This ensures traceability and easy referencing, especially when analyzing specific drilling scenarios or anomalies.
    • Cutting speed vc (m/min): The cutting speed is a pivotal parameter in drilling, influencing the efficiency and quality of the drilling process. It represents the speed at which the drill bit's cutting edge moves through the material.
    • Spindle speed n (1/min): This feature captures the rotational speed of the spindle or drill bit, respectively.
    • Feed f (mm/rev): Feed denotes the depth the drill bit penetrates into the material with each revolution. There is a balance between speed and precision, with higher feeds leading to faster drilling but potentially compromising hole quality.
    • Feed rate vf (mm/min): The feed rate is a measure of how quickly the material is fed to the drill bit. It is a determinant of the overall drilling time and influences the heat generated during the process.
    • Power Pc (kW): The power consumption during drilling can be indicative of the efficiency of the process and the wear state of the drill bit.
    • Cooling (%): Effective cooling is paramount in drilling, preventing overheating and reducing wear. This ordinal feature captures the cooling level applied, with four distinct states representing no cooling (0%), partial cooling (25% and 50%), and high to full cooling (75% and 100%).
    • Material: The type of material being drilled can significantly influence the drilling parameters and outcomes. This dataset encompasses three primary materials: C45K hot-rolled heat-treatable steel (EN 1.0503), cast iron GJL (EN GJL-250), and aluminum-silicon (AlSi) alloy (EN AC-42000), each presenting its unique challenges and considerations. The three materials are represented as “P (Steel)” for C45K, “K (Cast Iron)” for cast iron GJL and “N (Non-ferrous metal)” for AlSi alloy.
    • Drill Bit Type: Different materials often require specialized drill bits. This feature categorizes the type of drill bit used, ensuring compatibility with the material and optimizing the drilling process. It consists of three categories, which are based on the DIN 1836: “N” for C45K, “H” for cast iron and “W” for AlSi alloy [5].
    • Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.

    • Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.

    Subgroup failures: - Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence. - Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from the factors high feed rate, inadequate cooling and using an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips. - Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation. - Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.

  11. A

    ‘List of kaggle Grandmasters’ analyzed by Analyst-2

    • analyst-2.ai
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘List of kaggle Grandmasters’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-list-of-kaggle-grandmasters-f09a/9d7e1e37/?iid=003-852&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘List of kaggle Grandmasters’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/carlmcbrideellis/list-of-kaggle-grandmasters on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Description

    This is a list of kaggle Grandmasters and their individual tiers and countries.

    Note: The dataset does not include Grandmasters who have gone on to become kaggle staff.

    --- Original source retains full ownership of the source dataset ---

  12. R

    Custom Yolov7 On Kaggle On Custom Dataset

    • universe.roboflow.com
    zip
    Updated Jan 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Owais Ahmad (2023). Custom Yolov7 On Kaggle On Custom Dataset [Dataset]. https://universe.roboflow.com/owais-ahmad/custom-yolov7-on-kaggle-on-custom-dataset-rakiq/dataset/2
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 29, 2023
    Dataset authored and provided by
    Owais Ahmad
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Car Bounding Boxes
    Description

    Custom Training with YOLOv7 🔥

    Some Important links

    Contact Information

    Objective

    To Showcase custom Object Detection on the Given Dataset to train and Infer the Model using newly launched YoloV7.

    Data Acquisition

    The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.

    from IPython.display import Markdown, display
    
    display(Markdown("../input/Car-Person-v2-Roboflow/README.roboflow.txt"))
    

    Custom Training with YOLOv7 🔥

    In this Notebook, I have processed the images with RoboFlow because in COCO formatted dataset was having different dimensions of image and Also data set was not splitted into different Format. To train a custom YOLOv7 model we need to recognize the objects in the dataset. To do so I have taken the following steps:

    • Export the dataset to YOLOv7
    • Train YOLOv7 to recognize the objects in our dataset
    • Evaluate our YOLOv7 model's performance
    • Run test inference to view performance of YOLOv7 model at work

    📦 YOLOv7

    https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/car-person-2.PNG" width=800>

    Image Credit - jinfagang

    Step 1: Install Requirements

    !git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
    %cd yolov7
    !pip install -qr requirements.txt
    !pip install -q roboflow
    

    Downloading YOLOV7 starting checkpoint

    !wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
    
    import os
    import glob
    import wandb
    import torch
    from roboflow import Roboflow
    from kaggle_secrets import UserSecretsClient
    from IPython.display import Image, clear_output, display # to display images
    
    
    
    print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
    

    https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67">

    I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!

    YOLOv7-Car-Person-Custom

    try:
      user_secrets = UserSecretsClient()
      wandb_api_key = user_secrets.get_secret("wandb_api")
      wandb.login(key=wandb_api_key)
      anonymous = None
    except:
      wandb.login(anonymous='must')
      print('To use your W&B account,
    Go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. 
    Get your W&B access token from here: https://wandb.ai/authorize')
      
      
      
    wandb.init(project="YOLOvR",name=f"7. YOLOv7-Car-Person-Custom-Run-7")
    

    Step 2: Assemble Our Dataset

    https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png" alt="">

    In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.

    In Roboflow, We can choose between two paths:

    Version v2 Aug 12, 2022 Looks like this.

    https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG" alt="">

    user_secrets = UserSecretsClient()
    roboflow_api_key = user_secrets.get_secret("roboflow_api")
    
    rf = Roboflow(api_key=roboflow_api_key)
    project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
    dataset = project.version(2).download("yolov7")
    

    Step 3: Training Custom pretrained YOLOv7 model

    Here, I am able to pass a number of arguments: - img: define input image size - batch: determine

  13. A

    ‘Deep Learning A-Z - ANN dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Deep Learning A-Z - ANN dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-deep-learning-a-z-ann-dataset-3c75/cb36262b/?iid=013-193&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Deep Learning A-Z - ANN dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/filippoo/deep-learning-az-ann on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This is the dataset used in the section "ANN (Artificial Neural Networks)" of the Udemy course from Kirill Eremenko (Data Scientist & Forex Systems Expert) and Hadelin de Ponteves (Data Scientist), called Deep Learning A-Z™: Hands-On Artificial Neural Networks. The dataset is very useful for beginners of Machine Learning, and a simple playground where to compare several techniques/skills.

    It can be freely downloaded here: https://www.superdatascience.com/deep-learning/

    The story: A bank is investigating a very high rate of customer leaving the bank. Here is a 10.000 records dataset to investigate and predict which of the customers are more likely to leave the bank soon.

    The story of the story: I'd like to compare several techniques (better if not alone, and with the experience of several Kaggle users) to improve my basic knowledge on Machine Learning.

    Content

    I will write more later, but the columns names are very self-explaining.

    Acknowledgements

    Udemy instructors Kirill Eremenko (Data Scientist & Forex Systems Expert) and Hadelin de Ponteves (Data Scientist), and their efforts to provide this dataset to their students.

    Inspiration

    Which methods score best with this dataset? Which are fastest (or, executable in a decent time)? Which are the basic steps with such a simple dataset, very useful to beginners?

    --- Original source retains full ownership of the source dataset ---

  14. A

    ‘Do You Know Where America Stands On Guns?’ analyzed by Analyst-2

    • analyst-2.ai
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Do You Know Where America Stands On Guns?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-do-you-know-where-america-stands-on-guns-1eca/ac6aae28/?iid=005-535&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Do You Know Where America Stands On Guns?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/poll-quiz-gunse on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    This folder contains the data behind the quiz Do You Know Where America Stands On Guns?

    guns-polls.csv contains the list of polls about guns that we used in our quiz. All polls have been taken after February 14, 2018, the date of the school shooting in Parkland, Florida.

    The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

    Source: https://github.com/fivethirtyeight/data

    This dataset was created by FiveThirtyEight and contains around 100 samples along with End, Republican Support, technical information and other features such as: - Start - Support - and more.

    How to use this dataset

    • Analyze Question in relation to Url
    • Study the influence of Population on Pollster
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  15. h

    AI-human-text

    • huggingface.co
    Updated Feb 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dinh Ngoc An (2024). AI-human-text [Dataset]. https://huggingface.co/datasets/andythetechnerd03/AI-human-text
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 27, 2024
    Authors
    Dinh Ngoc An
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is a processed dataset of Human vs AI Text roughly 400k rows. This is taken from the Kaggle dataset https://www.kaggle.com/datasets/shanegerami/ai-vs-human-text/data then processed and split into training and test sets.

  16. A

    ‘ℹ National Park Locations’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘ℹ National Park Locations’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-national-park-locations-f7b3/58a76984/?iid=016-163&v=presentation
    Explore at:
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘ℹ National Park Locations’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/national-park-locationse on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    A large data set containing the official URLs of United States national and state parks. Added: June 14, 2014 by CrowdFlower | Data Rows: 323 Download Now

    Source: https://www.crowdflower.com/data-for-everyone/

    This dataset was created by CrowdFlower and contains around 300 samples along with Google1 Correct Website Found:confidence, Info About Park On A List Website Gold, technical information and other features such as: - Canary - Google1 Correct Website Found - and more.

    How to use this dataset

    • Analyze Google1 Correct Website Worker Input:confidence in relation to Info About Park On A List Website
    • Study the influence of Info About Park On A List Website Worker Input:confidence on Google1 Correct Website Found Gold
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit CrowdFlower

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  17. A

    ‘📸 Most Followed on Instagram’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘📸 Most Followed on Instagram’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-most-followed-on-instagram-0687/b584c59d/?iid=000-706&v=presentation
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘📸 Most Followed on Instagram’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/most-followed-on-instagrame on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Data from Iconsquare.com. The company delivers Instagram analytics to tens of thousands of businesses, agencies and individuals.

    • BRAND
    • CATEGORIES 1
    • CATEGORIES 2
    • FOLLOWERS
    • ER - Average engagement rate of all media. Engagement rate is based on the likes and comments received divided by the number of followers at the time of the post.
    • POSTS ON HASHTAG
    • MEDIA POSTED

    Source: ICONSQUARE

    This dataset was created by Social Media Data and contains around 100 samples along with Categories 1, Categories 2, technical information and other features such as: - Followers - Categories 1 - and more.

    How to use this dataset

    • Analyze Categories 2 in relation to Followers
    • Study the influence of Categories 1 on Categories 2
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Social Media Data

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  18. A

    ‘control_data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘control_data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-control-data-596d/latest
    Explore at:
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘control_data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tammyrotem/control-data on 30 September 2021.

    --- No further description of dataset provided by original source ---

    --- Original source retains full ownership of the source dataset ---

  19. 🌍 Impact of AI on Digital Media (2020-2025)

    • kaggle.com
    Updated Apr 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Atharva Soundankar (2025). 🌍 Impact of AI on Digital Media (2020-2025) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/impact-of-ai-on-digital-media-2020-2025/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Atharva Soundankar
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    📊 About the Dataset

    This dataset explores the influence of AI-generated content across various industries, including journalism, social media, entertainment, and marketing. It provides insights into public sentiment, engagement trends, economic impact, and regulatory responses over time.

    With AI-generated content becoming increasingly prevalent, this dataset serves as a valuable resource for data analysts, business strategists, and machine learning researchers to study trends, detect biases, and predict future AI adoption patterns.

    💡 This dataset is perfect for AI adoption analysis, industry forecasting, and ethical AI research!

  20. A

    ‘GENIA Bio-medical event dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 7, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘GENIA Bio-medical event dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-genia-bio-medical-event-dataset-ecdb/latest
    Explore at:
    Dataset updated
    Dec 7, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘GENIA Bio-medical event dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/nishanthsalian/genia-biomedical-event-dataset on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Bio-medical texts have a lot of information which can be used for developments in the medical field. Traditionally, domain experts used to manually extract such information. Automating this information extraction task can help speed up progress in the field. To name a few use cases of bio-medical events, they show the effects of drugs on a person. They can also be used to identify certain medical conditions in a person. Hence automating extraction of events from bio-medical texts is very beneficial

    Content

    The dataset is just a simplified version of the event annotated GENIA dataset derived from the version available in TEES

    It consists of the original bio-medical text, labelled trigger words, location of trigger word in the text and the event type associated with the trigger word There are 3 sets of data (train (8k+ sentences), devel (about 3k sentences) and test (about 3k sentences)). Each set has 4 columns namely "Sentence", "TriggerWord", "TriggerWordLoc" and "EventType", capturing the original bio-medical text, trigger words in the sentence, location of the trigger words in the sentence and the event type associated with the trigger words respectively.

    Acknowledgements

    The dataset is just a simplified version of the event annotated GENIA dataset derived from the version available in TEES The original source dataset is from BioNLP Shared Task 2011 A complete unprocessed version seems to be present in genia-event-2011 dataset too

    For TEES licensing information please refer this link For GENIA dataset licensing information, please refer the file "GE11-LICENSE" present beside the data files (.csv) in this kaggle dataset

    Photo Credits: Louis Reed on Unsplash

    --- Original source retains full ownership of the source dataset ---

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Shayan Gerami (2024). AI Vs Human Text [Dataset]. https://www.kaggle.com/datasets/shanegerami/ai-vs-human-text
Organization logo

AI Vs Human Text

500K AI and Human Generated Essays

Explore at:
9 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 10, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Shayan Gerami
Description

Around 500K essays are available in this dataset, both created by AI and written by Human.

I have gathered the data from multiple sources, added them together and removed the duplicates

Search
Clear search
Close search
Google apps
Main menu