5 datasets found
  1. Jigsaw Toxic Comment Classification Dataset

    • paperswithcode.com
    Cite
    Jigsaw Toxic Comment Classification Dataset [Dataset]. https://paperswithcode.com/dataset/toxic-comment-classification-challenge
    Description

    You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:

    toxic
    severe_toxic
    obscene
    threat
    insult
    identity_hate

    You must create a model which predicts a probability of each type of toxicity for each comment.
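The task described above is multi-label: each comment receives six independent probabilities, one per toxicity type. A minimal sketch of writing predictions in that shape follows. The `id` column name, the sample ids, and the constant 0.5 baseline are illustrative assumptions and are not taken from the listing.

```python
import csv
import io

# The six toxicity types named in the description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def write_submission(predictions, out):
    """Write one row per comment: its id followed by six probabilities.

    `predictions` maps a comment id to a dict of label -> probability.
    The `id` column name is an assumption made for illustration.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id"] + LABELS)
    for comment_id, probs in predictions.items():
        for label in LABELS:
            if not 0.0 <= probs[label] <= 1.0:
                raise ValueError(f"{label} for {comment_id} is not a probability")
        writer.writerow([comment_id] + [probs[label] for label in LABELS])


# Hypothetical ids with an uninformative 0.5 baseline for every label.
baseline = {cid: {label: 0.5 for label in LABELS} for cid in ["a1", "b2"]}
buffer = io.StringIO()
write_submission(baseline, buffer)
submission_text = buffer.getvalue()
```

Because the labels are not mutually exclusive, the six values in a row do not need to sum to 1.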

    File descriptions

    train.csv - the training set, contains comments with their binary labels
    test.csv - the test set; you must predict the toxicity probabilities for these comments. To deter hand labeling, the test set contains some comments which are not included in scoring.
    sample_submission.csv - a sample submission file in the correct format
    test_labels.csv - labels for the test data; a value of -1 indicates the comment was not used for scoring (note: file added after competition close)

    Usage

    The dataset is under CC0, with the underlying comment text being governed by Wikipedia's CC-SA-3.0
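Since test_labels.csv marks unscored comments with -1, any evaluation on the test set would need to drop those rows first. The following is a minimal sketch using only the standard library. The `id` column name and the in-memory sample rows are made up for illustration, and treating a row as unscored when any label equals -1 is an assumption.

```python
import csv
import io

# The six label columns named in the description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def scored_rows(test_labels_file):
    """Yield only rows used for scoring, i.e. rows where no label is -1."""
    for row in csv.DictReader(test_labels_file):
        if all(row[label] != "-1" for label in LABELS):
            yield row


# Tiny in-memory stand-in for test_labels.csv; ids and values are hypothetical.
sample = io.StringIO(
    "id,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,-1,-1,-1,-1,-1,-1\n"
    "b2,0,0,1,0,1,0\n"
    "c3,0,0,0,0,0,0\n"
)
kept = list(scored_rows(sample))
kept_ids = [row["id"] for row in kept]
```

With a real file, the same function could be called on an open file handle, for example `open("test_labels.csv", newline="")`.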

  2. jigsaw-toxic-comments

    • huggingface.co
    Cite
    anitamaxvim, jigsaw-toxic-comments [Dataset]. https://huggingface.co/datasets/anitamaxvim/jigsaw-toxic-comments
    Authors
    anitamaxvim
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for Jigsaw Toxic Comments

    Dataset Description

    The Jigsaw Toxic Comments dataset is a benchmark dataset created for the Toxic Comment Classification Challenge on Kaggle. It is designed to help develop machine learning models that can identify and classify toxic online comments across multiple categories of toxicity.

    Curated by: Jigsaw (a technology incubator within Alphabet Inc.)
    Shared by: Kaggle
    Language(s) (NLP): English
    License: CC0 1.0…

    See the full description on the dataset page: https://huggingface.co/datasets/anitamaxvim/jigsaw-toxic-comments.

  3. Toxic Comment Classification Challenge

    • kaggle.com
    Updated Jan 3, 2022
    Cite
    Chenghong Hu (2022). Toxic Comment Classification Challenge [Dataset]. https://www.kaggle.com/datasets/chenghonghu/toxic-comment-classification-challenge/data
    Dataset updated
    Jan 3, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chenghong Hu
    Description

    Dataset

    This dataset was created by Chenghong Hu


  4. jigsaw-toxic-comment-classification-challenges

    • kaggle.com
    Updated May 12, 2022
    Cite
    Rahul Jain (2022). jigsaw-toxic-comment-classification-challenges [Dataset]. https://www.kaggle.com/datasets/rahul247jain/jigsawtoxiccommentclassificationchallenges/discussion?sort=undefined
    Available download formats: zip (55956177 bytes)
    Dataset updated
    May 12, 2022
    Authors
    Rahul Jain
    Description

    Dataset

    This dataset was created by Rahul Jain


  5. Cyberbullying Dataset

    • kaggle.com
    Updated Oct 22, 2022
    Cite
    Saurabh Shahane (2022). Cyberbullying Dataset [Dataset]. https://www.kaggle.com/datasets/saurabhshahane/cyberbullying-dataset
    Dataset updated
    Oct 22, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saurabh Shahane
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Context

    This dataset is a collection of datasets from different sources related to the automatic detection of cyber-bullying. The data come from different social media platforms, such as Kaggle, Twitter, Wikipedia Talk pages, and YouTube. The data contain text labeled as bullying or not, and cover different types of cyber-bullying, such as hate speech, aggression, insults, and toxicity.

    Content

    The data come from different social media platforms, such as Kaggle, Twitter, Wikipedia Talk pages, and YouTube. The data contain text labeled as bullying or not, and cover different types of cyber-bullying, such as hate speech, aggression, insults, and toxicity.

    Acknowledgements

    Elsafoury, Fatma (2020), “Cyberbullying datasets”, Mendeley Data, V1, doi: 10.17632/jf4pzyvnpj.1



Jigsaw Toxic Comment Classification Dataset

23 scholarly articles cite this dataset (View in Google Scholar)
