2 datasets found
  1. hate_speech_dataset

    • huggingface.co
    Updated Jul 27, 2024
    Cite
    Christina Christodoulou (2024). hate_speech_dataset [Dataset]. https://huggingface.co/datasets/christinacdl/hate_speech_dataset
    Dataset updated
    Jul 27, 2024
    Authors
    Christina Christodoulou
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    32,579 texts in total: 14,012 NOT hateful texts and 18,567 HATEFUL texts. All duplicate values were removed. The data was split using sklearn into 80% train and 20% temporary test (stratified by label). The temporary test set was then split 50/50 (test size 0.50) into test and validation sets (stratified by label). Split: 80/10/10.

    Train set label distribution: 0 ==> 11,210, 1 ==> 14,853, 26,063 in total
    Validation set label distribution: 0 ==> 1,401, 1 ==> 1,857, 3,258 in total
    Test set label distribution: 0 ==> 1,401, 1 ==> 1,857, 3,258 in…

    See the full description on the dataset page: https://huggingface.co/datasets/christinacdl/hate_speech_dataset.
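
The two-stage stratified split described above can be sketched as follows. This is a minimal standard-library illustration of the idea, not the author's sklearn code: the function name `stratified_split`, the seed, and the toy labels are all assumptions made for the example, and the per-label rounding here may differ by one item from what sklearn produces on the real data.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train, val, test) index lists that keep each label's share.

    Hypothetical helper: first carve off train_frac per label, then halve
    the remaining "temporary test" part into validation and test.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_frac)
        rest = idxs[n_train:]          # the temporary 20% for this label
        n_val = len(rest) // 2         # split it 50/50
        train += idxs[:n_train]
        val += rest[:n_val]
        test += rest[n_val:]
    return train, val, test


# Toy labels (an assumption): 40 "not hateful" (0) and 60 "hateful" (1).
labels = [0] * 40 + [1] * 60
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # prints: 80 10 10
```

Splitting each label separately is what keeps the 0/1 ratio the same in all three parts, which is the property the "stratified label" note in the description refers to.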

  2. Classifier Model

    • kaggle.com
    Updated Feb 4, 2025
    Cite
    Jeriann L Rhymer (2025). Classifier Model [Dataset]. https://www.kaggle.com/datasets/jeriannlrhymer/regression-model
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jeriann L Rhymer
    License

    https://cdla.io/sharing-1-0/

    Description

    The purpose of this data is linear regression. It supports the following steps: handling categorical features in a scikit-learn model, carrying out a train/test split, training a model, and evaluating that model on the testing data.

    The mpg data set represents the fuel economy (in miles per gallon) for 38 popular models of car, measured between 1999 and 2008.

    Factor        Type                   Description
    manufacturer  multi-valued discrete  Vehicle manufacturer
    model         multi-valued discrete  Model of the vehicle
    displ         continuous             Size of engine [litres]
    year          multi-valued discrete  Year of vehicle manufacture
    cyl           multi-valued discrete  Number of ignition cylinders
    trans         multi-valued discrete  Transmission type (manual or automatic)
    drv           multi-valued discrete  Driven wheels (f=front, 4=4-wheel, r=rear wheel drive)
    city          continuous             Miles per gallon, city driving conditions (fuel economy)
    hwy           continuous             Miles per gallon, highway driving conditions (fuel economy)
    fl            multi-valued discrete  Vehicle type
    class         multi-valued discrete  Vehicle class (suv, compact, etc)
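
The workflow this description lists (encode a categorical factor, hold out test rows, fit a linear regression, evaluate on the held-out rows) might look roughly like the sketch below. It is written in plain Python rather than scikit-learn so that it runs without extra packages. The rows are hypothetical toy values, not taken from the dataset, and the helper names (`one_hot`, `solve`, `features`, `predict`) as well as the fixed every-fifth-row hold-out are assumptions made for illustration.

```python
# Hypothetical toy rows (displ in litres, drv code, hwy in mpg); NOT real dataset values.
rows = [
    (1.8, "f", 29), (2.0, "f", 31), (2.8, "f", 26), (3.1, "4", 25), (4.0, "4", 20),
    (4.7, "4", 17), (5.3, "r", 20), (5.7, "r", 18), (2.4, "f", 30), (3.5, "4", 22),
]

# Handling a categorical feature: one indicator column per drv value.
categories = sorted({drv for _, drv, _ in rows})


def one_hot(value, cats):
    return [1.0 if value == c else 0.0 for c in cats]


def features(displ, drv):
    # No separate intercept: the drv indicators act as per-group intercepts.
    return [displ] + one_hot(drv, categories)


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Train/test split: a fixed hold-out of every fifth row (a random split is more usual).
train = [r for i, r in enumerate(rows) if i % 5 != 4]
test = [r for i, r in enumerate(rows) if i % 5 == 4]

# Training: ordinary least squares via the normal equations (X^T X) w = X^T y.
X_train = [features(d, v) for d, v, _ in train]
y_train = [y for _, _, y in train]
k = len(X_train[0])
xtx = [[sum(x[i] * x[j] for x in X_train) for j in range(k)] for i in range(k)]
xty = [sum(x[i] * y for x, y in zip(X_train, y_train)) for i in range(k)]
weights = solve(xtx, xty)


def predict(displ, drv):
    return sum(w * f for w, f in zip(weights, features(displ, drv)))


# Evaluation: mean squared error on the held-out rows.
test_mse = sum((predict(d, v) - y) ** 2 for d, v, y in test) / len(test)
print("weights:", [round(w, 3) for w in weights])
print("test MSE:", round(test_mse, 3))
```

With scikit-learn, the same steps would normally be delegated to its encoder, splitter, and linear model classes. The hand-written version is only meant to show what each step does.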


hate_speech_dataset

christinacdl/hate_speech_dataset

3 scholarly articles cite this dataset.