2 datasets found
  1. h

    hate_speech_dataset

    • huggingface.co
    Updated Jul 27, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Christina Christodoulou (2024). hate_speech_dataset [Dataset]. https://huggingface.co/datasets/christinacdl/hate_speech_dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 27, 2024
    Authors
    Christina Christodoulou
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    32.579 texts in total, 14.012 NOT hateful texts and 18.567 HATEFUL texts All duplicate values were removed Split using sklearn into 80% train and 20% temporary test (stratified label). Then split the test set using 0.50% test and validation (stratified label) Split: 80/10/10 Train set label distribution: 0 ==> 11.210, 1 ==> 14.853, 26.063 in total Validation set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in total Test set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in… See the full description on the dataset page: https://huggingface.co/datasets/christinacdl/hate_speech_dataset.

  2. m

    Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning...

    • data.mendeley.com
    • narcis.nl
    Updated Jan 11, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TaeKeun Yoo (2021). Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics" [Dataset]. http://doi.org/10.17632/ffn745r57z.2
    Explore at:
    Dataset updated
    Jan 11, 2021
    Authors
    TaeKeun Yoo
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics. Authors: Kazutaka Kamiya, MD, PhD, Ik Hee Ryu, MD, MS, Tae Keun Yoo, MD, Jung Sub Kim MD, In Sik Lee, MD, PhD, Jin Kook Kim MD, Wakako Ando CO, Nobuyuki Shoji, MD, PhD, Tomofusa, Yamauchi, MD, PhD, Hitoshi Tabuchi, MD, PhD.

    We hypothesize that machine learning of preoperative biometric data obtained by the As-OCT may be clinically beneficial for predicting the actual ICL vault. Therefore, we built the machine learning model using Random Forest to predict ICL vault after surgery.

    This multicenter study comprised one thousand seven hundred forty-five eyes of 1745 consecutive patients (656 men and 1089 women), who underwent EVO ICL implantation (V4c and V5 Visian ICL with KS-AquaPORT) for the correction of moderate to high myopia and myopic astigmatism, and who completed at least a 1-month follow-up, at Kitasato University Hospital (Kanagawa, Japan), or at B&VIIT Eye Center (Seoul, Korea).

    This data file (RFR_model(feature=12).mat) is the final trained random forest model for MATLAB 2020a.

    Python version:

    from sklearn.model_selection import train_test_split import pandas as pd import numpy as np from sklearn.ensemble import RandomForestClassifier from sklearn.ensemble import RandomForestRegressor

    connect data in your google drive

    from google.colab import auth auth.authenticate_user() from google.colab import drive drive.mount('/content/gdrive')

    Change the path for the custom data

    In this case, we used ICL vault prediction using preop measurement

    dataset = pd.read_csv('gdrive/My Drive/ICL/data_icl.csv') dataset.head()

    optimal features (sorted by importance) :

    1. ICL size 2. ICL power 3. LV 4. CLR 5. ACD 6. ATA

    7. MSE 8.Age 9. Pupil size 10. WTW 11. CCT 12. ACW

    y = dataset['Vault_1M'] X = dataset.drop(['Vault_1M'], axis = 1)

    Split the dataset to train and test data, if necessary.

    For example, we can split data to 8:2 as a simple validation test

    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

    In our study, we already defined the training (B&VIIT Eye Center, n=1455) and test (Kitasato University, n=290) dataset, this code was not necessary to perform our analysis.

    Optimal parameter search could be performed in this section

    parameters = {'bootstrap': True, 'min_samples_leaf': 3, 'n_estimators': 500, 'criterion': 'mae' 'min_samples_split': 10, 'max_features': 'sqrt', 'max_depth': 6, 'max_leaf_nodes': None}

    RF_model = RandomForestRegressor(**parameters) RF_model.fit(train_X, train_y) RF_predictions = RF_model.predict(test_X) importance = RF_model.feature_importances_

  3. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Christina Christodoulou (2024). hate_speech_dataset [Dataset]. https://huggingface.co/datasets/christinacdl/hate_speech_dataset

hate_speech_dataset

christinacdl/hate_speech_dataset

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 27, 2024
Authors
Christina Christodoulou
License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

32.579 texts in total, 14.012 NOT hateful texts and 18.567 HATEFUL texts All duplicate values were removed Split using sklearn into 80% train and 20% temporary test (stratified label). Then split the test set using 0.50% test and validation (stratified label) Split: 80/10/10 Train set label distribution: 0 ==> 11.210, 1 ==> 14.853, 26.063 in total Validation set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in total Test set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in… See the full description on the dataset page: https://huggingface.co/datasets/christinacdl/hate_speech_dataset.

Search
Clear search
Close search
Google apps
Main menu