2 datasets found

h
hate_speech_dataset
huggingface.co
Updated Jul 27, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christina Christodoulou (2024). hate_speech_dataset [Dataset]. https://huggingface.co/datasets/christinacdl/hate_speech_dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 27, 2024
Authors
Christina Christodoulou
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
32.579 texts in total, 14.012 NOT hateful texts and 18.567 HATEFUL texts All duplicate values were removed Split using sklearn into 80% train and 20% temporary test (stratified label). Then split the test set using 0.50% test and validation (stratified label) Split: 80/10/10 Train set label distribution: 0 ==> 11.210, 1 ==> 14.853, 26.063 in total Validation set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in total Test set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in… See the full description on the dataset page: https://huggingface.co/datasets/christinacdl/hate_speech_dataset.
m
Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning...
data.mendeley.com
narcis.nl
Updated Jan 11, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TaeKeun Yoo (2021). Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics" [Dataset]. http://doi.org/10.17632/ffn745r57z.2
Explore at:
Unique identifier
https://doi.org/10.17632/ffn745r57z.2
Dataset updated
Jan 11, 2021
Authors
TaeKeun Yoo
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics. Authors: Kazutaka Kamiya, MD, PhD, Ik Hee Ryu, MD, MS, Tae Keun Yoo, MD, Jung Sub Kim MD, In Sik Lee, MD, PhD, Jin Kook Kim MD, Wakako Ando CO, Nobuyuki Shoji, MD, PhD, Tomofusa, Yamauchi, MD, PhD, Hitoshi Tabuchi, MD, PhD.

We hypothesize that machine learning of preoperative biometric data obtained by the As-OCT may be clinically beneficial for predicting the actual ICL vault. Therefore, we built the machine learning model using Random Forest to predict ICL vault after surgery.

This multicenter study comprised one thousand seven hundred forty-five eyes of 1745 consecutive patients (656 men and 1089 women), who underwent EVO ICL implantation (V4c and V5 Visian ICL with KS-AquaPORT) for the correction of moderate to high myopia and myopic astigmatism, and who completed at least a 1-month follow-up, at Kitasato University Hospital (Kanagawa, Japan), or at B&VIIT Eye Center (Seoul, Korea).

This data file (RFR_model(feature=12).mat) is the final trained random forest model for MATLAB 2020a.

Python version:

from sklearn.model_selection import train_test_split import pandas as pd import numpy as np from sklearn.ensemble import RandomForestClassifier from sklearn.ensemble import RandomForestRegressor

connect data in your google drive

from google.colab import auth auth.authenticate_user() from google.colab import drive drive.mount('/content/gdrive')

Change the path for the custom data

In this case, we used ICL vault prediction using preop measurement

dataset = pd.read_csv('gdrive/My Drive/ICL/data_icl.csv') dataset.head()

optimal features (sorted by importance) :

1. ICL size 2. ICL power 3. LV 4. CLR 5. ACD 6. ATA

7. MSE 8.Age 9. Pupil size 10. WTW 11. CCT 12. ACW

y = dataset['Vault_1M'] X = dataset.drop(['Vault_1M'], axis = 1)

Split the dataset to train and test data, if necessary.

For example, we can split data to 8:2 as a simple validation test

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

In our study, we already defined the training (B&VIIT Eye Center, n=1455) and test (Kitasato University, n=290) dataset, this code was not necessary to perform our analysis.

Optimal parameter search could be performed in this section

parameters = {'bootstrap': True, 'min_samples_leaf': 3, 'n_estimators': 500, 'criterion': 'mae' 'min_samples_split': 10, 'max_features': 'sqrt', 'max_depth': 6, 'max_leaf_nodes': None}

RF_model = RandomForestRegressor(**parameters) RF_model.fit(train_X, train_y) RF_predictions = RF_model.predict(test_X) importance = RF_model.feature_importances_
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Christina Christodoulou (2024). hate_speech_dataset [Dataset]. https://huggingface.co/datasets/christinacdl/hate_speech_dataset

hate_speech_dataset

christinacdl/hate_speech_dataset

Explore at:

3 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jul 27, 2024

Authors

Christina Christodoulou

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

32.579 texts in total, 14.012 NOT hateful texts and 18.567 HATEFUL texts All duplicate values were removed Split using sklearn into 80% train and 20% temporary test (stratified label). Then split the test set using 0.50% test and validation (stratified label) Split: 80/10/10 Train set label distribution: 0 ==> 11.210, 1 ==> 14.853, 26.063 in total Validation set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in total Test set label distribution: 0 ==> 1.401, 1 ==> 1.857, 3.258 in… See the full description on the dataset page: https://huggingface.co/datasets/christinacdl/hate_speech_dataset.

Clear search

Close search

Google apps

Main menu

hate_speech_dataset

Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning...

connect data in your google drive

Change the path for the custom data

In this case, we used ICL vault prediction using preop measurement

optimal features (sorted by importance) :

1. ICL size 2. ICL power 3. LV 4. CLR 5. ACD 6. ATA

7. MSE 8.Age 9. Pupil size 10. WTW 11. CCT 12. ACW

Split the dataset to train and test data, if necessary.

For example, we can split data to 8:2 as a simple validation test

In our study, we already defined the training (B&VIIT Eye Center, n=1455) and test (Kitasato University, n=290) dataset, this code was not necessary to perform our analysis.

Optimal parameter search could be performed in this section

hate_speech_datasetSee More Versions

christinacdl/hate_speech_dataset

hate_speech_dataset