50 datasets found

ASTD Dataset
paperswithcode.com
Updated Feb 20, 2021
Cite
Mahmoud Nabil; Mohamed Aly; Amir Atiya (2021). ASTD Dataset [Dataset]. https://paperswithcode.com/dataset/astd
Dataset updated
Feb 20, 2021
Authors
Mahmoud Nabil; Mohamed Aly; Amir Atiya
Description
Arabic Sentiment Tweets Dataset (ASTD) is an Arabic social sentiment analysis dataset gathered from Twitter. It consists of about 10,000 tweets which are classified as objective, subjective positive, subjective negative, and subjective mixed.
Arabic Companies Reviews For Sentiment Analysis
kaggle.com
Updated May 23, 2023
Cite
mohamed ali salama (2023). Arabic Companies Reviews For Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/mohamedalisalama/arabic-companies-reviews-for-sentiment-analysis
Dataset updated
May 23, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
mohamed ali salama
Description
Context

The dataset contains more than 67,000 Arabic reviews for sentiment analysis. The data was collected by web scraping reviews of many companies, such as talabat, kabiter, nasla, swifil, alsiwidiu, kilubatra, and dumati.

Content

Columns

- Reviews: review description
- rating: 1 positive, 0 neutral, -1 negative
- Company: contains the company name for each review
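
As a minimal sketch of how the columns above might be read, the snippet below maps the numeric rating to a label name. The sample rows, the exact header spellings (`Reviews`, `rating`, `Company`), and the helper names `rating_to_label` and `load_reviews` are assumptions made for illustration; the actual file may differ.

```python
import csv
import io

# Hypothetical sample in the column layout described above; not real dataset rows.
SAMPLE_CSV = """Reviews,rating,Company
"example positive review",1,talabat
"example neutral review",0,talabat
"example negative review",-1,talabat
"""

# Rating codes as documented: 1 positive, 0 neutral, -1 negative.
RATING_LABELS = {1: "positive", 0: "neutral", -1: "negative"}


def rating_to_label(raw: str) -> str:
    """Convert a rating cell such as "1" or "-1" to its label name."""
    return RATING_LABELS[int(raw)]


def load_reviews(handle):
    """Yield (review_text, label, company) tuples from a CSV file handle."""
    for row in csv.DictReader(handle):
        yield row["Reviews"], rating_to_label(row["rating"]), row["Company"]


rows = list(load_reviews(io.StringIO(SAMPLE_CSV)))
for text, label, company in rows:
    print(company, label, text)
```

To use it on the real file, pass an opened file handle (with `encoding="utf-8"` and `newline=""`) to `load_reviews` instead of the in-memory sample.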
Data from: Arabic news credibility on Twitter using sentiment analysis and...
zenodo.org
data.niaid.nih.gov
csv, txt
Updated Jun 3, 2023
Cite
Duha Samdani; Mounira Taileb; Nada Almani (2023). Arabic news credibility on Twitter using sentiment analysis and ensemble learning [Dataset]. http://doi.org/10.5281/zenodo.8000717
Available download formats: csv, txt
Unique identifier
https://doi.org/10.5281/zenodo.8000717
Dataset updated
Jun 3, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Duha Samdani; Mounira Taileb; Nada Almani
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Arabic news credibility on Twitter using sentiment analysis and ensemble learning.

WHAT IS IT?

-----------

An Arabic news credibility model for Twitter using sentiment analysis and ensemble learning.

Here we include the collected dataset and the source code of the proposed model, written in Python using the Keras library with a TensorFlow backend.

Required Packages

------------------

Keras (https://keras.io/).

Scikit-learn (http://scikit-learn.org/)

Imblearn (imbalanced-learn documentation, version 0.10.1)

To Run the model

---------------

One data file is required to run the model:

The collected dataset file. Set the path of this data file in the code.

The dataset

---------------

There is a dataset file with all features; you can choose the features you need and apply them to the model.

There is a description file that describes each feature in the news credibility dataset.

The file Tweet_ID contains the list of tweet IDs in the dataset.

The annotated replies, labeled by credibility, are provided.

CONTACTS

--------

If you want to report bugs or have general queries email to
Towards Generalization of Machine Learning Models: An Arabic Sentiment...
zenodo.org
data.niaid.nih.gov
csv
Updated Jun 5, 2023
Cite
Samir Abdaljalil; Shaimaa Hassanein; Hamdy Mubarak; Ahmed Abdelali (2023). Towards Generalization of Machine Learning Models: An Arabic Sentiment Analysis Dataset [Dataset]. http://doi.org/10.5281/zenodo.7801450
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.7801450
Dataset updated
Jun 5, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samir Abdaljalil; Shaimaa Hassanein; Hamdy Mubarak; Ahmed Abdelali
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set consists of approximately 1.64 Million Arabic tweets (shared by their IDs) posted from 2009 to 2020, and their corresponding sentiment using a three-point classification system of Positive, Negative and Neutral/Mixed. No specific locations and/or keywords were specified throughout the data collection to obtain variation in the dialects and topics represented within the dataset. It is important to note that any biases in the proposed dataset in relation to the dialects and/or topics discussed were unintentional.

Please use the following citation if you use this data in a paper:

Abdaljalil, S., Hassanein, S., Mubarak, H., & Abdelali, A. (2023). Towards Generalization of Machine Learning Models: A Case Study of Arabic Sentiment Analysis. Proceedings of the International AAAI Conference on Web and Social Media, 17(1), 971-980.

Social Media Posts in Arabic Dialect

kaggle.com

Updated Jul 11, 2024

Cite

UM6P Open Data (2024). Social Media Posts in Arabic Dialect [Dataset]. https://www.kaggle.com/datasets/um6popendata/sentiment-analysis-for-sm-posts-in-arabic-dialect

Dataset updated

Jul 11, 2024

Dataset provided by

Kaggle

Authors

UM6P Open Data

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Dataset: Sentiment Analysis for Social Media Posts in Arabic Dialect

Overview

This dataset contains a labeled collection of approximately 50,000 social media posts in various Arabic dialects. Each post has been manually annotated with sentiment labels, providing a rich resource for natural language processing and sentiment analysis research.

Dataset Owner

UM6P College of Computing

Content

Posts: The dataset includes raw text data of social media posts written in different Arabic dialects.
Sentiment Labels: Each post is labeled with one of the following sentiment categories:
- Positive
- Negative
- Neutral

Features

Post ID: A unique identifier for each social media post.
Text: The content of the social media post in Arabic.
Sentiment: The sentiment label assigned to the post (Positive, Negative, Neutral).

Format

The dataset is provided in CSV format with the following columns:
- Post_ID: Integer
- Text: String
- Sentiment: String (Positive, Negative, Neutral)

Usage

This dataset is ideal for tasks such as:
- Training sentiment analysis models
- Studying sentiment trends in Arabic social media
- Exploring the linguistic characteristics of Arabic dialects
- Benchmarking sentiment analysis tools

Example Data

Post_ID	Text	Sentiment
1	"هذا المنتج رائع جدًا وأحببته كثيرًا"	Positive
2	"لم يعجبني هذا الفيلم، كان مملًا جدًا"	Negative
3	"الطقس اليوم عادي، لا يوجد شيء مميز"	Neutral
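
The documented columns can be read with a few lines of standard-library Python. The sketch below is illustrative only: it rewrites the three example rows above as comma-separated text (an assumption, since the listing shows them tab-separated while the Format section says CSV), and the names `read_posts` and `ALLOWED` are hypothetical.

```python
import csv
import io
from collections import Counter

# The three example rows shown above, written as CSV with the documented columns.
SAMPLE = (
    "Post_ID,Text,Sentiment\n"
    '1,"هذا المنتج رائع جدًا وأحببته كثيرًا",Positive\n'
    '2,"لم يعجبني هذا الفيلم، كان مملًا جدًا",Negative\n'
    '3,"الطقس اليوم عادي، لا يوجد شيء مميز",Neutral\n'
)

# Sentiment values listed in the dataset description.
ALLOWED = {"Positive", "Negative", "Neutral"}


def read_posts(handle):
    """Parse rows into (post_id, text, sentiment) tuples, validating the label."""
    posts = []
    for row in csv.DictReader(handle):
        sentiment = row["Sentiment"].strip()
        if sentiment not in ALLOWED:
            raise ValueError(f"unexpected sentiment label: {sentiment!r}")
        posts.append((int(row["Post_ID"]), row["Text"], sentiment))
    return posts


posts = read_posts(io.StringIO(SAMPLE))
counts = Counter(sentiment for _, _, sentiment in posts)
print(counts)
```

Validating the label while parsing makes unexpected spellings in the real file surface early rather than silently becoming extra classes.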

Licensing

Please refer to the dataset license included in the dataset files for information on usage rights and restrictions.

Citation

An open access NLP dataset for Arabic dialects: data collection, labeling, and model construction. Elmehdi Boujou, Hamza Chataoui, Abdellah El Mekki, Saad Benjelloun, Ikram Chairi, and Ismail Berrada. MENACIS 2020 conference, in press.

Arabic Sentiment Analysis Dataset SS2030 Dataset
kaggle.com
Updated May 26, 2019
Cite
Sarah Alyami (2019). Arabic Sentiment Analysis Dataset SS2030 Dataset [Dataset]. https://www.kaggle.com/snalyami3/arabic-sentiment-analysis-dataset-ss2030-dataset/discussion
Dataset updated
May 26, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sarah Alyami
Description
Please cite: Alyami, S. N., & Olatunji, S. O. (2020). Application of Support Vector Machine for Arabic Sentiment Classification Using Twitter-Based Dataset, 19(1), 1–13. https://doi.org/10.1142/S0219649220400183
HENSOLDT ANALYTICS Sentiment Analysis for Arabic
live.european-language-grid.eu
Updated Dec 20, 2021
Cite
Hensoldt Analytics (2021). HENSOLDT ANALYTICS Sentiment Analysis for Arabic [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/9464
Dataset updated
Dec 20, 2021
Dataset authored and provided by
Hensoldt Analytics
License
https://gitlab.com/european-language-grid/sail/sail-documents/blob/master/HENSOLDT-ANALYTICS_ELG_LICENSE.md
Description
HENSOLDT ANALYTICS MediaMiningIndexer SED is a sentiment detection/analysis engine that reports the attitude of paragraphs of text as positive, negative, or neutral.
Arabic Datasets for research purposes
zenodo.org
zip
Updated Jan 24, 2020
Cite
Abu Bakr Soliman (2020). Arabic Datasets for research purposes [Dataset]. http://doi.org/10.5281/zenodo.1034601
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1034601
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Abu Bakr Soliman
Description
Arabic Datasets for research purposes
Arabic Depression Tweets Dataset (15,000 Tweets) with Linguistic...
dataverse.harvard.edu
Updated Jun 12, 2025
Cite
Abdelmoniem Helmy (2025). Arabic Depression Tweets Dataset (15,000 Tweets) with Linguistic Augmentation [Dataset]. http://doi.org/10.7910/DVN/UWLHRI
Unique identifier
https://doi.org/10.7910/DVN/UWLHRI
Dataset updated
Jun 12, 2025
Dataset provided by
Harvard Dataverse
Authors
Abdelmoniem Helmy
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains 15,000 Arabic tweets annotated for depression detection and includes linguistic feature augmentations to support research in natural language processing (NLP), sentiment analysis, and mental health detection. The dataset was curated to enable studies on automatic depression detection in Arabic social media and to support machine learning and deep learning approaches in the domain of computational mental health.

Contents

The dataset consists of the following columns:
- tweet: The original Arabic tweet text.
- label: Binary label indicating whether the tweet expresses signs of depression (1 = Depression, 0 = Non-depression).
- negation_flag: Indicates presence (1) or absence (0) of negation in the tweet.
- intensifier_flag: Indicates presence (1) or absence (0) of intensifiers (words that strengthen the degree of emotion).
- Class (redundant but included for convenience): Textual label corresponding to the binary label (Depression or Non-depression).
- Binary Classification: Contains the count of instances in each class (appears as an artifact in the provided file).

Key Features

- Language: Arabic (varied dialects and Modern Standard Arabic).
- Source: Publicly available tweets collected from Twitter (X).
- Annotation: Manual labeling by native Arabic speakers trained in psychology and linguistics.
- Linguistic augmentation: Flags for negation and intensifier usage are included to support linguistically informed NLP models.

Potential Use Cases

- Depression detection models for Arabic texts.
- Linguistic analysis of depression expression in Arabic social media.
- Cross-lingual studies comparing depression signals across languages.
- Development of clinical decision support systems leveraging social media data.

Licensing & Ethical Considerations

The dataset consists of public social media posts. Researchers are advised to use it strictly for research purposes, respecting privacy and ethical guidelines. No personally identifiable information (PII) is included.

Citation

If you use this dataset, please cite it appropriately in your research publications and acknowledge the creators.
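
One way the linguistic flags described above might be used is to compare how often a flag is set in each class. The following is a rough sketch under stated assumptions: the rows are placeholder records in the documented column layout (not real tweets), and `flag_rate_by_class` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows using the documented columns; the text is placeholder content.
rows = [
    {"tweet": "placeholder tweet 1", "label": 1, "negation_flag": 1, "intensifier_flag": 0},
    {"tweet": "placeholder tweet 2", "label": 1, "negation_flag": 0, "intensifier_flag": 1},
    {"tweet": "placeholder tweet 3", "label": 0, "negation_flag": 0, "intensifier_flag": 0},
    {"tweet": "placeholder tweet 4", "label": 0, "negation_flag": 1, "intensifier_flag": 0},
]

# Label codes as documented: 1 = Depression, 0 = Non-depression.
LABEL_NAMES = {1: "Depression", 0: "Non-depression"}


def flag_rate_by_class(records, flag):
    """Return the fraction of records in each class that have `flag` set to 1."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        name = LABEL_NAMES[record["label"]]
        totals[name] += 1
        hits[name] += record[flag]
    return {name: hits[name] / totals[name] for name in totals}


negation_rates = flag_rate_by_class(rows, "negation_flag")
intensifier_rates = flag_rate_by_class(rows, "intensifier_flag")
print(negation_rates)
print(intensifier_rates)
```

The rates printed here describe only the placeholder rows and say nothing about the real dataset.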
tweet_sentiment_multilingual
huggingface.co
opendatalab.com
Updated Dec 25, 2022
Cite
Cardiff NLP (2022). tweet_sentiment_multilingual [Dataset]. https://huggingface.co/datasets/cardiffnlp/tweet_sentiment_multilingual
Dataset updated
Dec 25, 2022
Dataset authored and provided by
Cardiff NLP
Description
Dataset Card for cardiffnlp/tweet_sentiment_multilingual

Dataset Summary

Tweet Sentiment Multilingual is a sentiment analysis dataset of tweets in 8 different languages:

arabic, english, french, german, hindi, italian, portuguese, spanish

Supported Tasks and Leaderboards

text_classification: The dataset can be trained using a SentenceClassification model from HuggingFace transformers.

Dataset Structure Data Instances

An instance from… See the full description on the dataset page: https://huggingface.co/datasets/cardiffnlp/tweet_sentiment_multilingual.
Direct Arabic products' opinions data set for opinion mining and sentiment...
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
saad, sarah (2023). Direct Arabic products' opinions data set for opinion mining and sentiment analysis [Dataset]. http://doi.org/10.7910/DVN/YTSWJ4
Unique identifier
https://doi.org/10.7910/DVN/YTSWJ4
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
saad, sarah
Description
The products' opinions in the Arabsentiment dataset were collected manually from different social product resources for opinion mining, feature extraction, and sentiment analysis tasks. The collected opinions are direct opinions that each include at least one product feature, whether stated explicitly or implicitly. The dataset covers twenty product categories, such as home, baby, several types of software products, and other product types. Additionally, the products' features were identified manually from the customer opinions and the product descriptions. The products are classified by product type, and each type has a specific search query. For each product, the product name and a brief description of its capabilities are recorded in a products information file and assigned to a product type with an initial query for that type. Opinions were selected based on the text size and the number of features that appear in the opinionated text. For each opinion, the dataset keeps the opinionated text and the sentiment rating score entered by the customer. The rating score represents the reviewer's overall polarity towards the product as one of two categories: positive or negative. The main dataset attributes are the total number of direct opinions, each of which should include at least one explicit product feature, and the sentiment counts: 1,459 opinions have a positive sentiment score and 516 have a negative one.
Arabic 100k Reviews
kaggle.com
Updated Mar 7, 2020
Cite
Abed Khooli (2020). Arabic 100k Reviews [Dataset]. https://www.kaggle.com/abedkhooli/arabic-100k-reviews/kernels
Dataset updated
Mar 7, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abed Khooli
Description
Context

Few Arabic datasets are available for classification comparison and other NLP tasks. This dataset is mainly a compilation of several available datasets and a sampling of 100k rows (99999 to be exact).

Content

The dataset combines reviews from hotels, books, movies, products and a few airlines. It has three classes (Mixed, Negative and Positive). Most were mapped from reviewers' ratings with 3 being mixed, above 3 positive and below 3 negative. Each row has a label and text separated by a tab (tsv). Text (reviews) were cleaned by removing Arabic diacritics and non-Arabic characters. The dataset has no duplicate reviews.
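
The rating mapping and text cleaning described above can be sketched as follows. This is not the author's preprocessing code: the Unicode ranges used for "Arabic diacritics" and "Arabic characters" are assumptions, as are the function names `rating_to_class` and `clean_review`.

```python
import re

# Assumed diacritic set: tanween, short vowels, shadda, sukun, and superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Assumed "Arabic character" set: the basic Arabic letter range plus whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
WHITESPACE = re.compile(r"\s+")


def rating_to_class(rating: int) -> str:
    """Map a reviewer rating to a class: 3 is Mixed, above 3 Positive, below 3 Negative."""
    if rating > 3:
        return "Positive"
    if rating < 3:
        return "Negative"
    return "Mixed"


def clean_review(text: str) -> str:
    """Remove diacritics, replace non-Arabic characters with spaces, and collapse spaces."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


print(rating_to_class(5), rating_to_class(3), rating_to_class(1))
print(clean_review("مُمْتَاز! 10/10 great"))
```

Diacritics are stripped before the non-Arabic filter runs so that they are deleted outright instead of being turned into spaces inside a word.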

Acknowledgements

The hotels and book reviews are a subset of HARD and BRAD. The rest were selected from hadyelsahar with a little over 100 airlines reviews collected manually.

Inspiration

Let's jump in and use your best tools to beat the SOTA! Don't forget to show and share your work.
Arabic-Sentiments
huggingface.co
Updated Apr 27, 2025
Cite
Muhammad Imran Zaman (2025). Arabic-Sentiments [Dataset]. https://huggingface.co/datasets/ImranzamanML/Arabic-Sentiments
Dataset updated
Apr 27, 2025
Authors
Muhammad Imran Zaman
Description
Arabic Sentiment Analysis Dataset

Dataset Description

This dataset contains Arabic text snippets, each labeled with a sentiment polarity (positive or negative). The data appears to be intended for tasks like sentiment analysis or text classification. It is divided into separate training and testing files (train.tsv and test.tsv).

Source Files

train.tsv
test.tsv

Language

Arabic

Data Format

Tab-Separated Values (.tsv)
Each line consists of two fields separated… See the full description on the dataset page: https://huggingface.co/datasets/ImranzamanML/Arabic-Sentiments.
Datasets for sentiment analysis of arabizi tweets
kaggle.com
Updated Jun 23, 2020
Cite
Maria JM Raidy (2020). Datasets for sentiment analysis of arabizi tweets [Dataset]. https://www.kaggle.com/mariajmraidy/datasets-for-sentiment-analysis-of-arabizi/discussion
Dataset updated
Jun 23, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Maria JM Raidy
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Arabizi is a modern variant of the Arabic language that is increasingly used by millennials. Arabizi is Arabic written in transliterated Latin characters, with numbers used to represent characters and sounds that do not exist in Latin-character languages. The proposed datasets are labelled for sentiment analysis of Lebanese Arabizi Twitter data.

Content

Tweets were collected randomly between 2017 and 2020. All have the geotagging option turned on and are located in Lebanon.

Columns: Text, sentiment, highlight

They have been annotated with a minimum of 2-agreement:
- unbalanced-sentiment-arabizi-ds.csv contains all the labelled tweets with a minimum of 2-agreement.
- 2-class-sentiment-arabizi-ds.csv are labelled as positive or negative.
- 3-class-sentiment-arabizi-ds.csv are labelled as positive, negative or neutral.

Both datasets have a third column called highlight: an informative column filled when the highlight is obvious. Options are:
- Sectarianism: Prejudice, discrimination, or hatred arising from attaching relations of inferiority and superiority to differences between subdivisions within a group.
- Sexism: Prejudice, stereotyping, or discrimination, typically against women, based on sex.
- Racism: Prejudice, discrimination, or antagonism directed against someone of a different race based on the belief that one's own race is superior.
- Foul language: Coarse or offensive language: swearing, bad words, obscene words, dirty words, …
- Bullying: Seek to harm, intimidate, or coerce
- Sarcasm: The use of irony to mock or convey contempt.
- Joke: A thing that someone says to cause amusement or laughter, especially a story with a funny punchline.
- Courtesy words: A polite remark or respectful act: ‘thank you’, ‘please’, ‘excuse me’, …
- Saying: Any concisely written or spoken expression that is especially memorable because of its meaning or style. A quotation from a text or speech.
- Known fact: Something that is generally recognized as a fact or truth: that grass is green

Inspiration

I would like to see Sentiment analysis models tested or validated on the datasets.

Best results to date 24-05-2020

2-class-sentiment-arabizi-ds.csv
- Model: Decision trees
- Accuracy: 81%
- Precision: 81%
- Recall: 81%
- F1: 81%

3-class-sentiment-arabizi-ds.csv
- Model: Logistic regression
- Accuracy: 65%
- Precision: 65%
- Recall: 65%
- F1: 65%
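
For anyone validating a model against these figures, the four reported metrics can be computed from true and predicted labels as sketched below. The source does not say which averaging method was used for precision, recall, and F1; macro averaging here is an assumption, and the label lists are made-up toy values rather than dataset results.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only.
y_true = ["positive", "positive", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
scores = macro_scores(y_true, y_pred)
print(scores)
```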
Arabic Text Dataset
shaip.com
tl.shaip.com
json
Updated Nov 26, 2024
Cite
Shaip (2024). Arabic Text Dataset [Dataset]. https://www.shaip.com/offerings/language-text-datasets/
Available download formats: json
Dataset updated
Nov 26, 2024
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Arabic Text Dataset contains a collection of text samples written in Arabic. It includes various forms of content, such as news articles, social media posts, literature, and dialogue, spanning different topics and writing styles. This dataset is used for tasks such as natural language processing (NLP), text classification, sentiment analysis, and machine translation in Arabic language applications.
ArSen Dataset
paperswithcode.com
Updated Nov 14, 2024
Cite
Yang Fang; Cheng Xu; Shuhao Guan; Nan Yan; Yuke Mei (2024). ArSen Dataset [Dataset]. https://paperswithcode.com/dataset/arsen
Dataset updated
Nov 14, 2024
Authors
Yang Fang; Cheng Xu; Shuhao Guan; Nan Yan; Yuke Mei
Description
Sentiment analysis is pivotal in Natural Language Processing for understanding opinions and emotions in text. While advancements in Sentiment analysis for English are notable, Arabic Sentiment Analysis (ASA) lags, despite the growing Arabic online user base. Existing ASA benchmarks are often outdated and lack comprehensive evaluation capabilities for state-of-the-art models. To bridge this gap, we introduce ArSen, a meticulously annotated COVID-19-themed Arabic dataset, and the IFDHN, a novel model incorporating fuzzy logic for enhanced sentiment classification. ArSen provides a contemporary, robust benchmark, and IFDHN achieves state-of-the-art performance on ASA tasks. Comprehensive evaluations demonstrate the efficacy of IFDHN using the ArSen dataset, highlighting future research directions in ASA.
Moroccan Darija Offensive Language Detection Dataset
data.mendeley.com
Updated Sep 20, 2023
Cite
Anass Ibrahimi (2023). Moroccan Darija Offensive Language Detection Dataset [Dataset]. http://doi.org/10.17632/2y4m97b7dc.1
Unique identifier
https://doi.org/10.17632/2y4m97b7dc.1
Dataset updated
Sep 20, 2023
Authors
Anass Ibrahimi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Morocco
Description
The Moroccan Darija offensive language detection dataset is a human-labeled dataset consisting of a set of Moroccan Darija sentences for offensive language detection. The dataset contains 20,402 sentences and their corresponding binary labels: 0 for a non-offensive sentence and 1 for an offensive sentence. The sentences were gathered from Twitter and YouTube comments and are written in both Latin and Arabic scripts. Inoffensive sentences account for 62.2% (12,685 sentences), while offensive sentences account for 37.8% (7,717 sentences). This contribution addresses the scarcity of labeled datasets for Moroccan Darija and provides a resource for natural language processing researchers interested in Moroccan Darija, particularly offensive language and sentiment analysis tasks.
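
The class counts and percentages quoted above can be checked with a short calculation. The class-weight line is an addition that is not in the source: it applies a common inverse-frequency heuristic, shown only as one possible way to account for the imbalance.

```python
# Counts as stated in the description above.
total = 20_402
non_offensive = 12_685  # label 0
offensive = 7_717       # label 1

# The two classes should add up to the stated total.
assert non_offensive + offensive == total


def share(count: int, whole: int) -> float:
    """Return count as a percentage of whole, rounded to one decimal place."""
    return round(100 * count / whole, 1)


non_offensive_pct = share(non_offensive, total)
offensive_pct = share(offensive, total)
print(non_offensive_pct, offensive_pct)

# Assumed heuristic (not from the source): weight = total / (n_classes * class_count).
class_weights = {
    0: total / (2 * non_offensive),
    1: total / (2 * offensive),
}
print(class_weights)
```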
Data from: KurdiSent: A Corpus For Kurdish Sentiment Analysis
data.mendeley.com
Updated Feb 6, 2023
Cite
Soran Badawi (2023). KurdiSent: A Corpus For Kurdish Sentiment Analysis [Dataset]. http://doi.org/10.17632/3yrkswy6ph.2
Unique identifier
https://doi.org/10.17632/3yrkswy6ph.2
Dataset updated
Feb 6, 2023
Authors
Soran Badawi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Kurdish language is regarded as one of the less-resourced languages. It is spoken by 30-40 million people worldwide. It has 33 letters that are largely similar to those of Arabic. Kurdish has two major dialects, Sorani and Badini. The dataset is a collection of texts written in the Sorani dialect. It contains tweets collected through the Twitter API. For security reasons and in line with Twitter's policies, we removed users' identities. We collected tweets that were published during the coronavirus pandemic. The tweets are raw texts, and the content covers a varied range of topics, including politics, sports, entertainment, and social life.

Data Labeling

We used the Twitter developer platform (Twitter API) to mine the tweets. The dataset was annotated manually by three native Kurdish speakers. The annotators were required to identify the class and category of each text. The classes were positive, negative, and neutral, and the categories were news, technology, art, social, and health. Texts for which at least two annotators agreed on a specific label and category were regarded as conflict-free and accepted for further processing. Texts that caused conflict among all three raters were removed from the dataset. The doccano program was used to help the annotators label each text one by one.
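
The acceptance rule described above (keep a text only when at least two of the three annotators agree on both its class and its category) can be sketched as a majority vote. The annotation records and the function name `majority_label` are hypothetical; the actual doccano export format is not described in the source.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str], min_agreement: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_agreement` annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical annotations: three class votes and three category votes per text.
annotations = [
    {"text": "text A",
     "classes": ["positive", "positive", "neutral"],
     "categories": ["news", "news", "news"]},
    {"text": "text B",
     "classes": ["positive", "negative", "neutral"],
     "categories": ["art", "art", "social"]},
]

accepted = []
for item in annotations:
    sentiment = majority_label(item["classes"])
    category = majority_label(item["categories"])
    # Keep the text only when both the class and the category reach agreement.
    if sentiment is not None and category is not None:
        accepted.append((item["text"], sentiment, category))

print(accepted)
```

With three annotators and a threshold of two, at most one label can reach the threshold, so no tie-breaking rule is needed.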
Data from: Sentiment Analysis of Multilingual Dataset of Bahraini Dialects,...
data.mendeley.com
Updated Feb 15, 2023
Cite
Thuraya Omran (2023). Sentiment Analysis of Multilingual Dataset of Bahraini Dialects, Arabic, and English [Dataset]. http://doi.org/10.17632/5rhw2srzjj.1
Unique identifier
https://doi.org/10.17632/5rhw2srzjj.1
Dataset updated
Feb 15, 2023
Authors
Thuraya Omran
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Area covered
Bahrain
Description
This dataset was generated using two cascading stages of translation: a machine translation followed by a manual one. Machine translation with Google Translate was applied to translate English Amazon product reviews into Standard Arabic. The manual stage then translated the resulting Arabic reviews into Bahraini dialects, carried out by qualified native speakers using customized forms. The resulting parallel dataset of English, Standard Arabic, and Bahraini dialects is called English_Modern Standard Arabic_Bahraini Dialects product reviews for sentiment analysis, “E_MSA_BDs-PR-SA”. The dataset is balanced, composed of 2,500 positive and 2,500 negative reviews.
Sentiment dataset of Algerian dialect
data.niaid.nih.gov
Updated Apr 7, 2024
Cite
Mazari, Ahmed cherif (2024). Sentiment dataset of Algerian dialect [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10937411
Dataset updated
Apr 7, 2024
Dataset authored and provided by
Mazari, Ahmed cherif
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Algeria
Description
This sentiment dataset of Algerian dialect consists of 11,760 comments (6,111 positive and 5,649 negative) collected from Facebook, YouTube, and Twitter during Hirak 2019.
- Comments concern the Algerian spoken language, written in Arabic and/or Latin characters and/or Arabizi, and may be in Modern Standard Arabic, French, or the local dialect.
- Value ‘1’ is assigned to a positive review and value ‘0’ to a negative review.
- Due to the nature of this dataset, some comments contain offensive language. This does not reflect the author's values; the aim is to provide a resource that helps in analysing positive and negative sentiments (which probably contain harmful content).
- For more information, please contact Ahmed Cherif Mazari: mazari.ac@gmail.com
