1 dataset found
  1. m

    Data from: KurdiSent: A Corpus For Kurdish Sentiment Analysis

    • data.mendeley.com
    Updated Feb 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Soran Badawi (2023). KurdiSent: A Corpus For Kurdish Sentiment Analysis [Dataset]. http://doi.org/10.17632/3yrkswy6ph.2
    Explore at:
    Dataset updated
    Feb 6, 2023
    Authors
    Soran Badawi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Kurdish language is regarded as one of the less-resourced languages. The language is globally practised by 30-40 people. The language has 33 letters that are largely similar to the Arabic language. The Kurdish language has two major dialects Sorani and Badini. The dataset includes a collection of texts written in the Sorani dialect. It contains tweets the Twitter API. Due to security reasons and following the policies of Twitter, we removed the user's identity. We collected the tweets which was published during the time of the Corona Virus pandemic. The tweets are raw texts, and the content covers a varied range of topics, starting from politics, sports, entertainment, social life, etc. Data Labeling We used the Twitter developer (Twitter API) to mine the tweets. The dataset was annotated manually by three Kurdish native speakers. The annotators were required to identify the classes and categories of each text. The classes included positive, negative and neutral and the categories consisted of news, technology, art, social and health. The texts which were agreed upon by at least two annotators to possess a specific label and category were regarded as conflict-free and accepted for further processing. Other texts that caused conflict among all three raters were ignored and have been removed from the dataset. The doccano program was used to help the annotators label each text one by one.

  2. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Soran Badawi (2023). KurdiSent: A Corpus For Kurdish Sentiment Analysis [Dataset]. http://doi.org/10.17632/3yrkswy6ph.2

Data from: KurdiSent: A Corpus For Kurdish Sentiment Analysis

Related Article
Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 6, 2023
Authors
Soran Badawi
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Kurdish language is regarded as one of the less-resourced languages. The language is globally practised by 30-40 people. The language has 33 letters that are largely similar to the Arabic language. The Kurdish language has two major dialects Sorani and Badini. The dataset includes a collection of texts written in the Sorani dialect. It contains tweets the Twitter API. Due to security reasons and following the policies of Twitter, we removed the user's identity. We collected the tweets which was published during the time of the Corona Virus pandemic. The tweets are raw texts, and the content covers a varied range of topics, starting from politics, sports, entertainment, social life, etc. Data Labeling We used the Twitter developer (Twitter API) to mine the tweets. The dataset was annotated manually by three Kurdish native speakers. The annotators were required to identify the classes and categories of each text. The classes included positive, negative and neutral and the categories consisted of news, technology, art, social and health. The texts which were agreed upon by at least two annotators to possess a specific label and category were regarded as conflict-free and accepted for further processing. Other texts that caused conflict among all three raters were ignored and have been removed from the dataset. The doccano program was used to help the annotators label each text one by one.

Search
Clear search
Close search
Google apps
Main menu