100+ datasets found
  1. Motamot: A Dataset for Revealing the Supremacy of Large Language Models over...

    • data.mendeley.com
    Updated May 13, 2024
    Cite
    Fatema Tuj Johora Faria (2024). Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis [Dataset]. http://doi.org/10.17632/hdhnrrwdz2.1
    Dataset updated
    May 13, 2024
    Authors
    Fatema Tuj Johora Faria
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The "Motamot" dataset contains 7,058 data points labeled with Positive and Negative sentiments, tailored specifically for political sentiment analysis in the Bengali language. It comprises 4,132 instances labeled Positive and 2,926 instances labeled Negative.

    Specifics of the core data: Train 5,647; Test 706; Validation 705

    Train: Positive 3,306; Negative 2,341

    Test: Positive 413; Negative 293

    Validation: Positive 413; Negative 292
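
    The split counts listed above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers from the description; the dictionary layout and variable names are illustrative assumptions, not part of the dataset.

```python
# Split counts as listed in the dataset description above.
splits = {
    "train": {"positive": 3306, "negative": 2341},
    "test": {"positive": 413, "negative": 293},
    "validation": {"positive": 413, "negative": 292},
}

# Total number of instances in each split.
split_totals = {name: sum(counts.values()) for name, counts in splits.items()}

# Total number of instances per label across all splits.
label_totals = {
    label: sum(counts[label] for counts in splits.values())
    for label in ("positive", "negative")
}

grand_total = sum(split_totals.values())
positive_share = label_totals["positive"] / grand_total

print(split_totals, label_totals, grand_total, round(positive_share, 3))
```

    The totals agree with the description (5,647 / 706 / 705 per split, 4,132 Positive and 2,926 Negative overall), and about 58.5% of the instances are Positive, so the classes are moderately imbalanced.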

  2. Political Tweets Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Cite
    Bright Data (2024). Political Tweets Dataset [Dataset]. https://brightdata.com/products/datasets/twitter/tweets/political
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Political Tweets dataset to enhance campaign strategies and gain insights into public discourse. This dataset offers a comprehensive view of political dynamics on social media, empowering organizations, researchers, and policymakers to analyze trends and sentiment. Access the full dataset or customize it with specific data points tailored to your needs.

    Popular use cases include:

    • Sentiment Analysis: Analyze publicly available political tweets to understand public sentiment on policies, events, and candidates, aiding campaign strategies and opinion research.
    • Trend Monitoring: Track trending topics and hashtags in political discourse to identify key issues and shifts in public priorities across demographics.
    • Misinformation Detection: Detect and analyze patterns of misinformation, supporting efforts to combat its spread effectively.

    Harness these insights to stay informed and adapt to the evolving political landscape.

  3. Appendix 5.1 Sentiment analysis on WeChat posts

    • figshare.com
    zip
    Updated Jul 17, 2021
    Cite
    Titus Chen (2021). Appendix 5.1 Sentiment analysis on WeChat posts [Dataset]. http://doi.org/10.6084/m9.figshare.12738164.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 17, 2021
    Dataset provided by
    figshare
    Authors
    Titus Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    • id: unique id number of WeChat post
    • Account: WeChat public account
    • time: date of post publication
    • title: title of WeChat post
    • content: content of WeChat post
    • media: number of multimedia objects in WeChat post
    • URL: permanent link of WeChat post
    • segcon_unlist: tokenized content of WeChat post
    • month: month of post publication
    • year: year of post publication
    • Account_e: English name of WeChat public account
    • token_count: number of tokens in WeChat post
    • senti_count: number of sentiment terms in WeChat post
    • senti_str: sentiment terms in WeChat post
    • NB: weighted number of sadness terms (悲伤) in WeChat post
    • NC: weighted number of fear terms (恐惧) in WeChat post
    • ND: weighted number of antipathy terms (憎恶) in WeChat post
    • NE: weighted number of distraught terms (烦闷) in WeChat post
    • NG: weighted number of shame/shyness terms (羞) in WeChat post
    • NH: weighted number of guilt terms (疚) in WeChat post
    • NI: weighted number of panic terms (慌) in WeChat post
    • NJ: weighted number of disappointment terms (失望) in WeChat post
    • NK: weighted number of jealousy terms (妒忌) in WeChat post
    • NL: weighted number of skepticism terms (怀疑) in WeChat post
    • NN: weighted number of denigration terms (贬责) in WeChat post
    • PA: weighted number of happiness terms (快乐) in WeChat post
    • PB: weighted number of fondness terms (喜爱) in WeChat post
    • PC: weighted number of surprise terms (惊奇) in WeChat post
    • PD: weighted number of respect terms (尊敬) in WeChat post
    • PE: weighted number of peacefulness terms (安心) in WeChat post
    • PF: weighted number of anxiety terms (思) in WeChat post
    • PG: weighted number of trust terms (相信) in WeChat post
    • PH: weighted number of commendation/approval terms (赞扬) in WeChat post
    • PK: weighted number of wishfulness terms (祝愿) in WeChat post
    • NAs: weighted number of anger terms (愤怒) in WeChat post
    • posi: weighted number of positive sentiment terms in WeChat post
    • nega: weighted number of negative sentiment terms in WeChat post
    • all_senti: weighted number of all sentiment terms in WeChat post
    • all_senti_norm: standardized weighted number of all sentiment terms in WeChat post
    • official: official affiliation of WeChat post (Official media or Non-official media)
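
    As an illustration of how the posi and nega fields described above might be combined, the sketch below computes a simple net-sentiment ratio per post. This ratio is an assumption made for illustration, not a measure defined by the dataset, and the sample rows are hypothetical values rather than real records.

```python
def net_sentiment(posi, nega):
    """Return (posi - nega) / (posi + nega), or 0.0 when both counts are zero."""
    total = posi + nega
    return (posi - nega) / total if total else 0.0


# Hypothetical rows that use the field names from the description above.
posts = [
    {"id": 1, "posi": 6.0, "nega": 2.0},
    {"id": 2, "posi": 0.0, "nega": 0.0},
    {"id": 3, "posi": 1.5, "nega": 4.5},
]

scores = {post["id"]: net_sentiment(post["posi"], post["nega"]) for post in posts}
print(scores)
```

    The ratio ranges from -1 (only negative terms) to 1 (only positive terms); posts without any sentiment terms are mapped to 0.0 to avoid a division by zero.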

  4. Twitter Trends|| PPP &PTI || Pakistan Elections

    • kaggle.com
    Updated Feb 16, 2024
    Cite
    Aqeelkh (2024). Twitter Trends|| PPP &PTI || Pakistan Elections [Dataset]. https://www.kaggle.com/datasets/aqeelkh/twitter-trends-ppp-and-pti-pakistan-elections/discussion
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aqeelkh
    Area covered
    Pakistan
    Description

    Dataset Title: PPP and PTI Twitter Trend Analysis

    Overview: This dataset encompasses a collection of 1184 tweets from the Twitter trend "PPP and PTI," capturing a snapshot of public discourse and sentiment regarding Pakistan's prominent political entities: the Pakistan Peoples Party (PPP) and Pakistan Tehreek-e-Insaf (PTI). It provides a diverse range of perspectives and reactions from Twitter users, making it an invaluable resource for political analysts, data scientists, and researchers interested in political sentiment analysis, social media analytics, and digital humanities.

    Dataset Description: The dataset is structured into seven columns, each offering distinct insights into the tweets collected:

    • UserTag: The Twitter handle of the user who posted the tweet.
    • TimeStamp: The date and time when the tweet was posted, providing temporal context to the data.
    • Current_Date: The date when the data was collected, ensuring traceability and relevance.
    • Tweet Body: The actual content of the tweet, encapsulating the message, sentiment, and topics discussed by the user. This column is central to text analysis, sentiment detection, and thematic studies.
    • Reply: The number of replies to the tweet, indicating engagement and conversational depth.
    • Retweet: The number of retweets, reflecting the tweet's reach and virality within the Twitter community.
    • Likes: The number of likes, serving as a proxy for the tweet's popularity and user agreement.
    • Views: An estimate of how many times the tweet was viewed, offering insights into its impact and visibility.
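
    The engagement columns listed above lend themselves to a simple per-tweet engagement rate. The sketch below is a minimal illustration using Python's standard csv module; the header spellings, the plain-integer counts, and the sample rows are all assumptions, since the real file layout is not shown here.

```python
import csv
import io

# Hypothetical sample rows. The header names follow the column list above,
# but their exact spelling in the real file is an assumption.
SAMPLE_CSV = """UserTag,TimeStamp,Current_Date,Tweet Body,Reply,Retweet,Likes,Views
@user_a,2024-02-10T09:15:00,2024-02-16,Example tweet about PPP,3,10,40,1000
@user_b,2024-02-11T18:30:00,2024-02-16,Example tweet about PTI,0,5,20,500
"""


def engagement_rate(row):
    """Return (replies + retweets + likes) / views, or 0.0 when views is 0."""
    interactions = int(row["Reply"]) + int(row["Retweet"]) + int(row["Likes"])
    views = int(row["Views"])
    return interactions / views if views else 0.0


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = {row["UserTag"]: engagement_rate(row) for row in rows}
print(rates)
```

    If the scraped counts are stored as abbreviated strings (for example "1.2K"), they would need to be normalized before the int() conversion.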

    Potential Uses: This dataset can serve a wide range of purposes, including but not limited to:

    1. Sentiment analysis to gauge public opinion regarding PPP and PTI.
    2. Temporal analysis to identify trends and shifts in public sentiment over time.
    3. Network analysis to explore interactions and the spread of information among users.
    4. Comparative analysis between the engagement and popularity of tweets related to PPP vs. PTI.

    Methodology: The tweets were collected using Selenium WebDriver, ensuring a comprehensive and unbiased selection of tweets related to the "PPP and PTI" trend. Care was taken to include tweets from various times of the day to capture a broad spectrum of user engagement and opinions.

    Ethical Considerations: All data was collected and presented in accordance with Twitter's data use policies and ethical guidelines for research.

    Acknowledgments: This dataset was created by Aqeel Khan, a student of BS Mathematics at Namal University Mianwali, with a keen interest in data science and machine learning. The dataset compilation was aimed at facilitating research and analysis in the domains of political science, social media analytics, and data science.

    License: This dataset is shared for educational and research purposes. Users of the dataset are encouraged to cite the source and adhere to Twitter's terms of service regarding the use of shared data.

  5. Twitter Sentiment Analysis Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 24, 2024
    Cite
    Bright Data (2024). Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Dec 24, 2024
    Dataset authored and provided by
    Bright Data
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

    Key Features:

    • Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
    • Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
    • Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
    • Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
    • Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
    • Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.

    Use Cases:

    • Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
    • Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
    • Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
    • AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
    • Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.

    Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
    
  6. Appendix 6.3 Data of sentiment analysis on US-China trade disputes

    • figshare.com
    txt
    Updated Jul 20, 2021
    Cite
    Titus Chen (2021). Appendix 6.3 Data of sentiment analysis on US-China trade disputes [Dataset]. http://doi.org/10.6084/m9.figshare.14601738.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 20, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Titus Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China, United States
    Description

    This folder deposits two datasets. The first dataset contains the sentiment data of the tariff-related WeChat posts. The other dataset keeps the sentiment data of the Huawei-related WeChat posts.

  7. Data from: PolSentiLex: Sentiment Detection in Socio-Political Discussions...

    • live.european-language-grid.eu
    • explore.openaire.eu
    • +2 more
    csv
    Updated Dec 10, 2023
    Cite
    (2023). PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7654
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A Russian-language sentiment lexicon for social media discussions on political and social issues.

    The file contains raw markings collected with LINIS coding service https://linis-crowd.org [in Russian].

    Learn more about PolSentiLex in our papers:

    Koltsova, O., & Alexeeva, S. (2015). Linis-crowd.org: A lexical resource for Russian sentiment analysis of social media [Linis-crowd.org: Lexichesk resurs dl’a analiza tonal’nosti sotsial’no-politicheskix tekstov]. Computational Linguistics and Computational Ontologies: Proceedings of the XVIII Joint Conference “Internet and Modern Society (IMS-2015)” [Kompyuternaya Lingvistika i Vyichislitelnyie Ontologii: Sbornik Nauchnyih Statey. Trudyi XVIII Ob’edinennoy Konferentsii «Internet i Sovremennoe Obschestvo» (IMS-2015)], 25–34. [in Russian] URL: https://scila.hse.ru/data/2020/06/02/1603986481/koltsovaoyuetal.pdf

    Koltsova, O., Alexeeva, S., & Koltsov, S. (2016). An Opinion Word Lexicon and a Training Dataset for Russian Sentiment Analysis of Social Media. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2016”, 277–287. URL: http://www.dialog-21.ru/media/3400/koltsovaoyuetal.pdf

    Koltsova O., Alexeeva S., Pashakhin S., Koltsov S. (2020) PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media. In: Filchenkov A., Kauttonen J., Pivovarova L. (eds) Artificial Intelligence and Natural Language. AINL 2020. Communications in Computer and Information Science, vol 1292. Springer, Cham. https://doi.org/10.1007/978-3-030-59082-6_1

  8. Bangla Sentiment Dataset

    • data.mendeley.com
    Updated Jun 3, 2025
    Cite
    Jahanur Biswas (2025). Bangla Sentiment Dataset [Dataset]. http://doi.org/10.17632/rh67mckhbh.2
    Dataset updated
    Jun 3, 2025
    Authors
    Jahanur Biswas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Bangla Sentiment Dataset is a curated collection of sentiment-rich textual data in Bangla, focused on recent and trending topics. This dataset has been compiled from diverse sources, including Bangladeshi online newspapers, social media platforms, and blogs, ensuring a wide spectrum of language styles and sentiment expressions.

    Key Features:

    Focus on Recent Topics: The dataset emphasizes contemporary issues, trending discussions, and popular topics in Bangladeshi society. This includes sentiments on political developments, social movements, entertainment, cultural events, and other recent happenings.

    Source Variety:

    • Online Newspapers: Articles, editorials, headlines, and reader comments provide structured and semi-formal sentiment data.
    • Social Media: Posts, tweets, and comments reflect informal, conversational language with high emotional expressiveness.
    • Blogs: Opinion pieces and discussions offer detailed and context-rich sentiment content.

    Sentiment Labels: Each entry in the dataset is annotated with one of the following sentiment categories:

    • Positive (1): Texts expressing happiness, agreement, or optimism.
    • Negative (0): Texts reflecting criticism, disagreement, or pessimism.
    • Neutral (2): Texts presenting balanced or factual statements with minimal emotional bias.

    Linguistic and Stylistic Diversity: The dataset captures a range of Bangla language variations, including:

    • Formal and informal Bangla usage.
    • Regional dialects.
    • Transliterated Bangla (Banglish) commonly used on social media.

    Real-World Context: The inclusion of recent topics ensures that the dataset is relevant for analyzing public sentiment around current events and trends. This makes it particularly useful for real-time sentiment analysis applications.
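
    Because the label codes described above are not in the intuitive order (Negative is 0, Positive is 1, Neutral is 2), it may help to decode them explicitly before counting or training. The sketch below is one way to do that; the class and function names are illustrative assumptions, and the sample labels are made up.

```python
from collections import Counter
from enum import IntEnum


class Sentiment(IntEnum):
    """Label codes as given in the dataset description."""
    NEGATIVE = 0
    POSITIVE = 1
    NEUTRAL = 2


def label_distribution(labels):
    """Count labels by name; raises ValueError for codes outside 0, 1, 2."""
    return Counter(Sentiment(int(value)).name.lower() for value in labels)


# Hypothetical label column, mixing ints and strings as a CSV reader might.
sample_labels = [1, 0, 2, 1, "0"]
distribution = label_distribution(sample_labels)
print(distribution)
```

    Going through the enum means an unexpected code (for example 3) fails loudly instead of being silently counted.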

    This dataset provides an invaluable resource for researchers and practitioners aiming to explore sentiment analysis in Bangla, with a special emphasis on modern-day relevance and real-world applicability.

  9. Data from: Migration Sentiment Analysis Dataset from Portuguese Political...

    • zenodo.org
    • investigacion.usc.gal
    bin
    Updated Apr 10, 2025
    Cite
    Erik Bran Marino; Renata Vieira; Suso Baleato; Ana Sofia Ribeiro; Katarina Laken (2025). Migration Sentiment Analysis Dataset from Portuguese Political Manifestos (2011, 2015, 2019) [Dataset]. http://doi.org/10.5281/zenodo.15189809
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erik Bran Marino; Renata Vieira; Suso Baleato; Ana Sofia Ribeiro; Katarina Laken
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a collection of migration-related sentences extracted from Portuguese political party manifestos from the 2011, 2015, and 2019 legislative elections. Each entry includes the original sentence in Portuguese, sentiment analysis scores (positive, negative, and neutral probabilities), and the migration-related term that appears in the sentence. The sentiment analysis was performed using a multilingual BERT model trained for sentiment classification.
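
    Since each entry carries three sentiment probabilities rather than a single label, a common first step is to pick the most probable class. The sketch below shows that step under assumed key names ("positive", "negative", "neutral") and with made-up probabilities; the real column names in the file may differ.

```python
def dominant_sentiment(scores):
    """Return the label whose probability is highest."""
    return max(scores, key=scores.get)


# Hypothetical probabilities for one sentence; key names are assumptions.
example = {"positive": 0.12, "negative": 0.70, "neutral": 0.18}
label = dominant_sentiment(example)
print(label)
```

    Keeping the full probabilities alongside the derived label preserves information about how confident the classifier was for each sentence.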

    The dataset was created as part of a research project examining how political discourse around migration has evolved in Portugal's changing political landscape, particularly with the emergence of new parties. This resource supports computational analysis of political communication regarding migration issues in Portugal.

  10. Tweets on Political and Social issues for analysis using Neutrosophic Sets

    • data.mendeley.com
    Updated Jan 27, 2019
    Cite
    Ilanthenral Kandasamy (2019). Tweets on Political and Social issues for analysis using Neutrosophic Sets [Dataset]. http://doi.org/10.17632/fnzmfgy2bd.1
    Dataset updated
    Jan 27, 2019
    Authors
    Ilanthenral Kandasamy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ten different topics were taken for analysis, and 1000 tweets about each topic were collected. The file provided here, as a sample case, contains the tweets extracted about Farm Loan along with the polarity calculated by TextBlob (Python).

  11. Training Data for German Sentiment Analysis of Political Communication (SUF...

    • data.aussda.at
    bin, pdf, tsv
    Updated Nov 26, 2020
    Cite
    Martin Haselmayer; Marcelo Jenny (2020). Training Data for German Sentiment Analysis of Political Communication (SUF edition) [Dataset]. http://doi.org/10.11587/EOPCOB
    Explore at:
    Available download formats: pdf (86728), tsv (448), tsv (23371430), bin (9134185)
    Dataset updated
    Nov 26, 2020
    Dataset provided by
    AUSSDA
    Authors
    Martin Haselmayer; Marcelo Jenny
    License

    https://data.aussda.at/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11587/EOPCOB

    Area covered
    Austria
    Dataset funded by
    Austrian Science Fund
    Vienna Anniversary Foundation for Higher Education
    Description

    Full edition for scientific use. The dataset contains 125871 sentences extracted from Austrian parliamentary debates and party press releases. Press releases were collected under the auspices of the Austrian National Election Study (AUTNES) and cover 6 weeks prior to each national election 1995-2013. Data from parliamentary debates stem from a random sample of sentences drawn from sessions of the Austrian National Council (1995-2013). The sentiment of the sentences was crowdcoded on a five-point-scale ranging from 0 “Not negative” to 5 “Very strongly negative”. As each sentence has been coded by ten coders, there are multiple codingids for each unitid (sentence).
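
    Because every sentence (unitid) has several codings (codingid), analyses usually need one aggregated score per sentence. The sketch below averages the codings per unitid; the mean is an assumption chosen for illustration, not necessarily the aggregation used by the dataset authors, and the tuples are hypothetical values rather than real records.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical codings as (unitid, codingid, score) tuples.
codings = [
    (1, 101, 0), (1, 102, 1), (1, 103, 2),
    (2, 201, 3), (2, 202, 4), (2, 203, 2),
]


def mean_score_per_unit(records):
    """Group scores by unitid and return the mean score for each sentence."""
    by_unit = defaultdict(list)
    for unitid, _codingid, score in records:
        by_unit[unitid].append(score)
    return {unitid: mean(scores) for unitid, scores in by_unit.items()}


unit_scores = mean_score_per_unit(codings)
print(unit_scores)
```

    A median or a majority vote could be swapped in for mean() if the crowd codings turn out to be noisy.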

  12. SEN - Sentiment analysis of Entities in News headlines

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 15, 2023
    Cite
    Katarzyna Baraniak (2023). SEN - Sentiment analysis of Entities in News headlines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5211931
    Dataset updated
    Oct 15, 2023
    Dataset provided by
    Katarzyna Baraniak
    Marcin Sydow
    Description

    If you wish to use this data please cite:

    Katarzyna Baraniak, Marcin Sydow, A dataset for Sentiment analysis of Entities in News headlines (SEN), Procedia Computer Science, Volume 192, 2021, Pages 3627-3636, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2021.09.136. (https://www.sciencedirect.com/science/article/pii/S1877050921018755)

    bibtex: users.pja.edu.pl/~msyd/bibtex/sydow-baraniak-SENdataset-kes21.bib

    SEN is a novel publicly available human-labelled dataset for training and testing machine learning algorithms for the problem of entity level sentiment analysis of political news headlines.

    On-line news portals play a very important role in the information society. Fair media should present reliable and objective information. In practice there is an observable positive or negative bias concerning named entities (e.g. politicians) mentioned in the on-line news headlines. Our dataset consists of 3819 human-labelled political news headlines coming from several major on-line media outlets in English and Polish.

    Each record contains a news headline, a named entity mentioned in the headline, and a human-annotated label (one of “positive”, “neutral”, “negative”). Our SEN dataset package consists of 2 parts: SEN-en (English headlines, split into SEN-en-R and SEN-en-AMT) and SEN-pl (Polish headlines). Each headline-entity pair was annotated either by a team of volunteer researchers (the whole SEN-pl dataset and a subset of 1271 English records: the SEN-en-R subset, “R” for “researchers”) or via the Amazon Mechanical Turk service (a subset of 1360 English records: the SEN-en-AMT subset).

    During analysis of the annotations, outlying annotations were identified and removed. A separate version of the dataset without outliers is marked by "noutliers" in the data file name.

    Details of the process of preparing the dataset and its analysis are presented in the paper.

    In case of any questions, please contact one of the authors. Email addresses are in the paper.

  13. Global Political tweets

    • opendatabay.com
    .csv
    Updated Jun 16, 2025
    Cite
    Datasimple (2025). Global Political tweets [Dataset]. https://www.opendatabay.com/data/ai-ml/c8d2d199-5c65-401a-8d9d-c88bd5471489
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jun 16, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    Social media is becoming a key medium through which we communicate with each other: it is at the center of the very structures of our daily interactions. Yet this infiltration is not unique to interpersonal relations. Political leaders, governments, and states operate within this social media environment, wherein they continually address crises and institute damage control through platforms such as Twitter. With the proliferation of the internet among the masses, social media is emerging as a potential channel of communication. It provides politicians with a direct channel for communicating, connecting, and engaging with the public. The power of social media, especially Twitter and Facebook, has been demonstrated by its successful application during recent US presidential elections and the revolts in Arab countries. In India too, as the general election is about to knock at the door during early 2014, political parties and leaders are trying to harness the power of social media.

    Content: The tweets have the #Politics hashtag. The collection started on 24/7/2021, and will be updated on a daily basis.

    Information regarding the data: The data consists of 1 lakh+ (100,000+) records with 13 columns. The features are described below.

    No  Column            Description
    1   user_name         The name of the user, as they’ve defined it.
    2   user_location     The user-defined location for this account’s profile.
    3   user_description  The user-defined UTF-8 string describing their account.
    4   user_created      Time and date when the account was created.
    5   user_followers    The number of followers an account currently has.
    6   user_friends      The number of friends an account currently has.
    7   user_favourites   The number of favorites an account currently has.
    8   user_verified     When true, indicates that the user has a verified account.
    9   date              UTC time and date when the Tweet was created.
    10  text              The actual UTF-8 text of the Tweet.
    11  hashtags          All the other hashtags posted in the tweet along with #Politics.
    12  source            Utility used to post the Tweet; Tweets from the Twitter website have a source value of web.
    13  is_retweet        Indicates whether this Tweet has been Retweeted by the authenticating user.

    Inspiration: You can use this data to dive into the subjects that use this hashtag, look at the geographical distribution, evaluate sentiments, and look at trends.

    License

    CC0

    Original Data Source: Global Political tweets

  14. Appendix 7.3 Data of sentiment analysis on Hong Kong-related WeChat...

    • figshare.com
    txt
    Updated Jul 22, 2021
    Cite
    Titus Chen (2021). Appendix 7.3 Data of sentiment analysis on Hong Kong-related WeChat narratives [Dataset]. http://doi.org/10.6084/m9.figshare.14785323.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Titus Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Hong Kong
    Description

    This dataset contains the sentiment data of the Hong Kong-related WeChat narratives.

  15. Bangla Dataset on Youtube Political Comments |NLP

    • opendatabay.com
    Updated Jun 17, 2025
    Cite
    Datasimple (2025). Bangla Dataset on Youtube Political Comments |NLP [Dataset]. https://www.opendatabay.com/data/ai-ml/7d636bfb-53e1-4631-a3b2-da9976417441
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    YouTube, Social Media and Networking
    Description

    This dataset consists of YouTube comments predominantly collected from political news videos relevant to Bangladesh. The comments are written in Bengali, enriched with emojis that express a range of emotions and opinions. These comments provide unique insights into the public sentiment and reactions related to political events, figures, and policies within the country. This dataset can be highly useful for NLP tasks such as sentiment analysis, emotion detection, and opinion mining. It enables researchers to study public sentiment, emotional expression, and political opinions through text and emojis in Bengali.

    Key Features:

    Language: The comments are in Bengali, reflecting authentic language use with local expressions and cultural nuances.

    Emojis: The presence of emojis in the dataset helps capture non-verbal cues and emotional expressions that add depth to the textual sentiment.

    Context: The data is sourced from videos specifically focused on political news, making it valuable for research related to social, political, and media analysis in Bangladesh.

    License

    CC BY 4.0

    Original Data Source: Bangla Dataset on Youtube Political Comments |NLP

  16. Replication data for: Text as Data: The Promise and Pitfalls of Automatic...

    • dataverse.harvard.edu
    Updated Oct 1, 2014
    Cite
    Justin Grimmer; Brandon Stewart (2014). Replication data for: Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts [Dataset]. http://doi.org/10.7910/DVN/FQBHP8
    Dataset updated
    Oct 1, 2014
    Dataset provided by
    Harvard Dataverse
    Authors
    Justin Grimmer; Brandon Stewart
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Replication Materials (Data and Code) for 'Text as Data'

    Abstract: Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods: they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.

  17. An new corpus of one million articles from four post-soviet countries and...

    • dataverse.harvard.edu
    Updated Apr 28, 2021
    Cite
    Krzysztof Rybinski (2021). An new corpus of one million articles from four post-soviet countries and Poland. [Dataset]. http://doi.org/10.7910/DVN/CEF7RU
    Dataset updated
    Apr 28, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Krzysztof Rybinski
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Soviet Union, Poland
    Description

    Data is in the .RData format and should be read into R using the load() function. It contains two R data frames: tx_pl_lang (articles in Polish) and tx_ru_lang (articles in Russian).

    Covered newspapers

    Table 1. Newspapers and portals included in the analysis and their Alexa ranks

    Country      Newspaper site     Number of articles   Alexa global rank   Alexa local rank
    Russia       iz.ru              43,782               1,378               45
    Russia       kommersant.ru      46,070               1,335               44
    Russia       novayagazeta.ru    29,357               9,215               459
    Russia       vedomosti.ru       27,797               6,302               288
    Kazakhstan   informburo.kz      29,375               38,916              119
    Kazakhstan   nur.kz             67,350               951                 6
    Kazakhstan   tengrinews.kz      44,285               13,036              34
    Kazakhstan   zakon.kz           109,442              9,477               30
    Belarus      bdg.by             33,447               292,678             746
    Belarus      belgazeta.by       21,995               1,392,332           11,041
    Belarus      sb.by              83,685               41,015              79
    Ukraine      kp.ua              194,792              64,062              860
    Ukraine      segodnya.ua        45,835               18,658              256
    Ukraine      vesti.ua           90,559               58,573              1,096
    Poland       gazeta.pl          53,321               1,749               14
    Poland       rp.pl              49,587               20,930              167
    Poland       wpolityce.pl       76,625               13,833              105

    Note: Alexa rank, 90-day average, checked on 17 February 2021. Total number of articles: 1,047,304.

    The list of columns is the same for both data frames:

    - short name of the newspaper
    - text of the article
    - date when the article was scraped
    - sentiment calculated using the standard sentiment lexicons
    - sentiment calculated using the Covid-extended lexicons
    - name of the topic
    - surnames of influential politicians (0/1 variables; 1 if the name is in the article)

    > colnames(tx_pl_lang)
     [1] "name"         "art"          "date"         "sent"         "sent.c"       "tname"
     [7] "putin"        "medvedev"     "vaino"        "shoigu"       "bortnikov"    "lavrov"
    [13] "mishustin"    "kirienko"     "sechin"       "zelensky"     "shmygal"      "akhmetov"
    [19] "avakov"       "ermak"        "poroshenko"   "medvedchuk"   "groisman"     "sagyntaev"
    [25] "mamin"        "tokayev"      "nnazarbayev"  "dnazarbayeva" "kulibayev"    "masimov"
    [31] "alukashenko"  "vakulchik"    "vlukashenko"  "kobyakov"     "makei"        "myasnikovich"
    [37] "rumas"        "golovchenko"  "kaczynski"    "duda"         "morawiecki"   "ziobro"

    List of LDA topics (tname column)

    cri    crime
    ind    industry
    itr    international trade
    sov    former Soviet republics
    pol    politics
    tou    tourism
    con    construction
    air    air transport
    ban    banking
    fin    finance
    med    media
    pro    protests
    hou    housing
    spo    sport
    pol    politics
    reg    regional
    mkt    financial markets
    cul    culture
    acc    accidents
    edu    education
    lab    labour market
    war    war
    pub    public finances
    eco    economy
    int    international
    eur    europe
    mob    mobile/internet
    com    commodities, oil, gas
    usa    USA
    fam    family
    hea    health
    his    history
    tra    transport
    pop    gossip/beauty/weather
    rel    religion
    aut    automotive
    spa    space
    misc   cannot decide the topic
    ussr   soviet history
    ukr    Ukraine
    mos    Moscow
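As a quick consistency check, the per-site article counts in Table 1 can be summed and compared with the stated total of 1,047,304. The sketch below is illustrative Python and not part of the dataset; the variable and function names are assumptions, and the counts are copied from the table above.

```python
from collections import defaultdict

# (country, newspaper site, number of articles), copied from Table 1.
ARTICLE_COUNTS = [
    ("Russia", "iz.ru", 43_782),
    ("Russia", "kommersant.ru", 46_070),
    ("Russia", "novayagazeta.ru", 29_357),
    ("Russia", "vedomosti.ru", 27_797),
    ("Kazakhstan", "informburo.kz", 29_375),
    ("Kazakhstan", "nur.kz", 67_350),
    ("Kazakhstan", "tengrinews.kz", 44_285),
    ("Kazakhstan", "zakon.kz", 109_442),
    ("Belarus", "bdg.by", 33_447),
    ("Belarus", "belgazeta.by", 21_995),
    ("Belarus", "sb.by", 83_685),
    ("Ukraine", "kp.ua", 194_792),
    ("Ukraine", "segodnya.ua", 45_835),
    ("Ukraine", "vesti.ua", 90_559),
    ("Poland", "gazeta.pl", 53_321),
    ("Poland", "rp.pl", 49_587),
    ("Poland", "wpolityce.pl", 76_625),
]

# Total stated in the note under Table 1.
STATED_TOTAL = 1_047_304


def articles_per_country(rows):
    """Sum article counts for each country."""
    totals = defaultdict(int)
    for country, _site, count in rows:
        totals[country] += count
    return dict(totals)


per_country = articles_per_country(ARTICLE_COUNTS)
total = sum(per_country.values())

for country, count in per_country.items():
    print(f"{country}: {count:,}")
print("Matches stated total:", total == STATED_TOTAL)
```

The same per-country breakdown could be computed in R directly from the `name` column once the data frames are loaded; the Python version here only needs the numbers printed in the description.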

  18. [Tweets] 2023 Brazilian Early Political Events

    • zenodo.org
    zip
    Updated Feb 10, 2025
    Cite
    Lucas Raniére Juvino Santos; Leandro Balby Marinho; Claudio Campelo (2025). [Tweets] 2023 Brazilian Early Political Events [Dataset]. http://doi.org/10.5281/zenodo.14834704
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lucas Raniére Juvino Santos; Leandro Balby Marinho; Claudio Campelo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2023
    Area covered
    Brazil
    Description

    2023 Brazilian Early Political Events

    This dataset contains 13,910,048 tweets from 1,346,340 users, extracted using 157 search terms over 56 different days between January 1st and June 21st, 2023.

    All tweets in this dataset are in Brazilian Portuguese.

    Data Usage

    The dataset contains textual data from tweets, making it suitable for various NLP analyses, such as sentiment analysis, bias or stance detection, and toxic language detection. Additionally, users and tweets can be linked to create social graphs, enabling Social Network Analysis (SNA) to study polarization, communities, and other social dynamics.

    Extraction Method

    This data set was extracted using Twitter's (now X) official API—when Academic Research API access was still available—following the pipeline:

    1. Twitter/X daily monitoring: The dataset author monitored daily political events appearing in Brazil's Trending Topics. Twitter/X has an automated system for classifying trending terms. When a term was identified as political, it was stored along with its date for later use as a search query.

    2. Tweet collection using saved search terms: Once terms and their corresponding dates were recorded, tweets were extracted from 12:00 AM to 11:59 PM on the day the term entered the Trending Topics. A language filter was applied to select only tweets in Portuguese. The extraction was performed using the official Twitter/X API.

    3. Data storage: The extracted data was organized by day and search term. If the same search term appeared in Trending Topics on consecutive days, a separate file was stored for each respective day.
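The bookkeeping behind the three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TrendingTerm` class, the function names, the `.jsonl` extension, and the directory layout are all hypothetical and are not taken from the dataset's repository. The sketch covers the one-day collection window (12:00 AM to 11:59 PM) and the one-file-per-day-and-term storage rule; it does not call any Twitter/X API.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TrendingTerm:
    """A political search term and the day it appeared in Trending Topics."""
    term: str
    day: date


def collection_window(day):
    """Return the (start, end) datetimes covering 12:00 AM to 11:59 PM of `day`."""
    return datetime.combine(day, time(0, 0)), datetime.combine(day, time(23, 59))


def storage_path(root, item):
    """Build a hypothetical file path organized by day, then by search term."""
    safe_term = item.term.strip().lower().replace(" ", "_")
    return PurePosixPath(root) / item.day.isoformat() / f"{safe_term}.jsonl"


# The same term trending on two consecutive days yields two separate files.
first = TrendingTerm("example term", date(2023, 1, 8))
second = TrendingTerm("example term", date(2023, 1, 9))

print(collection_window(first.day))
print(storage_path("tweets", first))
print(storage_path("tweets", second))
```

Keying the path on both the day and the term is what keeps consecutive-day repeats of a term in separate files, as step 3 describes.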

    Further Information

    For more details, visit:

    - The repository
    - Dataset short paper:

    ---

    DOI: 10.5281/zenodo.14834704
  19. Political Parties Manifestos

    • kaggle.com
    Updated Jul 24, 2023
    Cite
    Sohail Ahmed (2023). Political Parties Manifestos [Dataset]. https://www.kaggle.com/datasets/sohailds/political-parties-manifestos
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sohail Ahmed
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction:

    This dataset comprises political party manifestos from 19 distinct political parties representing 5 different countries. Each manifesto represents the formal declaration of a political party's aims, principles, and policies, outlining their stance on various issues and topics. These manifestos have been collected and stored in PDF format, accompanied by an index.csv file that provides essential metadata about each manifesto, such as Party Name, Country, Release Year, and File Name.

    Dataset Details:

    Number of Parties: 19
    Countries Represented: 5
    File Format: PDF (manifestos), CSV (index.csv)
    Data Fields in index.csv:
    - Party Name: The name of the political party.
    - Country: The country to which the party belongs.
    - Release Year: The year when the manifesto was published or adopted.
    - File Name: The filename of the corresponding PDF manifesto.
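To show how the index.csv metadata might be used, here is a minimal Python sketch that groups manifestos by country. It assumes the CSV header uses exactly the four field names listed above; the sample rows (Party A, Country X, and so on) are made up for illustration and do not come from the dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the documented column layout; the real file is index.csv.
SAMPLE_INDEX = """Party Name,Country,Release Year,File Name
Party A,Country X,2019,party_a.pdf
Party B,Country X,2021,party_b.pdf
Party C,Country Y,2020,party_c.pdf
"""


def manifestos_by_country(csv_file):
    """Group manifesto metadata rows by the Country column."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["Country"]].append(
            {
                "party": row["Party Name"],
                "year": int(row["Release Year"]),
                "file": row["File Name"],
            }
        )
    return dict(grouped)


grouped = manifestos_by_country(io.StringIO(SAMPLE_INDEX))
for country, entries in grouped.items():
    print(country, [entry["party"] for entry in entries])
```

With the real dataset, the `io.StringIO` object would be replaced by an open file handle, for example `open("index.csv", newline="", encoding="utf-8")`.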

    Potential Insights:

    Researchers and analysts can study the manifestos to identify and compare the political ideologies of various parties across countries. Clustering analysis or topic modeling can be applied to group parties with similar ideologies.

    The dataset enables analysts to analyze party stances on specific issues such as healthcare, economy, education, environment, foreign policy, etc. Researchers can identify trends and differences in how parties address these issues.

    Conduct sentiment analysis on the manifestos to gauge the tone and emotional content of the political parties' declarations, providing insights into their communication strategies and emotional appeals to voters.

    By grouping parties based on the country they belong to, analysts can study geopolitical trends and identify similarities and differences in party ideologies within and across nations.

    Creating word clouds from the manifestos can visually represent the most frequently used words, helping to highlight central themes and priorities of each party.

    Researchers can explore the relationship between election outcomes and manifesto contents to analyze the impact of specific policy positions on electoral success.

    Technical Recommendations:

    Since all the relevant files are in .pdf format, the following points might help you get started:

    1. Extract text from .pdf files using PyMuPDF
    2. Clean extracted text by removing headers and footers using this guideline
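The two recommendations above can be sketched as follows. The extraction helper uses PyMuPDF (`fitz.open` and `page.get_text`). The cleaning step is not the linked guideline, which is not reproduced here; it is a simple stand-in heuristic, assumed for illustration, that drops a page's first or last non-blank line when the same text recurs at the page edge on most pages. The function names and the 0.6 threshold are hypothetical.

```python
from collections import Counter


def extract_pages(pdf_path):
    """Return one list of text lines per page, using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the cleaning helper works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text().splitlines() for page in doc]


def strip_repeated_edges(pages, min_share=0.6):
    """Drop first/last lines that repeat at the edge of most pages.

    `pages` is a list of pages, each a list of text lines. A line counts as a
    header or footer if it is the first or last non-blank line on at least
    `min_share` of the pages (and on at least two pages).
    """
    stripped = [[ln.strip() for ln in lines if ln.strip()] for lines in pages]

    edge_counts = Counter()
    for lines in stripped:
        if lines:
            # A set avoids double-counting single-line pages.
            edge_counts.update({lines[0], lines[-1]})

    threshold = max(2, min_share * len(pages))
    repeated = {text for text, n in edge_counts.items() if n >= threshold}

    cleaned = []
    for lines in stripped:
        if lines and lines[0] in repeated:
            lines = lines[1:]
        if lines and lines[-1] in repeated:
            lines = lines[:-1]
        cleaned.append(lines)
    return cleaned


# Illustrative pages with a repeated header and footer (made-up text).
demo_pages = [
    ["Party X Manifesto", "We will build schools.", "party-x.example"],
    ["Party X Manifesto", "We will cut taxes.", "party-x.example"],
    ["Party X Manifesto", "Thank you.", "party-x.example"],
]
print(strip_repeated_edges(demo_pages))
```

This heuristic only catches edge lines whose text is identical across pages, so footers containing a changing page number would need extra handling, such as masking digits before counting.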
  20. Public Opinion Analysis System Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 22, 2025
    Cite
    Data Insights Market (2025). Public Opinion Analysis System Report [Dataset]. https://www.datainsightsmarket.com/reports/public-opinion-analysis-system-1934278
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    May 22, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Public Opinion Analysis System market is experiencing robust growth, projected to reach $3.544 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 17.2% from 2025 to 2033. This expansion is driven by several key factors. Increasing political polarization and the need for real-time sentiment analysis across diverse media channels (social media, news articles, online forums) are creating high demand for sophisticated systems. Furthermore, the growing sophistication of these systems, incorporating AI and machine learning for improved accuracy and efficiency, is accelerating adoption across various sectors, including government, marketing, and public relations. The competitive landscape is fragmented, with companies like Xalted, Knowlesys, Graphen, Surfilter, and several others vying for market share. These companies are constantly innovating, integrating new technologies, and expanding their service offerings to meet evolving client needs and maintain a competitive edge. Geographical expansion, particularly into emerging markets with rapidly growing internet penetration and social media usage, represents a significant opportunity for future growth. However, challenges remain, including data privacy concerns, the potential for biased algorithms, and the need for robust data security measures. Addressing these concerns will be crucial for sustained market growth. The forecast period (2025-2033) anticipates continued growth, driven by further technological advancements, increasing data availability, and broader industry adoption. While the precise regional breakdown is unavailable, it's reasonable to assume that North America and Europe will initially hold significant market share due to established technological infrastructure and higher adoption rates. However, Asia-Pacific is expected to experience rapid growth in the forecast period, driven by increasing internet and smartphone penetration in developing economies. 

    The focus will likely shift towards advanced analytics, predictive modeling, and integration with other business intelligence tools. The market is expected to mature, leading to potential consolidation among market players through mergers and acquisitions. This consolidation will drive innovation and efficiency, while simultaneously potentially impacting pricing strategies within the market.
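The description quotes a 2025 market size of $3.544 billion and a CAGR of 17.2% through 2033 but does not state a 2033 figure. The sketch below shows what those two numbers imply if growth compounds once per year over the eight years from 2025 to 2033; the function name is an assumption, and the result is an arithmetic consequence of the quoted inputs, not a figure from the report.

```python
def project_market_size(base_value, cagr, base_year, target_year):
    """Compound `base_value` at annual rate `cagr` from base_year to target_year."""
    years = target_year - base_year
    return base_value * (1.0 + cagr) ** years


# Inputs quoted in the description: $3.544 billion in 2025, CAGR of 17.2%.
implied_2033 = project_market_size(3.544, 0.172, 2025, 2033)
print(f"Implied 2033 market size: ${implied_2033:.2f} billion")
```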
