100+ datasets found

Motamot: A Dataset for Revealing the Supremacy of Large Language Models over...
data.mendeley.com
Updated May 13, 2024
Cite
Fatema Tuj Johora Faria (2024). Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis [Dataset]. http://doi.org/10.17632/hdhnrrwdz2.1
Explore at:
Unique identifier
https://doi.org/10.17632/hdhnrrwdz2.1
Dataset updated
May 13, 2024
Authors
Fatema Tuj Johora Faria
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The "Motamot" dataset contains 7,058 data points labeled with Positive and Negative sentiments, tailored specifically for political sentiment analysis in the Bengali language. It comprises 4,132 instances labeled Positive and 2,926 instances labeled Negative.

Specifics of the Core Data:

Split        Total   Positive   Negative
Train        5647    3306       2341
Test         706     413        293
Validation   705     413        292
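The split sizes can be cross-checked against the totals stated in the description. The following is a minimal Python sketch that uses only the numbers quoted above; the dictionary layout and variable names are illustrative and not part of the dataset.

```python
# Split sizes exactly as stated in the dataset description.
splits = {
    "train": {"positive": 3306, "negative": 2341},
    "test": {"positive": 413, "negative": 293},
    "validation": {"positive": 413, "negative": 292},
}

# Total number of instances in each split.
split_totals = {name: sum(counts.values()) for name, counts in splits.items()}

# Total number of instances per label across all splits.
label_totals = {
    label: sum(counts[label] for counts in splits.values())
    for label in ("positive", "negative")
}

grand_total = sum(split_totals.values())
positive_share = label_totals["positive"] / grand_total

print(split_totals)
print(label_totals, grand_total, round(positive_share, 3))
```

The per-split and per-label sums agree with the stated 7,058 data points, and the class balance (roughly 59% Positive) is similar in every split.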
Political Tweets Dataset
brightdata.com
.json, .csv, .xlsx
Cite
Bright Data (2024). Political Tweets Dataset [Dataset]. https://brightdata.com/products/datasets/twitter/tweets/political
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data https://brightdata.com/
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Political Tweets dataset to enhance campaign strategies and gain insights into public discourse. This dataset offers a comprehensive view of political dynamics on social media, empowering organizations, researchers, and policymakers to analyze trends and sentiment. Access the full dataset or customize it with specific data points tailored to your needs.

Popular use cases include:

Sentiment Analysis: Analyze publicly available political tweets to understand public sentiment on policies, events, and candidates, aiding campaign strategies and opinion research.
Trend Monitoring: Track trending topics and hashtags in political discourse to identify key issues and shifts in public priorities across demographics.
Misinformation Detection: Detect and analyze patterns of misinformation, supporting efforts to combat its spread effectively.

Harness these insights to stay informed and adapt to the evolving political landscape.
Appendix 5.1 Sentiment analysis on WeChat posts
figshare.com
zip
Updated Jul 17, 2021
Cite
Titus Chen (2021). Appendix 5.1 Sentiment analysis on WeChat posts [Dataset]. http://doi.org/10.6084/m9.figshare.12738164.v3
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12738164.v3
Dataset updated
Jul 17, 2021
Dataset provided by
figshare
Authors
Titus Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
id: unique id number of WeChat post
Account: WeChat public account
time: date of post publication
title: title of WeChat post
content: content of WeChat post
media: number of multimedia objects in WeChat post
URL: permanent link of WeChat post
segcon_unlist: tokenized content of WeChat post
month: month of post publication
year: year of post publication
Account_e: English name of WeChat public account
token_count: number of tokens in WeChat post
senti_count: number of sentiment terms in WeChat post
senti_str: Sentiment terms in WeChat post
NB: Weighted number of sadness terms (悲伤) in WeChat post
NC: Weighted number of fear terms (恐惧) in WeChat post
ND: Weighted number of antipathy terms (憎恶) in WeChat post
NE: Weighted number of distraught terms (烦闷) in WeChat post
NG: Weighted number of shame/shyness terms (羞) in WeChat post
NH: Weighted number of guilt terms (疚) in WeChat post
NI: Weighted number of panic terms (慌) in WeChat post
NJ: Weighted number of disappointment terms (失望) in WeChat post
NK: Weighted number of jealousy terms (妒忌) in WeChat post
NL: Weighted number of skepticism terms (怀疑) in WeChat post
NN: Weighted number of denigration terms (贬责) in WeChat post
PA: Weighted number of happiness terms (快乐) in WeChat post
PB: Weighted number of fondness terms (喜爱) in WeChat post
PC: Weighted number of surprise terms (惊奇) in WeChat post
PD: Weighted number of respect terms (尊敬) in WeChat post
PE: Weighted number of peacefulness terms (安心) in WeChat post
PF: Weighted number of anxiety terms (思) in WeChat post
PG: Weighted number of trust terms (相信) in WeChat post
PH: Weighted number of commendation/approval terms (赞扬) in WeChat post
PK: Weighted number of wishfulness terms (祝愿) in WeChat post
NAs: Weighted number of anger terms (愤怒) in WeChat post
posi: Weighted number of positive sentiment terms in WeChat post
nega: Weighted number of negative sentiment terms in WeChat post
all_senti: Weighted number of all sentiment terms in WeChat post
all_senti_norm: Standardized weighted number of all sentiment terms in WeChat post
official: Official affiliation of WeChat post (Official media or Non-official media)
Twitter Trends|| PPP &PTI || Pakistan Elections
kaggle.com
Updated Feb 16, 2024
Cite
Aqeelkh (2024). Twitter Trends|| PPP &PTI || Pakistan Elections [Dataset]. https://www.kaggle.com/datasets/aqeelkh/twitter-trends-ppp-and-pti-pakistan-elections/discussion
Dataset updated
Feb 16, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
Aqeelkh
Area covered
Pakistan
Description
Dataset Title: PPP and PTI Twitter Trend Analysis

Overview This dataset encompasses a collection of 1184 tweets from the Twitter trend "PPP and PTI," capturing a snapshot of public discourse and sentiment regarding Pakistan's prominent political entities: the Pakistan Peoples Party (PPP) and Pakistan Tehreek-e-Insaf (PTI). It provides a diverse range of perspectives and reactions from Twitter users, making it an invaluable resource for political analysts, data scientists, and researchers interested in political sentiment analysis, social media analytics, and digital humanities.

Dataset Description The dataset is structured into eight columns, each offering distinct insights into the tweets collected:

UserTag: The Twitter handle of the user who posted the tweet.

TimeStamp: The date and time when the tweet was posted, providing temporal context to the data.

Current_Date: The date when the data was collected, ensuring traceability and relevance.

Tweet Body: The actual content of the tweet, encapsulating the message, sentiment, and topics discussed by the user. This column is central to text analysis, sentiment detection, and thematic studies.

Reply: The number of replies to the tweet, indicating engagement and conversational depth.

Retweet: The number of retweets, reflecting the tweet's reach and virality within the Twitter community.

Likes: The number of likes, serving as a proxy for the tweet's popularity and user agreement.

Views: An estimate of how many times the tweet was viewed, offering insights into its impact and visibility.

Potential Uses This dataset can serve a wide range of purposes, including but not limited to:

1. Sentiment analysis to gauge public opinion regarding PPP and PTI.
2. Temporal analysis to identify trends and shifts in public sentiment over time.
3. Network analysis to explore interactions and the spread of information among users.
4. Comparative analysis between the engagement and popularity of tweets related to PPP vs. PTI.
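The comparative analysis mentioned in the last use case could be sketched as follows. This is a minimal Python illustration, not the dataset author's method: the sample rows are invented, the party assignment is a naive keyword match, and the engagement columns are assumed to hold plain integers (the real file may store them differently, for example as abbreviated counts).

```python
import csv
import io
from statistics import mean

# Made-up rows that follow the column names in the description;
# they are NOT taken from the real dataset.
SAMPLE_CSV = """UserTag,TimeStamp,Current_Date,Tweet Body,Reply,Retweet,Likes,Views
@user_a,2024-02-10T09:15:00,2024-02-16,PTI supporters gather in Lahore,4,10,52,1200
@user_b,2024-02-10T11:40:00,2024-02-16,PPP announces its manifesto,2,6,30,800
@user_c,2024-02-11T08:05:00,2024-02-16,PTI rally draws a large crowd,1,20,88,2500
"""

PARTIES = ("PPP", "PTI")


def mean_engagement(rows, metric="Likes"):
    """Average one engagement metric over the tweets that mention each party."""
    values = {party: [] for party in PARTIES}
    for row in rows:
        for party in PARTIES:
            # Naive keyword match (assumption); a tweet naming both
            # parties is counted for both.
            if party in row["Tweet Body"].upper():
                values[party].append(int(row[metric]))
    return {party: mean(v) if v else None for party, v in values.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
likes_by_party = mean_engagement(rows)
print(likes_by_party)
```

Passing `metric="Views"` or `metric="Retweet"` compares the other engagement columns in the same way.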

Methodology The tweets were collected using Selenium WebDriver, ensuring a comprehensive and unbiased selection of tweets related to the "PPP and PTI" trend. Care was taken to include tweets from various times of the day to capture a broad spectrum of user engagement and opinions.

Ethical Considerations All data was collected and presented in accordance with Twitter's data use policies and ethical guidelines for research.

Acknowledgments This dataset was created by Aqeel Khan, a student of BS Mathematics at Namal University Mianwali, with a keen interest in data science and machine learning. The dataset compilation was aimed at facilitating research and analysis in the domains of political science, social media analytics, and data science.

License This dataset is shared for educational and research purposes. Users of the dataset are encouraged to cite the source and adhere to Twitter's terms of service regarding the use of shared data.

Twitter Sentiment Analysis Datasets

brightdata.com

.json, .csv, .xlsx

Updated Dec 24, 2024

Cite

Bright Data (2024). Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis

Explore at:

Available download formats: .json, .csv, .xlsx

Dataset updated

Dec 24, 2024

Dataset authored and provided by

Bright Data

License

https://brightdata.com/license

Area covered

Worldwide

Description

Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

Key Features:

  Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
  Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
  Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
  Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
  Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
  Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.


Use Cases:

  Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
  Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
  Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
  AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
  Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.



  Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. 
  Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.

Appendix 6.3 Data of sentiment analysis on US-China trade disputes
figshare.com
txt
Updated Jul 20, 2021
Cite
Titus Chen (2021). Appendix 6.3 Data of sentiment analysis on US-China trade disputes [Dataset]. http://doi.org/10.6084/m9.figshare.14601738.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14601738.v1
Dataset updated
Jul 20, 2021
Dataset provided by
Figshare http://figshare.com/
Authors
Titus Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China, United States
Description
This folder deposits two datasets. The first dataset contains the sentiment data of the tariff-related WeChat posts. The other dataset keeps the sentiment data of the Huawei-related WeChat posts.
Data from: PolSentiLex: Sentiment Detection in Socio-Political Discussions...
live.european-language-grid.eu
explore.openaire.eu
+2more
csv
Updated Dec 10, 2023
Cite
(2023). PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7654
Explore at:
Available download formats: csv
Dataset updated
Dec 10, 2023
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A Russian-language sentiment lexicon for social media discussions on political and social issues.
The file contains raw markings collected with LINIS coding service https://linis-crowd.org [in Russian].
Learn more about PolSentiLex in our papers:
Koltsova, O., & Alexeeva, S. (2015). Linis-crowd.org: A lexical resource for Russian sentiment analysis of social media [Linis-crowd.org: Lexichesk resurs dl’a analiza tonal’nosti sotsial’no-politicheskix tekstov]. Computational Linguistics and Computational Ontologies: Proceedings of the XVIII Joint Conference “Internet and Modern Society (IMS-2015)” [Kompyuternaya Lingvistika i Vyichislitelnyie Ontologii: Sbornik Nauchnyih Statey. Trudyi XVIII Ob’edinennoy Konferentsii «Internet i Sovremennoe Obschestvo» (IMS-2015)], 25–34. [in Russian] URL: https://scila.hse.ru/data/2020/06/02/1603986481/koltsovaoyuetal.pdf
Koltsova, O., Alexeeva, S., & Koltsov, S. (2016). An Opinion Word Lexicon and a Training Dataset for Russian Sentiment Analysis of Social Media. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2016”, 277–287. URL: http://www.dialog-21.ru/media/3400/koltsovaoyuetal.pdf
Koltsova O., Alexeeva S., Pashakhin S., Koltsov S. (2020) PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media. In: Filchenkov A., Kauttonen J., Pivovarova L. (eds) Artificial Intelligence and Natural Language. AINL 2020. Communications in Computer and Information Science, vol 1292. Springer, Cham. https://doi.org/10.1007/978-3-030-59082-6_1
Bangla Sentiment Dataset
data.mendeley.com
Updated Jun 3, 2025
Cite
Jahanur Biswas (2025). Bangla Sentiment Dataset [Dataset]. http://doi.org/10.17632/rh67mckhbh.2
Explore at:
Unique identifier
https://doi.org/10.17632/rh67mckhbh.2
Dataset updated
Jun 3, 2025
Authors
Jahanur Biswas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Bangla Sentiment Dataset is a curated collection of sentiment-rich textual data in Bangla, focused on recent and trending topics. This dataset has been compiled from diverse sources, including Bangladeshi online newspapers, social media platforms, and blogs, ensuring a wide spectrum of language styles and sentiment expressions.

Key Features:

Focus on Recent Topics: The dataset emphasizes contemporary issues, trending discussions, and popular topics in Bangladeshi society. This includes sentiments on political developments, social movements, entertainment, cultural events, and other recent happenings.

Source Variety:

Online Newspapers: Articles, editorials, headlines, and reader comments provide structured and semi-formal sentiment data.
Social Media: Posts, tweets, and comments reflect informal, conversational language with high emotional expressiveness.
Blogs: Opinion pieces and discussions offer detailed and context-rich sentiment content.

Sentiment Labels: Each entry in the dataset is annotated with one of the following sentiment categories:

Positive (1): Texts expressing happiness, agreement, or optimism.
Negative (0): Texts reflecting criticism, disagreement, or pessimism.
Neutral (2): Texts presenting balanced or factual statements with minimal emotional bias.

Linguistic and Stylistic Diversity: The dataset captures a range of Bangla language variations, including:

Formal and informal Bangla usage.
Regional dialects.
Transliterated Bangla (Banglish) commonly used on social media.

Real-World Context: The inclusion of recent topics ensures that the dataset is relevant for analyzing public sentiment around current events and trends. This makes it particularly useful for real-time sentiment analysis applications.

This dataset provides an invaluable resource for researchers and practitioners aiming to explore sentiment analysis in Bangla, with a special emphasis on modern-day relevance and real-world applicability.
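The numeric label codes listed in the description (note that 0 is Negative and 2 is Neutral) can be decoded before any analysis. A minimal Python sketch follows; the function name and the example label column are hypothetical and only illustrate the mapping.

```python
from collections import Counter

# Label codes as given in the dataset description.
LABEL_NAMES = {1: "Positive", 0: "Negative", 2: "Neutral"}


def label_distribution(codes):
    """Count how often each named sentiment label occurs."""
    return Counter(LABEL_NAMES[code] for code in codes)


# Hypothetical label column, for illustration only.
example_codes = [1, 0, 2, 1, 1, 0]
distribution = label_distribution(example_codes)
print(distribution)
```

Keeping the mapping in one dictionary avoids silently treating the codes as an ordered scale, which they are not (2 is Neutral, not "more positive" than 1).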
Data from: Migration Sentiment Analysis Dataset from Portuguese Political...
zenodo.org
investigacion.usc.gal
bin
Updated Apr 10, 2025
Cite
Erik Bran Marino; Renata Vieira; Suso Baleato; Ana Sofia Ribeiro; Katarina Laken (2025). Migration Sentiment Analysis Dataset from Portuguese Political Manifestos (2011, 2015, 2019) [Dataset]. http://doi.org/10.5281/zenodo.15189809
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15189809
Dataset updated
Apr 10, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Erik Bran Marino; Renata Vieira; Suso Baleato; Ana Sofia Ribeiro; Katarina Laken
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a collection of migration-related sentences extracted from Portuguese political party manifestos from the 2011, 2015, and 2019 legislative elections. Each entry includes the original sentence in Portuguese, sentiment analysis scores (positive, negative, and neutral probabilities), and the migration-related term that appears in the sentence. The sentiment analysis was performed using a multilingual BERT model trained for sentiment classification.

The dataset was created as part of a research project examining how political discourse around migration has evolved in Portugal's changing political landscape, particularly with the emergence of new parties. This resource supports computational analysis of political communication regarding migration issues in Portugal.
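Because each entry carries three probabilities rather than a single label, a common way to obtain one label per sentence is to take the class with the highest probability. The Python sketch below assumes that convention; the field names and the numbers in the example entry are hypothetical, and the source does not state how (or whether) the authors derive a single label.

```python
def dominant_sentiment(scores):
    """Return the label whose probability is highest."""
    return max(scores, key=scores.get)


# Hypothetical entry: the field names and numbers are illustrative only.
entry = {
    "sentence": "<original sentence in Portuguese>",
    "term": "<migration-related term>",
    "scores": {"positive": 0.12, "negative": 0.71, "neutral": 0.17},
}

label = dominant_sentiment(entry["scores"])
print(label)
```

Retaining the full probability triple alongside the derived label keeps low-confidence cases (for example, two nearly equal probabilities) visible for manual review.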
Tweets on Political and Social issues for analysis using Neutrosophic Sets
data.mendeley.com
Updated Jan 27, 2019
Cite
Ilanthenral Kandasamy (2019). Tweets on Political and Social issues for analysis using Neutrosophic Sets [Dataset]. http://doi.org/10.17632/fnzmfgy2bd.1
Explore at:
Unique identifier
https://doi.org/10.17632/fnzmfgy2bd.1
Dataset updated
Jan 27, 2019
Authors
Ilanthenral Kandasamy
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ten different topics were taken for analysis, and 1000 tweets about each topic were collected. The file here, provided as a sample case, contains the tweets extracted about Farm Loan along with the polarity calculated by TextBlob (Python).
Training Data for German Sentiment Analysis of Political Communication (SUF...
data.aussda.at
bin, pdf, tsv
Updated Nov 26, 2020
Cite
Martin Haselmayer; Marcelo Jenny (2020). Training Data for German Sentiment Analysis of Political Communication (SUF edition) [Dataset]. http://doi.org/10.11587/EOPCOB
Explore at:
Available download formats: pdf (86728), tsv (448), tsv (23371430), bin (9134185)
Unique identifier
https://doi.org/10.11587/EOPCOB
Dataset updated
Nov 26, 2020
Dataset provided by
AUSSDA
Authors
Martin Haselmayer; Marcelo Jenny
License
https://data.aussda.at/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11587/EOPCOB
Area covered
Austria
Dataset funded by
Austrian Science Fund
Vienna Anniversary Foundation for Higher Education
Description
Full edition for scientific use. The dataset contains 125871 sentences extracted from Austrian parliamentary debates and party press releases. Press releases were collected under the auspices of the Austrian National Election Study (AUTNES) and cover 6 weeks prior to each national election 1995-2013. Data from parliamentary debates stem from a random sample of sentences drawn from sessions of the Austrian National Council (1995-2013). The sentiment of the sentences was crowdcoded on a five-point-scale ranging from 0 “Not negative” to 5 “Very strongly negative”. As each sentence has been coded by ten coders, there are multiple codingids for each unitid (sentence).
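Since every sentence (unitid) has several codings, the ratings usually need to be aggregated to one score per sentence before training. The Python sketch below uses a simple mean per unitid; this aggregation rule is an assumption rather than the authors' documented procedure, and the rows and the rating field are invented for illustration (only the unitid and codingid names come from the description).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows of (unitid, codingid, rating); values are illustrative.
codings = [
    ("s1", "c01", 0), ("s1", "c02", 1), ("s1", "c03", 2),
    ("s2", "c04", 4), ("s2", "c05", 3), ("s2", "c06", 5),
]


def mean_rating_per_sentence(rows):
    """Average all crowd ratings that share the same unitid."""
    ratings = defaultdict(list)
    for unitid, _codingid, rating in rows:
        ratings[unitid].append(rating)
    return {unitid: mean(values) for unitid, values in ratings.items()}


sentence_scores = mean_rating_per_sentence(codings)
print(sentence_scores)
```

A median or a trimmed mean would be equally easy to substitute if individual coders turn out to be noisy.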
SEN - Sentiment analysis of Entities in News headlines
data.niaid.nih.gov
zenodo.org
Updated Oct 15, 2023
Cite
Katarzyna Baraniak (2023). SEN - Sentiment analysis of Entities in News headlines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5211931
Explore at:
Dataset updated
Oct 15, 2023
Dataset provided by
Katarzyna Baraniak
Marcin Sydow
Description
If you wish to use this data please cite:

Katarzyna Baraniak, Marcin Sydow, A dataset for Sentiment analysis of Entities in News headlines (SEN), Procedia Computer Science, Volume 192, 2021, Pages 3627-3636, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2021.09.136. (https://www.sciencedirect.com/science/article/pii/S1877050921018755)

bibtex: users.pja.edu.pl/~msyd/bibtex/sydow-baraniak-SENdataset-kes21.bib

SEN is a novel publicly available human-labelled dataset for training and testing machine learning algorithms for the problem of entity level sentiment analysis of political news headlines.

On-line news portals play a very important role in the information society. Fair media should present reliable and objective information. In practice there is an observable positive or negative bias concerning named entities (e.g. politicians) mentioned in the on-line news headlines. Our dataset consists of 3819 human-labelled political news headlines coming from several major on-line media outlets in English and Polish.

Each record contains a news headline, a named entity mentioned in the headline and a human annotated label (one of “positive”, “neutral”, “negative”). Our SEN dataset package consists of 2 parts: SEN-en (English headlines, split into SEN-en-R and SEN-en-AMT) and SEN-pl (Polish headlines). Each headline-entity pair was annotated via a team of volunteer researchers (the whole SEN-pl dataset and a subset of 1271 English records: the SEN-en-R subset, “R” for “researchers”) or via the Amazon Mechanical Turk service (a subset of 1360 English records: the SEN-en-AMT subset).

During analysis of the annotations, outlying annotations were identified and removed. A separate version of the dataset without outliers is marked by "noutliers" in the data file name.

Details of the process of preparing the dataset and presenting its analysis are presented in the paper.

In case of any questions, please contact one of the authors. Email addresses are in the paper.
Global Political tweets
opendatabay.com
.csv
Updated Jun 16, 2025
Cite
Datasimple (2025). Global Political tweets [Dataset]. https://www.opendatabay.com/data/ai-ml/c8d2d199-5c65-401a-8d9d-c88bd5471489
Explore at:
Available download formats: .csv
Dataset updated
Jun 16, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Social Media and Networking
Description
Social media is becoming a key medium through which we communicate with each other: it is at the center of the very structures of our daily interactions. Yet this infiltration is not unique to interpersonal relations. Political leaders, governments, and states operate within this social media environment, wherein they continually address crises and institute damage control through platforms such as Twitter. With the spread of the internet among the masses, social media is emerging as a potential means of communication. It provides politicians with a direct channel for communicating, connecting, and engaging with the public. The power of social media, especially Twitter and Facebook, has been demonstrated by its successful application during recent US presidential elections and the revolts in Arab countries. In India too, with the general election approaching in early 2014, political parties and leaders are trying to harness the power of social media.

Content The tweets have the #Politics hashtag. The collection started on 24/7/2021 and will be updated on a daily basis.

Information regarding the data The data consists of more than 1 lakh (100,000) records with 13 columns. The features are described below.

No   Column             Description
1    user_name          The name of the user, as they’ve defined it.
2    user_location      The user-defined location for this account’s profile.
3    user_description   The user-defined UTF-8 string describing their account.
4    user_created       Time and date when the account was created.
5    user_followers     The number of followers an account currently has.
6    user_friends       The number of friends an account currently has.
7    user_favourites    The number of favorites an account currently has.
8    user_verified      When true, indicates that the user has a verified account.
9    date               UTC time and date when the Tweet was created.
10   text               The actual UTF-8 text of the Tweet.
11   hashtags           All the other hashtags posted in the tweet along with #Politics.
12   source             Utility used to post the Tweet; Tweets from the Twitter website have a source value of web.
13   is_retweet         Indicates whether this Tweet has been Retweeted by the authenticating user.

Inspiration You can use this data to dive into the subjects that use this hashtag, look to the geographical distribution, evaluate sentiments, and look at trends.
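One way to explore the subjects that appear alongside #Politics is to count co-occurring hashtags. The Python sketch below makes several assumptions that the description does not confirm: that the hashtags column holds a stringified Python list, that is_retweet is stored as the text "True" or "False", and that retweets should be skipped. The rows themselves are invented.

```python
import ast
from collections import Counter

# Hypothetical rows using column names from the description.
# The storage format of `hashtags` and `is_retweet` is an assumption.
rows = [
    {"text": "...", "hashtags": "['Politics', 'Election2024']", "is_retweet": "False"},
    {"text": "...", "hashtags": "['Politics', 'Economy']", "is_retweet": "False"},
    {"text": "...", "hashtags": "['Politics', 'Economy']", "is_retweet": "True"},
    {"text": "...", "hashtags": "", "is_retweet": "False"},
]


def co_occurring_hashtags(rows):
    """Count hashtags other than #Politics, skipping retweets and empty cells."""
    counts = Counter()
    for row in rows:
        if row["is_retweet"] == "True" or not row["hashtags"]:
            continue
        # Parse the stringified list safely (no arbitrary code execution).
        for tag in ast.literal_eval(row["hashtags"]):
            if tag.lower() != "politics":
                counts[tag.lower()] += 1
    return counts


hashtag_counts = co_occurring_hashtags(rows)
print(hashtag_counts.most_common())
```

If the real file stores hashtags differently (for example, as a comma-separated string), only the parsing line needs to change.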

License

CC0

Original Data Source: Global Political tweets
Appendix 7.3 Data of sentiment analysis on Hong Kong-related WeChat...
figshare.com
txt
Updated Jul 22, 2021
Cite
Titus Chen (2021). Appendix 7.3 Data of sentiment analysis on Hong Kong-related WeChat narratives [Dataset]. http://doi.org/10.6084/m9.figshare.14785323.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14785323.v2
Dataset updated
Jul 22, 2021
Dataset provided by
Figshare http://figshare.com/
Authors
Titus Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Hong Kong
Description
This dataset contains the sentiment data of the Hong Kong-related WeChat narratives.
Bangla Dataset on Youtube Political Comments |NLP
opendatabay.com
Updated Jun 17, 2025
Cite
Datasimple (2025). Bangla Dataset on Youtube Political Comments |NLP [Dataset]. https://www.opendatabay.com/data/ai-ml/7d636bfb-53e1-4631-a3b2-da9976417441
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Datasimple
Area covered
YouTube, Social Media and Networking
Description
This dataset consists of YouTube comments predominantly collected from political news videos relevant to Bangladesh. The comments are written in Bengali, enriched with emojis that express a range of emotions and opinions. These comments provide unique insights into the public sentiment and reactions related to political events, figures, and policies within the country. This dataset can be highly useful for NLP tasks such as sentiment analysis, emotion detection, and opinion mining. It enables researchers to study public sentiment, emotional expression, and political opinions through text and emojis in Bengali.

Key Features:

Language: The comments are in Bengali, reflecting authentic language use with local expressions and cultural nuances.

Emojis: The presence of emojis in the dataset helps capture non-verbal cues and emotional expressions that add depth to the textual sentiment.

Context: The data is sourced from videos specifically focused on political news, making it valuable for research related to social, political, and media analysis in Bangladesh.

License

CC BY 4.0

Original Data Source: Bangla Dataset on Youtube Political Comments |NLP
Replication data for: Text as Data: The Promise and Pitfalls of Automatic...
dataverse.harvard.edu
Updated Oct 1, 2014
Cite
Justin Grimmer; Brandon Stewart (2014). Replication data for: Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts [Dataset]. http://doi.org/10.7910/DVN/FQBHP8
Unique identifier
https://doi.org/10.7910/DVN/FQBHP8
Dataset updated
Oct 1, 2014
Dataset provided by
Harvard Dataverse
Authors
Justin Grimmer; Brandon Stewart
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
United States
Description
Replication Materials (Data and Code) for 'Text as Data' Abstract: Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods--they are no substitute for careful thought and close reading and require extensive and problem specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.
An new corpus of one million articles from four post-soviet countries and...
dataverse.harvard.edu
Updated Apr 28, 2021
Cite
Krzysztof Rybinski (2021). An new corpus of one million articles from four post-soviet countries and Poland. [Dataset]. http://doi.org/10.7910/DVN/CEF7RU
Unique identifier
https://doi.org/10.7910/DVN/CEF7RU
Dataset updated
Apr 28, 2021
Dataset provided by
Harvard Dataverse
Authors
Krzysztof Rybinski
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Soviet Union, Poland
Description
Data is in the .RData format and should be read into R using the load() function. It contains two R data frames: tx_pl_lang (articles in Polish) and tx_ru_lang (articles in Russian).

Covered newspapers

Table 1. Newspapers and portals included in the analysis and their Alexa ranks

Country      Newspaper site    Number of articles   Alexa global rank   Alexa local rank
Russia       iz.ru                         43,782               1,378                 45
Russia       kommersant.ru                 46,070               1,335                 44
Russia       novayagazeta.ru               29,357               9,215                459
Russia       vedomosti.ru                  27,797               6,302                288
Kazakhstan   informburo.kz                 29,375              38,916                119
Kazakhstan   nur.kz                        67,350                 951                  6
Kazakhstan   tengrinews.kz                 44,285              13,036                 34
Kazakhstan   zakon.kz                     109,442               9,477                 30
Belarus      bdg.by                        33,447             292,678                746
Belarus      belgazeta.by                  21,995           1,392,332             11,041
Belarus      sb.by                         83,685              41,015                 79
Ukraine      kp.ua                        194,792              64,062                860
Ukraine      segodnya.ua                   45,835              18,658                256
Ukraine      vesti.ua                      90,559              58,573              1,096
Poland       gazeta.pl                     53,321               1,749                 14
Poland       rp.pl                         49,587              20,930                167
Poland       wpolityce.pl                  76,625              13,833                105

Note: Alexa rank, 90-day average, checked on 17 February 2021. Total number of articles: 1,047,304.
The list of columns is the same for both data frames:
- short name of the newspaper
- text of the article
- date when the article was scraped
- sentiment calculated using the standard sentiment lexicons
- sentiment calculated using the Covid-extended lexicons
- name of the topic
- surnames of influential politicians (0/1 variables; 1 if the name is in the article)

> colnames(tx_pl_lang)
 [1] "name"        "art"         "date"        "sent"         "sent.c"     "tname"
 [7] "putin"       "medvedev"    "vaino"       "shoigu"       "bortnikov"  "lavrov"
[13] "mishustin"   "kirienko"    "sechin"      "zelensky"     "shmygal"    "akhmetov"
[19] "avakov"      "ermak"       "poroshenko"  "medvedchuk"   "groisman"   "sagyntaev"
[25] "mamin"       "tokayev"     "nnazarbayev" "dnazarbayeva" "kulibayev"  "masimov"
[31] "alukashenko" "vakulchik"   "vlukashenko" "kobyakov"     "makei"      "myasnikovich"
[37] "rumas"       "golovchenko" "kaczynski"   "duda"         "morawiecki" "ziobro"

List of LDA topics (tname column):
cri    crime
ind    industry
itr    international trade
sov    former Soviet republics
pol    politics
tou    tourism
con    construction
air    air transport
ban    banking
fin    finance
med    media
pro    protests
hou    housing
spo    sport
pol    politics
reg    regional
mkt    financial markets
cul    culture
acc    accidents
edu    education
lab    labour market
war    war
pub    public finances
eco    economy
int    international
eur    europe
mob    mobile/internet
com    commodities, oil, gas
usa    USA
fam    family
hea    health
his    history
tra    transport
pop    gossip/beauty/weather
rel    religion
aut    automotive
spa    space
misc   cannot decide the topic
ussr   soviet history
ukr    Ukraine
mos    Moscow
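As a quick consistency check on Table 1, the per-site article counts can be summed per country and compared with the stated total of 1,047,304. The sketch below is illustrative only: the figures are copied from the description, while the names TABLE_1 and articles_per_country are assumptions introduced here and are not part of the dataset.

```python
from collections import defaultdict

# (country, newspaper site, number of articles), copied from Table 1.
TABLE_1 = [
    ("Russia", "iz.ru", 43_782),
    ("Russia", "kommersant.ru", 46_070),
    ("Russia", "novayagazeta.ru", 29_357),
    ("Russia", "vedomosti.ru", 27_797),
    ("Kazakhstan", "informburo.kz", 29_375),
    ("Kazakhstan", "nur.kz", 67_350),
    ("Kazakhstan", "tengrinews.kz", 44_285),
    ("Kazakhstan", "zakon.kz", 109_442),
    ("Belarus", "bdg.by", 33_447),
    ("Belarus", "belgazeta.by", 21_995),
    ("Belarus", "sb.by", 83_685),
    ("Ukraine", "kp.ua", 194_792),
    ("Ukraine", "segodnya.ua", 45_835),
    ("Ukraine", "vesti.ua", 90_559),
    ("Poland", "gazeta.pl", 53_321),
    ("Poland", "rp.pl", 49_587),
    ("Poland", "wpolityce.pl", 76_625),
]


def articles_per_country(rows):
    """Sum the article counts for each country (hypothetical helper)."""
    totals = defaultdict(int)
    for country, _site, n_articles in rows:
        totals[country] += n_articles
    return dict(totals)


totals = articles_per_country(TABLE_1)
grand_total = sum(totals.values())

for country, n_articles in totals.items():
    print(f"{country:<12}{n_articles:>10,}")
print(f"{'Total':<12}{grand_total:>10,}")
```

The per-country sums add up to the total given in the table note, which suggests the counts were transcribed consistently.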
[Tweets] 2023 Brazilian Early Political Events
zenodo.org
zip
Updated Feb 10, 2025
Lucas Raniére Juvino Santos; Leandro Balby Marinho; Claudio Campelo (2025). [Tweets] 2023 Brazilian Early Political Events [Dataset]. http://doi.org/10.5281/zenodo.14834704
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14834704
Dataset updated
Feb 10, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lucas Raniére Juvino Santos; Leandro Balby Marinho; Claudio Campelo
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Jan 1, 2023
Area covered
Brazil
Description
2023 Brazilian Early Political Events

This dataset contains 13,910,048 tweets from 1,346,340 users, extracted using 157 search terms over 56 different days between January 1st and June 21st, 2023.

All tweets in this dataset are in Brazilian Portuguese.

Data Usage

The dataset contains textual data from tweets, making it suitable for various NLP analyses, such as sentiment analysis, bias or stance detection, and toxic language detection. Additionally, users and tweets can be linked to create social graphs, enabling Social Network Analysis (SNA) to study polarization, communities, and other social dynamics.
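A minimal sketch of the user-linking idea, assuming each tweet record exposes an author and a list of mentioned users. The field names "author" and "mentions" and the sample records are hypothetical; the description does not document the actual schema, so they would need to be mapped to the real fields.

```python
from collections import Counter

# Hypothetical records; real field names in the dataset may differ.
sample_tweets = [
    {"author": "user_a", "mentions": ["user_b", "user_c"]},
    {"author": "user_b", "mentions": ["user_a"]},
    {"author": "user_a", "mentions": ["user_b"]},
]


def mention_edges(tweets):
    """Count directed (author -> mentioned user) edges, skipping self-mentions."""
    edges = Counter()
    for tweet in tweets:
        for target in tweet["mentions"]:
            if target != tweet["author"]:
                edges[(tweet["author"], target)] += 1
    return edges


edges = mention_edges(sample_tweets)
for (source, target), weight in edges.most_common():
    print(f"{source} -> {target} (weight {weight})")
```

The weighted edge list can then be handed to any graph library for community detection or polarization measures.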

Extraction Method

This data set was extracted using Twitter's (now X) official API—when Academic Research API access was still available—following the pipeline:

1. Twitter/X daily monitoring: The dataset author monitored daily political events appearing in Brazil's Trending Topics. Twitter/X has an automated system for classifying trending terms. When a term was identified as political, it was stored along with its date for later use as a search query.

2. Tweet collection using saved search terms: Once terms and their corresponding dates were recorded, tweets were extracted from 12:00 AM to 11:59 PM on the day the term entered the Trending Topics. A language filter was applied to select only tweets in Portuguese. The extraction was performed using the official Twitter/X API.

3. Data storage: The extracted data was organized by day and search term. If the same search term appeared in Trending Topics on consecutive days, a separate file was stored for each respective day.
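Steps 2 and 3 can be sketched as follows: each saved (term, date) pair yields a one-day query window and its own output file, so a term that trends on consecutive days produces separate files. This is an illustration under stated assumptions, not the authors' code; the function names, the directory layout, the .jsonl extension and the sample term are all hypothetical, and the API call itself is omitted.

```python
from datetime import date, datetime, time
from pathlib import PurePosixPath


def day_window(day):
    """Return the 12:00 AM to 11:59 PM window for the given day."""
    return datetime.combine(day, time(0, 0)), datetime.combine(day, time(23, 59))


def storage_path(root, day, term):
    """Build a per-day, per-term file path (hypothetical layout)."""
    safe_term = "".join(c if c.isalnum() else "_" for c in term)
    return PurePosixPath(root) / day.isoformat() / f"{safe_term}.jsonl"


# Hypothetical monitoring log: the same term trending on two consecutive days.
trending = [
    ("example term", date(2023, 1, 1)),
    ("example term", date(2023, 1, 2)),
]

jobs = [(term, *day_window(day), storage_path("tweets", day, term))
        for term, day in trending]

for term, start, end, path in jobs:
    print(f"{term!r}: {start} to {end} -> {path}")
```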

Further Information

For more details, visit:

- The repository
- Dataset short paper:

---

DOI: 10.5281/zenodo.14834704
Political Parties Manifestos
kaggle.com
Updated Jul 24, 2023
Sohail Ahmed (2023). Political Parties Manifestos [Dataset]. https://www.kaggle.com/datasets/sohailds/political-parties-manifestos
Dataset updated
Jul 24, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sohail Ahmed
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Introduction:

This dataset comprises political party manifestos from 19 distinct political parties representing 5 different countries. Each manifesto represents the formal declaration of a political party's aims, principles, and policies, outlining their stance on various issues and topics. These manifestos have been collected and stored in PDF format, accompanied by an index.csv file that provides essential metadata about each manifesto, such as Party Name, Country, Release Year, and File Name.

Dataset Details:

Number of Parties: 19
Countries Represented: 5
File Format: PDF (manifestos), CSV (index.csv)
Data Fields in index.csv:
- Party Name: The name of the political party.
- Country: The country to which the party belongs.
- Release Year: The year when the manifesto was published or adopted.
- File Name: The filename of the corresponding PDF manifesto.
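One way to use the metadata is to read index.csv and group manifestos by country. The sketch below runs on an in-memory stand-in for the file. The sample rows are invented placeholders, and the exact header spelling is an assumption taken from the field names listed above, so it should be checked against the real file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for index.csv; rows are placeholders, not real data.
SAMPLE_INDEX = """Party Name,Country,Release Year,File Name
Party A,Country X,2019,party_a.pdf
Party B,Country X,2021,party_b.pdf
Party C,Country Y,2020,party_c.pdf
"""


def manifestos_by_country(csv_file):
    """Group (party, year, file name) tuples by country."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["Country"]].append(
            (row["Party Name"], int(row["Release Year"]), row["File Name"])
        )
    return dict(groups)


groups = manifestos_by_country(io.StringIO(SAMPLE_INDEX))
for country, entries in groups.items():
    print(country, entries)
```

With the real file, the same function would be called on open("index.csv", newline="", encoding="utf-8").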

Potential Insights:

Researchers and analysts can study the manifestos to identify and compare the political ideologies of various parties across countries. Clustering analysis or topic modeling can be applied to group parties with similar ideologies.

The dataset enables analysts to analyze party stances on specific issues such as healthcare, economy, education, environment, foreign policy, etc. Researchers can identify trends and differences in how parties address these issues.

Conduct sentiment analysis on the manifestos to gauge the tone and emotional content of the political parties' declarations, providing insights into their communication strategies and emotional appeals to voters.

By grouping parties based on the country they belong to, analysts can study geopolitical trends and identify similarities and differences in party ideologies within and across nations.

Creating word clouds from the manifestos can visually represent the most frequently used words, helping to highlight central themes and priorities of each party.

Researchers can explore the relationship between election outcomes and manifesto contents to analyze the impact of specific policy positions on electoral success.

Technical Recommendations:

Since all the relevant files are in .pdf format, the following points might help you get started:

Extract text from .pdf files using PyMuPDF

Clean extracted text by removing headers and footers using this guideline
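The two recommendations can be sketched together as below. Page text is extracted with PyMuPDF (fitz.open and Page.get_text), and lines that repeat on most pages are then dropped as likely headers or footers. The repetition heuristic is an assumption made here for illustration and is not necessarily what the linked guideline describes; the function names, the min_share threshold and the sample pages are hypothetical.

```python
from collections import Counter


def extract_pages(pdf_path):
    """Return the plain text of each page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the cleaning helper works without it

    doc = fitz.open(pdf_path)
    try:
        return [page.get_text() for page in doc]
    finally:
        doc.close()


def strip_repeated_lines(pages, min_share=0.6):
    """Drop lines that occur on at least `min_share` of the pages (and on 2+ pages)."""
    page_lines = [[ln.strip() for ln in p.splitlines() if ln.strip()] for p in pages]
    # Count each distinct line once per page.
    counts = Counter(ln for lines in page_lines for ln in set(lines))
    threshold = max(2, min_share * len(pages))
    repeated = {ln for ln, n in counts.items() if n >= threshold}
    return ["\n".join(ln for ln in lines if ln not in repeated) for lines in page_lines]


# Invented sample pages standing in for extract_pages("some_manifesto.pdf").
pages = [
    "Party Manifesto 2020\nWe will invest in schools.\nPage footer",
    "Party Manifesto 2020\nHealthcare is a priority.\nPage footer",
    "Party Manifesto 2020\nOur economic plan.\nPage footer",
]
cleaned = strip_repeated_lines(pages)
print(cleaned)
```

A frequency threshold like this can also remove legitimately repeated slogans, so the removed lines are worth inspecting before the text is used for analysis.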
Public Opinion Analysis System Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 22, 2025
Data Insights Market (2025). Public Opinion Analysis System Report [Dataset]. https://www.datainsightsmarket.com/reports/public-opinion-analysis-system-1934278
Available download formats: pdf, ppt, doc
Dataset updated
May 22, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Public Opinion Analysis System market is experiencing robust growth, projected to reach $3.544 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 17.2% from 2025 to 2033. This expansion is driven by several key factors. Increasing political polarization and the need for real-time sentiment analysis across diverse media channels (social media, news articles, online forums) are creating high demand for sophisticated systems. Furthermore, the growing sophistication of these systems, incorporating AI and machine learning for improved accuracy and efficiency, is accelerating adoption across various sectors, including government, marketing, and public relations. The competitive landscape is fragmented, with companies like Xalted, Knowlesys, Graphen, Surfilter, and several others vying for market share. These companies are constantly innovating, integrating new technologies, and expanding their service offerings to meet evolving client needs and maintain a competitive edge. Geographical expansion, particularly into emerging markets with rapidly growing internet penetration and social media usage, represents a significant opportunity for future growth. However, challenges remain, including data privacy concerns, the potential for biased algorithms, and the need for robust data security measures. Addressing these concerns will be crucial for sustained market growth. The forecast period (2025-2033) anticipates continued growth, driven by further technological advancements, increasing data availability, and broader industry adoption. While the precise regional breakdown is unavailable, it's reasonable to assume that North America and Europe will initially hold significant market share due to established technological infrastructure and higher adoption rates. However, Asia-Pacific is expected to experience rapid growth in the forecast period, driven by increasing internet and smartphone penetration in developing economies. 
The focus will likely shift towards advanced analytics, predictive modeling, and integration with other business intelligence tools. The market is expected to mature, leading to potential consolidation among market players through mergers and acquisitions. This consolidation will drive innovation and efficiency, while simultaneously potentially impacting pricing strategies within the market.
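To make the growth claim concrete, the stated 2025 base ($3.544 billion) and 17.2% CAGR can be compounded forward with value_n = value_0 * (1 + r)^n. The yearly figures this produces are implied by those two numbers only and are not quoted in the report; the function and constant names are assumptions introduced here.

```python
def project(value_start, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return value_start * (1.0 + cagr) ** years


BASE_YEAR = 2025
BASE_VALUE_BN = 3.544  # USD billions, as stated in the description
CAGR = 0.172           # 17.2% per year, as stated in the description

for year in range(BASE_YEAR, 2034):
    print(year, round(project(BASE_VALUE_BN, CAGR, year - BASE_YEAR), 2))

implied_2033 = project(BASE_VALUE_BN, CAGR, 2033 - BASE_YEAR)
```

Under these assumptions the implied 2033 market size is on the order of $12.6 billion, about 3.6 times the 2025 base.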