https://creativecommons.org/publicdomain/zero/1.0/
Social media is becoming a key medium through which we communicate with each other: it is at the center of the very structures of our daily interactions. Yet this infiltration is not unique to interpersonal relations. Political leaders, governments, and states operate within this social media environment, wherein they continually address crises and institute damage control through platforms such as Twitter.
With the spread of the internet to the masses, social media is emerging as a viable means of communication. It gives politicians a direct channel for communicating, connecting, and engaging with the public. The power of social media, especially Twitter and Facebook, has been demonstrated by its successful use during recent US presidential elections and the uprisings in Arab countries. In India too, with the general election approaching in early 2014, political parties and leaders are trying to harness the power of social media.
The tweets have the #Politics hashtag. The collection started on 24/7/2021, and will be updated on a daily basis.
The data consists of over 1 lakh (100,000) records with 13 columns. The features are described below.

| No | Column | Description |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has. |
| 8 | user_verified | When true, indicates that the user has a verified account. |
| 9 | date | UTC time and date when the Tweet was created. |
| 10 | text | The actual UTF-8 text of the Tweet. |
| 11 | hashtags | All the other hashtags posted in the tweet along with #Politics. |
| 12 | source | Utility used to post the Tweet. Tweets from the Twitter website have a source value of web. |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
You can use this data to explore the subjects discussed under this hashtag, examine the geographical distribution of tweets, evaluate sentiment, and track trends.
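As a starting point for the geographical analysis mentioned above, the following is a minimal sketch that counts the most frequent values of the `user_location` column. The column name comes from the feature table; the file name `politics_tweets.csv` and the helper names are hypothetical, and because `user_location` is free text entered by users, the counts are only a rough proxy for geography.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping


def top_locations(rows: Iterable[Mapping[str, str]], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most common non-empty user_location values."""
    counts: Counter = Counter()
    for row in rows:
        # user_location is free text, so only trim surrounding whitespace.
        location = (row.get("user_location") or "").strip()
        if location:
            counts[location] += 1
    return counts.most_common(n)


def load_rows(path: str) -> list[dict[str, str]]:
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A call such as `top_locations(load_rows("politics_tweets.csv"))` would then list the ten most common profile locations, assuming the file uses the column names given in the table.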
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the Twitter Parliamentarian Database: a database consisting of parliamentarian names, parties and Twitter ids from the following countries: Austria, Belgium, France, Denmark, Spain, Finland, Germany, Greece, Italy, Malta, Poland, Netherlands, United Kingdom, Ireland, Sweden, New Zealand, Turkey, United States, Canada, Australia, Iceland, Norway, Switzerland, Luxembourg, Latvia and Slovenia. In addition, the database includes the European Parliament.

The tweet ids from the politicians' tweets have been collected from September 2017 to 31 October 2019 (all_tweet_ids.csv). In compliance with Twitter's policy, we only store tweet ids, which can be re-hydrated into full tweets using existing tools. More information on how to use the database can be found in the readme.txt.

It is recommended that you use the .csv files to work with the data, rather than the SQL tables. Information on the relations in the SQL database can be found in the Database codebook.pdf.

Update: The tweet ids for 2021 have been added as '2021.csv'.

Update #2: The tweet ids for 2020 have been added as '2020.csv'. The last party table has been added as 'parties_2021_04_28.csv'. The last members table has been added as 'members_2021_04_28.csv'.
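Re-hydration tools generally look up tweet ids in fixed-size batches. The sketch below shows only that batching step, not the API call itself. The function name `batch_ids` is hypothetical, and the default batch size of 100 is an assumption that should be checked against the limits of whichever lookup tool or endpoint is used.

```python
from itertools import islice
from typing import Iterable, Iterator


def batch_ids(tweet_ids: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` tweet ids."""
    iterator = iter(tweet_ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each yielded batch can then be passed to the re-hydration tool of choice. Keeping the ids as strings avoids precision problems if they are later serialized to JSON.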
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset captures a vast collection of social media discourse related to global politics. It highlights how social media has become a crucial medium for communication, especially for political leaders, governments, and states engaging with the public. The data illustrates the role of platforms like Twitter in addressing crises and managing public perception. The power of social media, particularly Twitter, has been demonstrated in significant political events such as recent US presidential elections and revolts in Arabian countries. The collection focuses on tweets containing the #Politics hashtag, with daily updates ensuring relevance and recency.
The dataset is typically provided in CSV format. It contains over 100,000 records; specifically, the collection contains 238,527 unique values within the observed date ranges. Each record includes 13 distinct columns detailing various aspects of the political tweets and their originators. Row counts are available from the detailed time series counts, which range from hundreds to tens of thousands of data points per period.
This dataset is ideal for exploratory data analysis, allowing users to dive into the subjects associated with the #Politics hashtag. It can be used to analyse geographical distribution of political discourse, evaluate sentiments expressed in tweets, and identify emerging trends in political conversations on social media. Researchers and analysts can gain insights into public opinion, political communication strategies, and the impact of social media on political landscapes.
The dataset's geographic scope is global. The current collection began on 24th July 2021 and is updated on a daily basis. Data spans from this start date up to 21st August 2022 based on current observations. Some historical aggregations of #Politics tweets are available for periods as early as 14th July 2006. No specific notes on data availability for certain demographic groups are provided, but the user and tweet metadata allows for some inferred demographic analysis.
CC0
This dataset is valuable for a wide range of users, including:

* Political scientists and researchers studying political communication and social media's impact on public discourse.
* Data scientists and analysts keen on performing social media analysis, including sentiment analysis and topic modelling.
* Natural Language Processing (NLP) practitioners developing models for text classification, entity recognition, or language understanding in a political context.
* Organisations and individuals interested in monitoring political trends and public engagement on social media platforms.
Original Data Source: Global Political tweets
Dataset based on Twitter usernames of American politicians. Data extracted from Wikidata.
The same politician can appear several times: if they use different usernames on Twitter or Instagram, if they have belonged to several parties, or if several Twitter account IDs are associated with them. The data is sorted in ascending order by name, so these repeated entries appear next to each other and are easy to spot.
Governments may have the capacity to flood social media with fake news, but little is known about the use of flooding by ordinary voters. In this work, we identify 2107 registered US voters that account for 80% of the fake news shared on Twitter during the 2020 US presidential election by an entire panel of 664,391 voters. We find that supersharers are important members of the network, reaching a sizable 5.2% of registered voters on the platform. Supersharers have a significant overrepresentation of women, older adults, and registered Republicans. Supersharers' massive volume does not seem automated but is rather generated through manual and persistent retweeting. These findings highlight a vulnerability of social media for democracy, where a small group of people distort the political reality for many.

This dataset contains aggregated information necessary to replicate the results reported in our work on Supersharers of Fake News on Twitter while respecting and preserving the privacy expectations of individuals included in the analysis. No individual-level data is provided as part of this dataset. The data collection process that enabled the creation of this dataset leveraged a large-scale panel of registered U.S. voters matched to Twitter accounts. We examined the activity of 664,391 panel members who were active on Twitter during the months of the 2020 U.S. presidential election (August to November 2020, inclusive), and identified a subset of 2,107 supersharers, which are the most prolific sharers of fake news in the panel that together account for 80% of fake news content shared on the platform. We rely on a source-level definition of fake news, that uses the manually-labeled list of fake news sites by Grinberg et al. 2019 and an updated list based on NewsGuard ratings (commercial...

# Supersharers of Fake News on Twitter
This repository contains data and code for replication of the results presented in the paper.
The folders are mostly organized by research questions as detailed below. Each folder contains the code and publicly available data necessary for the replication of results. Importantly, no individual-level data is provided as part of this repository. De-identified individual-level data can be attained for IRB-approved uses under the terms and conditions specified in the paper. Once access is granted, the restricted-access data is expected to be located under `./restricted_data`.
The folders in this repository are the following:
Code under the `preprocessing` folder contains the following:
Tweet IDs from political Twitter, March 2020

This dataset contains the Twitter tweet IDs used in a study investigating the extent to which problematic information is present in the most engaged-with content in political and issue spaces on Twitter in the run-up to the 2020 US elections. These tweets were returned from running in DMI-TCAT a curated list of queries for political candidates, political parties and social issues, incorporating politician-specific, party-specific and issue-specific keywords and hashtags. The shared URLs dataset was collected during a three-week timeframe (March 2-22, 2020, or around Super Tuesday) and contains only the tweet IDs that contain a URL.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File contains a list of Twitter account IDs in ASCII format. These accounts were those which we sampled and then analysed in the paper. The data we used are available from Twitter with the REST API.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a new approach to measuring political polarization, including a novel algorithm and open source Python code, which leverages Twitter content to produce measures of polarization for both users and hashtags. #Polar scores provide advantages over existing measures because they (1) can be calculated throughout the legislative cycle, (2) allow for easy differentiation between users with similar scores, (3) are chamber-agnostic, and (4) are a generic approach that can be applied beyond the U.S. Congress. #Polar scores leverage available information such as party labels, word frequency, and hashtags to create an accessible, straightforward algorithm for estimating polarity using text. (From the paper: Hemphill, L., Culotta, A., and Heston, M. (forthcoming). #Polar Scores: Measuring partisanship using social media content. Journal of Information Technology & Politics.)

The dataset contains one plain-text TSV file with the following information for each of the 55,244 tweets used to develop #Polar scores: tweet_id, created_at, user_id, screen_name, tag, shortid, sex, party, state, chamber, name. The file contains one row per hashtag, so tweets may appear more than once. The Python code for calculating #Polar scores is available here: http://doi.org/10.5281/zenodo.53888
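Because the TSV has one row per hashtag, counting rows overstates the number of tweets. The sketch below groups hashtags by `tweet_id` so that each tweet is counted once. It is not the authors' code: the field order is assumed to match the order listed above, the presence of a header row is unknown (so a header is skipped if found), and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, taken from the field list in the description.
FIELDS = ["tweet_id", "created_at", "user_id", "screen_name", "tag",
          "shortid", "sex", "party", "state", "chamber", "name"]


def hashtags_by_tweet(tsv_text: str) -> dict[str, list[str]]:
    """Map each tweet_id to the list of hashtags recorded for it."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        fieldnames=FIELDS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    tags: dict[str, list[str]] = defaultdict(list)
    for row in reader:
        if row["tweet_id"] == "tweet_id":
            continue  # skip a header row if the file has one
        tags[row["tweet_id"]].append(row["tag"])
    return dict(tags)
```

The number of distinct tweets is then `len(hashtags_by_tweet(text))`, where `text` is the content of the TSV file.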
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a detailed collection of tweets from Joe Biden's official Twitter account, @JoeBiden. It covers a significant period from 24th October, 2007, up to 31st October, 2020. The purpose of this dataset is to offer an invaluable resource for researchers, analysts, and anyone interested in tracking political communication, social media trends, and public sentiment over time. It offers direct insights into his public statements and engagements during his tenure as Vice President and his presidential campaign.
The dataset is typically provided in a CSV file format. It comprises 6,064 unique records, each representing a single tweet from Joe Biden's Twitter handle. The data spans from late 2007 to late 2020.
This dataset is ideal for various applications and use cases, including:

* Political Analysis: Studying the evolution of Joe Biden's political messaging and public discourse.
* Social Media Trend Analysis: Identifying patterns in social media engagement and public response to political figures.
* Natural Language Processing (NLP): Training models for sentiment analysis, topic modelling, and text classification on political language.
* Historical Research: Providing a digital archive for historians and researchers examining contemporary political communication.
* Journalism: Fact-checking and providing context for news stories related to Joe Biden's past statements.
The dataset's coverage is global, reflecting the worldwide accessibility of Twitter. The time range is precisely from 24th October, 2007, to 31st October, 2020. The content specifically focuses on tweets originating from Joe Biden's official Twitter handle, @JoeBiden, without specific demographic targeting or limitations beyond the nature of a public figure's Twitter feed.
CC0
Original Data Source: Joe Biden Tweets (2007 - 2020)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Twitter data collected during the 2018 Mexican presidential election campaign, focusing on mentions and tweets related to the main presidential candidates (@JoseAMeadeK, @RicardoAnayaC, @lopezobrador_, and @JaimeRdzNL). It represents a sample of 10,000 tweets from a larger dataset gathered as part of the research project "In-Context Learning for Misinformation Detection: Detecting Political Propaganda on Twitter Mexico using Large Language Model Meta AI".
The dataset includes the following fields:

* tweet_id: Unique identifier for each tweet
* followers_count: Number of followers of the user who posted the tweet
* created_at: Original timestamp of tweet creation (UTC)
* local_time: Timestamp converted to Mexico City time zone
* tweet: Text content of the tweet
* source: Platform or application used to post the tweet
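The relationship between `created_at` and `local_time` can be illustrated with a short sketch. This is not the project's own script: it assumes `created_at` is an ISO 8601 string, treats timestamps without an offset as UTC, and uses the IANA zone name `America/Mexico_City`. The function name is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MEXICO_CITY = ZoneInfo("America/Mexico_City")


def to_local_time(created_at: str) -> datetime:
    """Convert a UTC creation timestamp to Mexico City local time."""
    utc_time = datetime.fromisoformat(created_at)
    if utc_time.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(MEXICO_CITY)
```

Using a named time zone rather than a fixed offset lets the conversion account for the daylight saving rules that applied in Mexico City during the 2018 campaign.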
This sample dataset was collected using Twitter's streaming API in 2018. The script filtered the global Twitter stream for mentions of Mexico's presidential candidates.
Several fields present in the original data collection have been removed from this sample to comply with Twitter's terms of service and to protect user privacy:
* Username (screen_name)
* Tweet URL
* Geographical coordinates
* User location information
Only publicly accessible tweets (those without privacy restrictions set by users) were collected in the original dataset.
This dataset serves as a sample to provide insights into the larger research project focusing on misinformation detection and political propaganda analysis in Mexican social media during the 2018 presidential campaign. The research applies large language models to detect patterns of misinformation in political discourse.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the dynamics of Twitter activity on political parties. Please see the main paper, "Twitter-based analysis of the dynamics of collective attention to political parties" (PLoS ONE, 2015), for details.
copy directly from abstract in PSRM publication. Visit https://dataone.org/datasets/sha256%3Ab55eab8750f6f4b758b0db7bbe0297a5da783ad5f9d3d6c11e4670ee355d4ce7 for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Anonymised raw data downloaded from Twitter. All Twitter ids have been encrypted, so they will not work.

The .tgz files each contain a .json file containing user account data. In the files there is a JSON line for each account, which includes the id, its creation date, a list of ids of accounts followed (followids), and a list of ids for accounts which follow that account (friendids).

The file groupMembers.json has a JSON line for each group of accounts found. This includes the id for the group, and a list of the ids of its members.

The file groupDescriptions.json has a JSON line for each group of accounts found. This includes the id for the group and a list of the unusual words found for each group.
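A minimal sketch of reading the per-account JSON lines follows. The keys `followids` and `friendids` are named in the description; the key `id` for the account identifier is an assumption, as are the function names. Following the description, `friendids` is read as the list of accounts that follow the account.

```python
import json
from typing import Iterable, Iterator


def parse_accounts(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def follower_counts(lines: Iterable[str]) -> dict[str, int]:
    """Map each account id to the number of accounts that follow it."""
    return {
        account["id"]: len(account.get("friendids", []))
        for account in parse_accounts(lines)
    }
```

Passing an open file handle for one of the extracted .json files to `follower_counts` would give a per-account follower count, provided the key names match the assumptions above.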
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 7,015,186 tweets from 951,602 users, extracted using 91 search terms over 36 days between August 1st and December 31st, 2022.
All tweets in this dataset are in Brazilian Portuguese.
The dataset contains textual data from tweets, making it suitable for various NLP analyses, such as sentiment analysis, bias or stance detection, and toxic language detection. Additionally, users and tweets can be linked to create social graphs, enabling Social Network Analysis (SNA) to study polarization, communities, and other social dynamics.
This data set was extracted using Twitter's (now X) official API—when Academic Research API access was still available—following the pipeline:
1. Twitter/X daily monitoring: The dataset author monitored daily political events appearing in Brazil's Trending Topics. Twitter/X has an automated system for classifying trending terms. When a term was identified as political, it was stored along with its date for later use as a search query.
2. Tweet collection using saved search terms: Once terms and their corresponding dates were recorded, tweets were extracted from 12:00 AM to 11:59 PM on the day the term entered the Trending Topics. A language filter was applied to select only tweets in Portuguese. The extraction was performed using the official Twitter/X API.
3. Data storage: The extracted data was organized by day and search term. If the same search term appeared in Trending Topics on consecutive days, a separate file was stored for each respective day.
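The collection step in the pipeline above can be sketched as a function that turns each recorded term and date into a one-day query plan. This is an illustration only: the dictionary keys, the function names, and the use of naive datetimes are assumptions (the description does not state a time zone), and no actual API call is made.

```python
from datetime import date, datetime, time


def collection_window(trend_day: date) -> tuple[datetime, datetime]:
    """Return the 12:00 AM to 11:59 PM window for the given day."""
    start = datetime.combine(trend_day, time(0, 0))
    end = datetime.combine(trend_day, time(23, 59))
    return start, end


def plan_queries(trending: list[tuple[str, date]]) -> list[dict]:
    """Build one query plan per (search term, day) pair."""
    plans = []
    for term, day in trending:
        start, end = collection_window(day)
        # "pt" stands in for the Portuguese language filter.
        plans.append({"query": term, "lang": "pt", "start": start, "end": end})
    return plans
```

A term that trends on two consecutive days yields two separate plans, which mirrors the storage rule of one file per term per day.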
For more details, visit:
- The repository
- Dataset short paper:
---
DOI: 10.5281/zenodo.14834669
These are the data related to the PLOS ONE paper: Gaumont N, Panahi M, Chavalarias D (2018) Reconstruction of the socio-semantic dynamics of political activist Twitter networks—Method and application to the 2017 French presidential election. PLoS ONE 13(9): e0201879. https://doi.org/10.1371/journal.pone.0201879

This paper proposes an integrated methodology for the data collection, reconstruction, and visualization of the development of a country's political environment from Twitter data. These data cover several aspects of the analysis of the 2017 French presidential election campaign from the perspective of Twitter: processing of the Twitter data, intermediary results processed on the tweets dataset (for example, text-mining results), and additional data from the candidates' programs. Additional information is given in the Supporting Information texts.
https://brightdata.com/license
Utilize our Political Tweets dataset to enhance campaign strategies and gain insights into public discourse. This dataset offers a comprehensive view of political dynamics on social media, empowering organizations, researchers, and policymakers to analyze trends and sentiment. Access the full dataset or customize it with specific data points tailored to your needs. Popular use cases include:

* Sentiment Analysis: Analyze publicly available political tweets to understand public sentiment on policies, events, and candidates, aiding campaign strategies and opinion research.
* Trend Monitoring: Track trending topics and hashtags in political discourse to identify key issues and shifts in public priorities across demographics.
* Misinformation Detection: Detect and analyze patterns of misinformation, supporting efforts to combat its spread effectively.

Harness these insights to stay informed and adapt to the evolving political landscape.
This repository contains the replication material of the article "Most users do not engage with political elites on Twitter; Those who do, show overwhelming preferences for ideological congruity", to be published in Science Advances, by Magdalena Wojcieszak, Andreu Casas, Xudong Yu, Jonathan Nagler, and Joshua Tucker. One of the datasets (ingroup-sharing-model-data.csv) is too large for Dataverse; you can find a copy in this Google Drive: https://drive.google.com/drive/folders/1EYqaSF-EukTGhanogqaevSnjn7l0koEH?usp=sharing. You can also clone and use this code/data via GitHub: https://github.com/CasAndreu/ingroup_filtering
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Statistics say that Twitter is the social network preferred by journalists in Europe, which makes it a well-suited environment for studying debates about society, economics, and politics. The dataset depicts the retweeting activity in an Italian debate on immigration policies over a period of one month (from 23 January 2019 to 22 February 2019). The dataset is labeled according to the boticity score of the users participating in the discussion, as output by Botometer, a popular bot detector. All the accounts have been classified as either human-operated or bots. Due to Twitter's developer terms, we can only provide ids for users and tweets, which can be used to retrieve the original data through the Twitter API. For additional details, please refer to "Twitter data on political debates about the Italian immigration policies" (currently under submission to CIKM 2020 resource papers).
Data with codes. Visit https://dataone.org/datasets/sha256%3A6f480cbcc96c839ab0850f72b2233b504d64bdb0537e68afb8dec5a564504995 for complete metadata about this dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains Twitter traffic related to the 2020 US pre-election debate. The data was collected considering two types of states, namely swing and safe. The term 'swing' refers to those states in which one cannot be certain of a landslide victory for either Republicans or Democrats, as there is no clear historical orientation of the electorate. In contrast, a state is defined as 'safe' if its citizens have traditionally elected representatives of the same political party. In particular, the tweets are from four swing states (Arizona, Florida, Michigan and Pennsylvania) and four safe states (New Jersey, Indiana, Washington and Louisiana). The dataset was used in the study "Swinging in the States: Does disinformation on Twitter mirror the US presidential election system?" (under submission).