Day-by-day data on Donald Trump's tweets up to 8th Jan 2021, before his Twitter account was blocked. The data contains information on the number of retweets, deletion of tweets, the device used to tweet, flagged tweets, favorite counts, etc.
https://creativecommons.org/publicdomain/zero/1.0/
[READ THIS FIRST! DATASETS FOR Academic/Learning/Non-commercial purpose]
The 2020 US election is very interesting to look into, as it was held in the middle of a pandemic. My teammate and I created a Twitter crawler using the Twitter API and Tweepy for my Artificial Intelligence coursework. We chose Donald Trump as the subject of interest because President Trump was known for his Twitter interaction.
I decided to deploy my crawler after voting day to conduct a sentiment analysis.
The tweet text in this dataset is suitable for sentiment analysis.
This raw dataset was crawled using the Tweepy library and the Twitter API. 2,500 tweets were gathered every 15 minutes. There are 247,500 rows and 13 columns, for a total of 3,217,500 cells of data (247,500 × 13). Data cleaning is needed before doing any analysis.
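As a rough illustration of the cleaning step mentioned above, the sketch below normalizes one tweet body before sentiment analysis. The specific rules (dropping a leading retweet marker, URLs, and @mentions) are assumptions about what "cleaning" might involve here, not the dataset author's procedure, and `clean_tweet_text` is a hypothetical helper name.

```python
import html
import re

# Patterns are illustrative assumptions, not rules documented by the dataset.
RT_PREFIX_RE = re.compile(r"^RT\s+@\w+:\s*")   # leading "RT @user: " marker
URL_RE = re.compile(r"https?://\S+")           # http and https links
MENTION_RE = re.compile(r"@\w+")               # @mentions
WHITESPACE_RE = re.compile(r"\s+")             # runs of spaces and newlines


def clean_tweet_text(text: str) -> str:
    """Return a tweet body with common noise removed (hypothetical helper)."""
    text = html.unescape(text)          # "&amp;" becomes "&"
    text = RT_PREFIX_RE.sub("", text)   # drop the retweet marker, if any
    text = URL_RE.sub("", text)         # drop links
    text = MENTION_RE.sub("", text)     # drop remaining mentions
    return WHITESPACE_RE.sub(" ", text).strip()
```

Hashtags are left in place on purpose, since they often carry sentiment; whether to keep them is a judgment call for the analysis at hand.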
Dataset date range: 4th November 2020 - 11th November 2020. Tweets containing "Trump", "DonalTrump", or "realDonalTrump" were captured.
(The User = user of the particular row)
username: Twitter user handle
accDesc: Description of the user on their profile
location: Location of the tweet
following: Total number of accounts the user is following
followers: Total number of followers of the user
totaltweets: Total tweets created by the user
usercreated: Date the user registered his/her Twitter account
tweetcreated: Date the tweet was created
favouritecount: Tweet <3 count (equivalent to a like on Facebook)
retweetcount: Total retweets of the tweet (equivalent to a share on Facebook)
text: Text body of the tweet
tweetsource: Device used to create this tweet
hashtags: Hashtags of the tweet in JSON format
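A minimal sketch of reading these columns with Python's standard library follows. It assumes the file is a CSV whose header row uses the column names listed above, and that each `hashtags` cell is a JSON array of either plain strings or objects with a `text` key; the listing only says "JSON format", so that shape is an assumption. `parse_hashtags` and `read_rows` are hypothetical helper names.

```python
import csv
import json

# Columns described above as counts; converted to int when reading.
INT_COLUMNS = ("following", "followers", "totaltweets",
               "favouritecount", "retweetcount")


def parse_hashtags(raw):
    """Return hashtag strings from one cell (assumed JSON array; see lead-in)."""
    try:
        items = json.loads(raw) if raw else []
    except json.JSONDecodeError:
        return []  # cell is not valid JSON; treat as "no hashtags"
    if not isinstance(items, list):
        return []
    tags = []
    for item in items:
        if isinstance(item, dict) and "text" in item:
            tags.append(item["text"])
        elif isinstance(item, str):
            tags.append(item)
    return tags


def read_rows(handle):
    """Yield one dict per CSV row with counts as ints and hashtags as a list."""
    for row in csv.DictReader(handle):
        for name in INT_COLUMNS:
            row[name] = int(row[name] or 0)  # empty cell counts as 0
        row["hashtags"] = parse_hashtags(row["hashtags"])
        yield row
```

To use it, pass an open file object, for example `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded CSV.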
Banner and thumbnail courtesy of > visuals < from unsplash.com
Many thanks to my teammates Jiacheng Loh and ChenZhen Li for their efforts.
Please do not use this dataset for any malicious purpose; I am not responsible for any damage done.
This dataset was gathered for the purpose of learning and not for commercial purposes.
The data were public and in the public domain, so I assume they are open to all.
The data were gathered at intervals of at least 15 minutes, so the datecreated distribution is uneven and may not include all tweets created within the date range.
This data contains all of Trump's non-deleted, non-retweeted tweets from the day he announced his candidacy for President in 2015 until September 27th, 2018. I utilized this data to create models to predict the number of favorites any given tweet would get based on the content of the messages and information concerning his Twitter account.
The tweet-level data was collected from www.trumptwitterarchive.com, and the account-level data from www.trackalytics.com.
I wanted these models to predict the number of favorites a tweet got without already knowing how many retweets it got. I managed to produce a model that had a mean absolute error of around 19,500 using NLP techniques and some general knowledge of Trump's behavior. I would love to see others beat my models and create amazing predictors themselves.
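For anyone comparing their own predictor against that figure, mean absolute error is the average of the absolute differences between predicted and actual favorite counts. The sketch below is a plain-Python version; the function name and any numbers passed to it are illustrative and are not taken from the dataset.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total / len(actual)
```

A model that misses every tweet by 19,500 favorites, in either direction, would score exactly 19,500 on this metric.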
As of October 2025, social network X (formerly known as Twitter) was most popular in the United States, with an audience reach of approximately 99.04 million users. Japan ranked second, recording more than 71 million users on the platform.

Global Twitter usage
As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama.

X/Twitter and politics
X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. During an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
https://creativecommons.org/publicdomain/zero/1.0/
February 9, 2021 marks the first day of the Second Impeachment Trial of former President Donald J. Trump. This repository features a dataset of 100,000 tweets taken on the morning of Day 1 of the Impeachment trial in four different forms:
1. Raw Text Data
2. Uncleaned CSV Data
3. Cleaned CSV Data
4. Cleaned and Sentiment Tagged CSV Data
This data can be used for Data Visualization, NLP sentiment analysis, and Political/Policy Analysis as well as many other use cases that I haven't yet considered.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
(A) Total number of tweets and (B) proportion of tweets made by each of the 10 Granger linked accounts from 01/01/2015 to 02/26/2017. The vertical lines represent significant events in the election cycle. From left to right: Trump announces candidacy (black, 06/16/2015), Trump calls for a ban on the immigration of Muslims after the San Bernardino shooting (red, 12/07/2015), Trump declared Republican nominee (red, 06/19/2016), Hillary declared Democratic nominee (blue, 07/28/2016), 3 presidential primary debates (green, 09/26, 10/09, 10/19/2016), election day (black, 11/08/2016), and Trump’s inauguration (black, 01/20/2017).
https://creativecommons.org/publicdomain/zero/1.0/
On Jan 20th, 2017, Donald J. Trump was inaugurated as the 45th President of the United States. This marked the end of a brutal and contentious campaign. He entered office as one of the most unpopular presidents in modern history (based on the popular vote).
Trump's election to the presidency led to the organization of the Women's March, where millions of men and women took to the streets to protest the new government's stance on women's rights and healthcare. Social media blew up with searchable terms like "#WomensMarch", prompting major news organizations to cover the mass protests.
The data was acquired using the twitteR package's searchTwitter() function. This function makes a call to the Twitter API. A total of 30000 tweets containing #Inauguration and #WomensMarch were obtained (15000 for each).
1 "X" : Serial Number
2 "text" : Tweet Text
3 "favorited" : TRUE/FALSE
4 "favoriteCount" : Number of Likes
5 "replyToSN" : Screen Handle name of the receiver
6 "created" : YYYY-MM-DD H:M:S
7 "truncated" : If the Tweet is Truncated (TRUE/FALSE)
8 "replyToSID": ID of the receiver
9 "id" : ID
10 "replyToUID": User ID of the receiver
11 "statusSource": Device Information (Web Client,IPhone,Android etc)
12 "screenName" : Screen name of the Tweeter
13 "retweetCount": Number of Retweets
14 "isRetweet" : TRUE/FALSE
15 "retweeted" : Has this tweet been retweeted(TRUE/FALSE)
16 "longitude" : longitude
17 "latitude" : latitude
How do the polarity and number of tweets change over time? Which locations had negative sentiments about the Inauguration? What about the Women's March? What do the retweet and mention networks look like for each case? How many tweets are there per day? Which day has the most activity? What other hashtags are used?
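The per-day questions can be approached by grouping on the date part of the `created` column. The sketch below assumes the values follow the `YYYY-MM-DD H:M:S` layout listed above and have already been read into a list of strings; `tweets_per_day` and `busiest_day` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(created_values):
    """Count tweets per calendar day from 'YYYY-MM-DD H:M:S' strings."""
    counts = Counter()
    for value in created_values:
        day = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").date()
        counts[day.isoformat()] += 1
    return counts


def busiest_day(counts):
    """Return the (day, count) pair with the highest tweet count."""
    return max(counts.items(), key=lambda pair: pair[1])
```

The same grouping can be done with a data-frame library; the standard-library version is shown only to keep the example dependency-free.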
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Truth Social grew quickly right after its launch. Here are the key Truth Social statistics you need to know.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Female genital mutilation/cutting (FGM/C) describes several procedures that involve injury to the vulva or vagina for nontherapeutic reasons. Though at least 200 million women and girls living in 30 countries have undergone FGM/C, there is a paucity of studies focused on public perception of FGM/C. We used machine learning methods to characterize discussion of FGM/C on Twitter in English from 2015 to 2020. Twitter has emerged in recent years as a source for seeking and sharing health information and misinformation. We extracted text metadata from user profiles to characterize the individuals and locations involved in conversations about FGM/C. We extracted major discussion themes from posts using correlated topic modeling. Finally, we extracted features from posts and applied random forest models to predict user engagement. The volume of tweets addressing FGM/C remained fairly stable across years. Conversation was mostly concentrated among the United States and United Kingdom through 2017, but shifted to Nigeria and Kenya in 2020. Some of the discussion topics associated with FGM/C across years included Islam, International Day of Zero Tolerance, current news stories, education, activism, male circumcision, human rights, and feminism. Tweet length and follower count were consistently strong predictors of engagement. Our findings suggest that (1) discussion about FGM/C has not evolved significantly over time, (2) the majority of the conversation about FGM/C on English-speaking Twitter is advocating for an end to the practice, (3) supporters of Donald Trump make up a substantial voice in the conversation about FGM/C, and (4) understanding the nuances in how people across cultures refer to and discuss FGM/C could be important for the design of public health communication and intervention.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A survey done in March 2022 found that 31% of Republican voters said they would use Truth Social often and 14% said they plan to use the platform a lot.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
You might be surprised how much Truth Social is worth based on its small number of users.
This is the dataset I used to figure out which sociodemographic factor, including the current pandemic status of each state, has the most significant impact on the result of the US Presidential election last year. I also included sentiment scores of tweets created from 2020-10-15 to 2020-11-02, in order to figure out the effect of positive/negative emotion toward each candidate - Donald Trump and Joe Biden - on the result of the election.
Details for each variable are as below:
- state: name of each state in the United States, including the District of Columbia
- elec16, elec20: dummy variables indicating whether Trump gained the electoral votes of each state. If the electors cast their votes for Trump, the value is 1; otherwise the value is 0
- elecchange: dummy variable indicating whether the state's result flipped between parties in 2020 compared to 2016
- demvote16: the share of votes that the Democrats, i.e. Hillary Clinton, earned in the 2016 Presidential election
- repvote16: the share of votes that the Republicans, i.e. Donald Trump, earned in the 2016 Presidential election
- demvote20: the share of votes that the Democrats, i.e. Joe Biden, earned in the 2020 Presidential election
- repvote20: the share of votes that the Republicans, i.e. Donald Trump, earned in the 2020 Presidential election
- demvotedif: the difference between demvote20 and demvote16
- repvotedif: the difference between repvote20 and repvote16
- pop: the population of each state
- cumulcases: the cumulative COVID-19 cases on Election Day
- caseMar ~ caseOct: the cumulative COVID-19 cases during each month
- Marper10k ~ Octper10k: the cumulative COVID-19 cases during each month per 10 thousand
- unemp20: the unemployment rate of each state this year before the election
- unempdif: the difference between the unemployment rate of last year and that of this year
- jan20unemp ~ oct20unemp: the unemployment rate of each month
- cumulper10k: the cumulative COVID-19 cases on Election Day per 10 thousand
- b_str_poscount_total: the total number of positive tweets on Biden measured by SentiStrength
- b_str_negcount_total: the total number of negative tweets on Biden measured by SentiStrength
- t_str_poscount_total: the total number of positive tweets on Trump measured by SentiStrength
- t_str_negcount_total: the total number of negative tweets on Trump measured by SentiStrength
- b_str_posprop_total: the proportion of positive tweets on Biden measured by SentiStrength
- b_str_negprop_total: the proportion of negative tweets on Biden measured by SentiStrength
- t_str_posprop_total: the proportion of positive tweets on Trump measured by SentiStrength
- t_str_negprop_total: the proportion of negative tweets on Trump measured by SentiStrength
- white: the proportion of white people
- colored: the proportion of people of color
- secondary: the proportion of people who have attained secondary education
- tertiary: the proportion of people who have attained tertiary education
- q3gdp20: GDP of the 3rd quarter of 2020
- q3gdprate: the growth rate of the 3rd quarter of 2020, compared to that of the same quarter last year
- 3qsgdp20: GDP of 3 quarters of 2020
- 3qsrate20: the growth rate of GDP compared to that of the 3 quarters last year
- q3gdpdif: the difference in the level of GDP of the 3rd quarter compared to the last quarter
- q3rate: the growth rate of the 3rd quarter compared to the last quarter
- access: the proportion of households having Internet access
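Several of these variables are derived from others, and the sketch below shows one plausible reading of three of them. It assumes that the "per 10 thousand" figures are normalized by the state's `pop` value and that `elecchange` equals 1 exactly when `elec16` and `elec20` differ; neither detail is stated explicitly in the description, and the function names are hypothetical.

```python
def vote_share_change(share_2020, share_2016):
    """demvotedif / repvotedif: 2020 vote share minus 2016 vote share."""
    return share_2020 - share_2016


def cases_per_10k(cumulative_cases, population):
    """cumulper10k-style rate, ASSUMING normalization by state population."""
    return cumulative_cases * 10_000 / population


def flipped(elec16, elec20):
    """elecchange-style dummy, ASSUMING 1 means the 2016 and 2020 results differ."""
    return int(elec16 != elec20)
```

Recomputing these values from the raw columns and comparing them with the published ones is a quick way to confirm or reject the assumptions before modeling.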
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We've put together a list of the latest Truth Social statistics so you can see who uses the platform and whether or not Truth Social is likely to become a dominant social media network in the future.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
How does Truth Social compare to other social media platforms? There are around 2 million active Truth Social users.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Charlottesville is home to a statue of Robert E. Lee which is slated to be removed. (For those unfamiliar with American history, Robert E. Lee was a US Army general who defected to the Confederacy during the American Civil War and was considered to be one of their best military leaders.) While many Americans support the move, believing the main purpose of the Confederacy was to defend the institution of slavery, many others do not share this view. Furthermore, believing Confederate symbols to be merely an expression of Southern pride, many have not taken its planned removal lightly.
As a result, many people--including white nationalists and neo-Nazis--descended on Charlottesville to protest its removal. This in turn attracted many counter-protestors. Tragically, one of the counter-protestors--Heather Heyer--was killed and many others injured after a man intentionally rammed his car into them. In response, President Trump blamed "both sides" for the chaos in Charlottesville, leading many Americans to denounce him for what they see as a soft-handed approach to what some have called an act of "domestic terrorism."
This dataset below captures the discussion--and copious amounts of anger--revolving around this past week's events.
This data set consists of a random sample of 50,000 tweets per day (in accordance with the Twitter Developer Agreement) of tweets mentioning Charlottesville or containing "#charlottesville" extracted via the Twitter Streaming API, starting on August 15. The files were copied from a large Postgres database containing--currently--over 2 million tweets. Finally, a table of tweet counts per timestamp was created using the whole database (not just the Kaggle sample). The data description PDF provides a full summary of the attributes found in the CSV files.
Note: While the tweet timestamps are in UTC, the cutoffs were based on Eastern time (UTC-4 in August, when daylight saving time is in effect), so the August 16 file will have timestamps ranging from 2017-08-16 4:00:00 UTC to 2017-08-17 4:00:00 UTC.
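To work out which daily file a given tweet belongs to, the UTC timestamp can be shifted by the four-hour offset implied by the note. The sketch below assumes timestamps are strings in `YYYY-MM-DD H:M:S` form and uses a fixed UTC-4 offset, which matches the stated 4:00 UTC cutoff in August 2017; the actual timestamp layout is described in the data description PDF, and `file_date_for` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset matching the 04:00 UTC daily cutoff described in the note.
CUTOFF_ZONE = timezone(timedelta(hours=-4))


def file_date_for(utc_timestamp):
    """Return the date (YYYY-MM-DD) of the daily file a UTC timestamp falls in."""
    utc_time = datetime.strptime(utc_timestamp, "%Y-%m-%d %H:%M:%S")
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(CUTOFF_ZONE).date().isoformat()
```

A fixed offset is enough for this one-month window; data spanning a daylight-saving change would need a real time zone database instead.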
The dataset is available as either separate CSV files or a single SQLite database.
I'm releasing the dataset under the CC BY-SA 4.0 license. Furthermore, because this data was extracted via the Twitter Streaming API, its use must abide by the Twitter Developer Agreement. Most notably, the display of individual tweets should satisfy these requirements. More information can be found in the data description file, or on Twitter's website.
Obviously, I would like to thank Twitter for providing a fast and reliable streaming service. I'd also like to thank the developers of the Python programming language, psycopg2, and Postgres for creating amazing software without which this data set would not exist.
The banner above is a personal modification of these images:
I almost removed the header "inspiration" from this section, because this is a rather sad and dark data set. However, this is precisely why this is an important data set to analyze. Good history books have never shied away from unpleasant events, and neither should we.
This data set provides a rich opportunity for many types of research, including:
Furthermore, given the political nature of this dataset, there are a lot of social science questions that can potentially be answered, or at least piqued, by this data.