https://creativecommons.org/publicdomain/zero/1.0/
Dataset based on Twitter usernames of American politicians. Data extracted from Wikidata.
The same politician can appear several times: if they use different usernames on Twitter or Instagram, if they have belonged to several parties, or if several Twitter account IDs are associated with them. Because the data is sorted in ascending order by name, these repeated entries sit next to each other and are easy to spot.
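The repeated rows described above can be collapsed per politician. The following is only a minimal sketch: the column names `name`, `twitter_username`, and `party` and the sample rows are assumptions for illustration, not the dataset's actual schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file's column names may differ.
SAMPLE = """name,twitter_username,party
Jane Doe,janedoe,Party A
Jane Doe,jane_doe_official,Party A
John Roe,johnroe,Party A
John Roe,johnroe,Party B
"""


def collapse_by_name(rows):
    """Group repeated rows per politician, keeping each distinct value once."""
    grouped = defaultdict(lambda: {"usernames": [], "parties": []})
    for row in rows:
        entry = grouped[row["name"]]
        if row["twitter_username"] not in entry["usernames"]:
            entry["usernames"].append(row["twitter_username"])
        if row["party"] not in entry["parties"]:
            entry["parties"].append(row["party"])
    return dict(grouped)


politicians = collapse_by_name(csv.DictReader(io.StringIO(SAMPLE)))
for name, info in politicians.items():
    print(name, info["usernames"], info["parties"])
```

Grouping by name assumes that names are unique per person; two different politicians who share a name would be merged by this sketch.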
https://www.gesis.org/en/institute/data-usage-terms
At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F9ff49a3bb052e339eb85a66dca611f6c%2Fcharlie-kirk-turning-point2-91025-91025-a19b6183557949938f0dc01df2c33a28.jpg?generation=1757731111497297&alt=media
Charles James Kirk (October 14, 1993 – September 10, 2025) was an American conservative political activist, author, and media personality. He co-founded the organization Turning Point USA (TPUSA) in 2012 and was its executive director. He was the chief executive officer of Turning Point Action (TPAction) and a member of the Council for National Policy (CNP). In his later years, he was one of the most prominent voices of the populist MAGA movement and exemplified the growth of Christian nationalism in the Republican Party.
From: https://en.wikipedia.org/wiki/Charlie_Kirk
https://www.youtube.com/watch?v=0xngCgJnO5E
On September 10, 2025, while on stage at Utah Valley University in Orem, Utah, for a TPUSA event, "The American Comeback Tour", Kirk was fatally shot in the neck. The shooting took place at 12:23 p.m. MDT (18:23 UTC), around 20 minutes after the event began, in front of an audience of about 3,000 people.
From: https://en.wikipedia.org/wiki/Charlie_Kirk
I added a file that identifies users who have posted tweets about the topic and meet either of these criteria:
- Blue-certified accounts with at least 10K followers
- Non-Blue-certified accounts with at least 50K followers
This helps map tweets back to their authors and adds context on the users who are tagged in, or who create, the tweets.
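The two criteria above amount to a follower threshold that depends on whether the account is Blue-certified. The sketch below illustrates that rule only; the `User` class and its field names are hypothetical and are not taken from the dataset's file.

```python
from dataclasses import dataclass

# Thresholds as stated in the description.
BLUE_MIN_FOLLOWERS = 10_000
NON_BLUE_MIN_FOLLOWERS = 50_000


@dataclass
class User:
    username: str           # hypothetical field name
    is_blue_verified: bool  # hypothetical field name
    followers: int          # hypothetical field name


def is_notable(user: User) -> bool:
    """Return True if the user meets the threshold for their account type."""
    if user.is_blue_verified:
        return user.followers >= BLUE_MIN_FOLLOWERS
    return user.followers >= NON_BLUE_MIN_FOLLOWERS


# Made-up users for illustration.
users = [
    User("blue_small", True, 9_999),
    User("blue_large", True, 10_000),
    User("plain_mid", False, 49_999),
    User("plain_large", False, 50_000),
]
notable = [u.username for u in users if is_notable(u)]
print(notable)
```

Treating "at least" as an inclusive bound follows the wording of the description; an account with exactly 10K or 50K followers qualifies.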
I signed up for a trial with https://twitterapi.io/, check it out!
Credit: OLIVIER TOURON / AFP via Getty
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises 10 JSON files, each containing geographic metadata and a sentiment score collected from tweets posted between March 20, 2020 and December 1, 2020 about the COVID-19 global pandemic, for ten of the most populous cities in the United States and Canada.
all_musk_posts.csv - Elon Musk's tweets from his official account (@elonmusk) from the very beginning until April 13, 2025.
musk_quote_tweets.csv - the original tweets that Elon Musk quote-tweeted on his official account (@elonmusk) from the very beginning until April 13, 2025.
I scraped Elon Musk's tweets and combined them with other datasets published on Kaggle in different years:
- All Elon Musk's Tweets
- tweets from Bill Gates, Elon Musk and Ed Lee
- Elon Musk Tweets, 2010 to 2017
- Elon Musk Tweets (2021-2023)
The business magnate Elon Musk initiated an acquisition of the American social media company Twitter, Inc. on April 14, 2022, and concluded it on October 27, 2022. Musk had begun buying shares of the company in January 2022, becoming its largest shareholder by April with a 9.1 percent ownership stake. (Wikipedia)
By early 2024, Musk had become a vocal and financial supporter of Donald Trump. (Washington Post)
The data was collected and combined for the publication "Poster boy: Six instances of Kremlin disinformation amplified through Elon Musk’s social network" (The Insider, 2025-03-12). Below are two visualisations based on this data.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4728018%2F698e901a8dec9a84d7d5d5799427da42%2Ffile-efae4a0f8b8c46becfa2a845a8b6ac17.jpg?generation=1742660891320296&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4728018%2Ff98f3e2b42d9201cee3a37d1d3b5fa24%2Ffile-e0d66dd19064cc2c1ff91eeb34ee8157.jpg?generation=1742660914839024&alt=media
Content of the dataset all_musk_posts.csv:
id - ID of the tweet by elonmusk
url - link to a tweet (on x.com)
twitterUrl - link to a tweet (on twitter.com)
fullText - text of the tweet
retweetCount - number of retweets
replyCount - number of replies
likeCount - number of likes
quoteCount - number of quotes
viewCount - number of views
createdAt - timestamp, UTC
bookmarkCount - number of bookmarks
isReply - boolean, True if the post is a reply
inReplyToId - ID of the original tweet if that's a reply
conversationId - conversation ID
inReplyToUserId - ID of the user that received a reply
inReplyToUsername - current username of the user that received a reply
isPinned - boolean, True if the post was pinned
isRetweet - boolean, True if the post is a retweet
isQuote - boolean, True if the post is a quote
isConversationControlled - conversation marked as "controlled", only selected users can reply
possiblySensitive - conversation marked as "sensitive"
Content of the dataset musk_quote_tweets.csv:
orig_tweet_id - ID of the original tweet that @elonmusk quote-tweeted
orig_tweet_created_at - timestamp of the original tweet, UTC
orig_tweet_text - text of the original tweet
orig_tweet_url - link to the original tweet (on x.com)
orig_tweet_twitter_url - link to the original tweet (on twitter.com)
orig_tweet_username - current (March 2025) username of the account that posted the original tweet
orig_tweet_retweet_count - number of retweets for the original tweet
orig_tweet_reply_count - number of replies for the original tweet
orig_tweet_like_count - number of likes for the original tweet
orig_tweet_quote_count - number of quotes for the original tweet
orig_tweet_view_count - number of views for the original tweet
orig_tweet_bookmark_count - number of bookmarks for the original tweet
musk_tweet_id - ID of the quote-tweet by elonmusk
musk_quote_tweet - text of the quote-tweet by elonmusk
musk_quote_retweet_count - number of retweets for the quote-tweet by elonmusk
musk_quote_reply_count - number of replies for the quote-tweet by elonmusk
musk_quote_like_count - number of likes for the quote-tweet by elonmusk
musk_quote_quote_count - number of quotes for the quote-tweet by elonmusk
musk_quote_view_count - number of views for the quote-tweet by elonmusk
musk_quote_bookmark_count - number of bookmarks for the quote-tweet by elonmusk
musk_quote_created_at - timestamp of the quote-tweet by elonmusk, UTC
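The two column lists above suggest that the files can be linked through the tweet ID. The sketch below assumes, without confirmation from the source, that `musk_tweet_id` in musk_quote_tweets.csv matches `id` in all_musk_posts.csv; the sample rows are invented, and only a few of the documented columns are used.

```python
import csv
import io

# Invented sample rows that use a subset of the documented column names.
POSTS_CSV = """id,fullText,isQuote
101,Example quote tweet,True
102,Example plain tweet,False
"""

QUOTES_CSV = """orig_tweet_id,orig_tweet_username,orig_tweet_text,musk_tweet_id
900,example_user,Example original text,101
"""


def read_rows(text):
    """Parse CSV text into a list of dicts; all values stay strings."""
    return list(csv.DictReader(io.StringIO(text)))


def join_quotes(posts, quotes):
    """Pair each post with the original tweet it quotes, when one is listed.

    Assumption: musk_tweet_id (quotes file) equals id (posts file).
    """
    originals = {q["musk_tweet_id"]: q for q in quotes}
    joined = []
    for post in posts:
        original = originals.get(post["id"])
        if original is None:
            continue  # not a quote tweet, or the original is not listed
        joined.append({
            "musk_tweet_id": post["id"],
            "musk_text": post["fullText"],
            "orig_tweet_id": original["orig_tweet_id"],
            "orig_username": original["orig_tweet_username"],
            "orig_text": original["orig_tweet_text"],
        })
    return joined


pairs = join_quotes(read_rows(POSTS_CSV), read_rows(QUOTES_CSV))
print(pairs)
```

IDs are deliberately kept as strings here, so large tweet IDs are compared exactly and never pass through a numeric conversion.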
Acknowledgements
I do not own this data; I scraped it for educational purposes ONLY. Please do not violate any...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset enumerates the number of geocoded tweets captured in geographic rectangular bounding boxes around the metropolitan statistical areas (MSAs) defined for 49 American cities, during a four-week period in 2012 (between April and June), through the Twitter Streaming API. More information on MSA definitions: https://www.census.gov/population/metro/
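Counting geocoded tweets inside rectangular bounding boxes can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the dataset: the box coordinates and tweet points are made up and do not correspond to real MSA boundaries, and the names `BoundingBox` and `count_in_boxes` are hypothetical.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Axis-aligned box in degrees (west/east = longitude, south/north = latitude)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def count_in_boxes(points, boxes):
    """Count how many (lon, lat) points fall inside each named box."""
    counts = {name: 0 for name in boxes}
    for lon, lat in points:
        for name, box in boxes.items():
            if box.contains(lon, lat):
                counts[name] += 1
    return counts


# Made-up boxes and points for illustration only.
boxes = {
    "city_a": BoundingBox(-75.0, 40.0, -73.0, 41.5),
    "city_b": BoundingBox(-88.5, 41.0, -87.0, 42.5),
}
points = [(-74.0, 40.7), (-87.6, 41.9), (-100.0, 35.0), (-73.5, 41.0)]
counts = count_in_boxes(points, boxes)
print(counts)
```

The sketch counts a point once for every box that contains it, so overlapping boxes would each count the same tweet. It also does not handle boxes that cross the antimeridian.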
https://dataful.in/terms-and-conditions
High Frequency Indicator: This dataset compiles year- and month-wise data from 2021 to the present on the number and distribution of user grievances received by X (formerly Twitter), along with the actions taken by the platform. The data is based on the monthly transparency reports published under Rule 4(1)(d) of the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021.
From June 2021 to August 2025, X reported grievance data in absolute numbers across various categories such as illegal activities, IP-related infringements, Abuse/Harassment, Child Sexual Exploitation, Defamation, Hateful Conduct, Impersonation, Misinformation, etc. During this period, the dataset reflects these absolute values directly.
Beginning September 2025, X discontinued reporting grievance counts by category in absolute numbers and instead published percentage share distributions of grievances and URLs actioned. The transparency reports continued to provide the total number of grievances received and total URLs actioned, from which the dataset estimates the absolute category-wise values by applying the reported percentage shares. Additionally, for comparability across the entire time series, percentage shares for months prior to September 2025 have been computed based on the reported absolute values.
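The two conversions described above (percentage shares to estimated counts, and counts to percentage shares) can be sketched as below. The function names and the sample numbers are invented for illustration. Whether the dataset rounds its estimates, and which denominator it uses for the back-computed shares, is not stated in the description, so both choices here are assumptions.

```python
def estimate_counts(total, shares_pct):
    """Estimate category counts from a reported total and percentage shares.

    Assumption: estimates are rounded to the nearest whole number.
    """
    return {cat: round(total * pct / 100) for cat, pct in shares_pct.items()}


def shares_from_counts(counts):
    """Compute percentage shares from absolute category counts.

    Assumption: the denominator is the sum of the listed categories.
    """
    total = sum(counts.values())
    if total == 0:
        return {cat: 0.0 for cat in counts}
    return {cat: 100 * n / total for cat, n in counts.items()}


# Made-up figures, not taken from any transparency report.
reported_total = 2000
reported_shares = {
    "Abuse/Harassment": 40.0,
    "Hateful Conduct": 35.0,
    "Impersonation": 25.0,
}
estimated = estimate_counts(reported_total, reported_shares)
print(estimated)
print(shares_from_counts(estimated))
```

Because of rounding, estimated category counts may not sum exactly to the reported total when the shares are not round numbers.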
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset comprises all tweets that contained the hashtag #ALittleLife or the official account @alittlelifebook and were posted in March or April 2015. The dataset was generated in February 2023 using snscrape (https://github.com/JustAnotherArchivist/snscrape/).
The dataset is discussed in my article, "What We Can('t) Know Before We Read: Towards a Theory of the Pre-Reading Environment", Book History 27.2 (2024).
A list of 10,538 Twitter IDs for tweets harvested between 4 January at 11am and 9 January at 11am using Social Feed Manager. As this used the search API, the 4 January at 11am crawl went back about 5-9 days. Tweet IDs are included, as is a log of the decisions made to curate this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
120 words most characteristic of American tweets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
201
http://www.gnu.org/licenses/fdl-1.3.html
This dataset contains 2,962 tweets scraped from Twitter between 14 October and 23 October 2022. The tweets focus on communications around Native American/American Indian/Indigenous Americans. Three search terms were used in the creation of this dataset:
- American Indian
- Native American
- Indigenous American
https://creativecommons.org/publicdomain/zero/1.0/
In this dataset you will find hundreds of thousands of tweets from current and past US senators during their tenure.
This dataset comes from https://data.world/fivethirtyeight/twitter-ratio.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Twitter timelines of the most prominent American politicians. We expect these tweets to be updated as time permits. Hopefully they can be added daily.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Appendix S1-S3, Table S1 and Software S1.
Appendix S1. Term list. List of all words considered in our main analysis.
Appendix S2. Term examples. Examples for each term considered in our analysis.
Appendix S3. Data Procedures. Description of the procedures used for data processing, including Twitter data acquisition, geocoding, content filtering, word filtering, and text processing.
Table S1. Term annotations. Tab-separated file describing annotations of each term as entities, foreign-language, or acceptable for analysis.
Software S1. Preprocessing software. Source code for data preprocessing.
(ZIP)
This dataset, divided into files by city, contains geotagged digital traces collected from the social media platforms and datasets listed below.
• Tweets - Cheng et al. [1]
• Gowalla [2]
• Tweets - Lamsal [3]
• YELP [4]
• Tweets - Kejriwal et al. [5]
• Geotagged Tweets [6]
• UrbanActivity [7]
• Brightkite [8]
• Weeplaces [8]
• Flickr [9]
• Foursquare [10]
Each file is named after the city with which the digital traces are associated and contains the following columns:
Source: name of the source platform
Event_date: date associated with the digital trace
Lat: latitude of the digital trace
Lng: longitude of the digital trace
The definition of city/town used is provided by Simplemaps [11], which considers a city/town to be any inhabited place as determined by U.S. government agencies. The locations of cities and their respective centers were obtained from the World Cities Database provided by the same company.
A specific group of these cities was used for the research presented in the article submitted to Sensors Journal: Muñoz-Cancino, R., Rios, S. A., & Graña, M. (2023). Clustering cities over features extracted from multiple virtual sensors measuring micro-level activity patterns allows to discriminate large-scale city characteristics. Sensors, Under Review. Comprehensive guidelines and the selection criteria can be found in that article.
References
[1] Zhiyuan Cheng, James Caverlee, and Kyumin Lee. You are where you tweet: A content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM '10, pages 759-768, New York, NY, USA, 2010. Association for Computing Machinery.
[2] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 1082-1090, New York, NY, USA, 2011. Association for Computing Machinery.
[3] Yunhe Feng and Wenjun Zhou. Is working from home the new norm? an observational study based on a large geo-tagged covid-19 twitter dataset, 2020.
[4] Yelp Inc. Yelp Open Dataset, 2021. Retrieved from https://www.yelp.com/dataset. Accessed October 26, 2021.
[5] Mayank Kejriwal and Sara Melotte. A Geo-Tagged COVID-19 Twitter Dataset for 10 North American Metropolitan Areas, January 2021.
[6] Rabindra Lamsal. Design and analysis of a large-scale covid-19 tweets dataset. Applied Intelligence, 51(5):2790-2804, 2021.
[7] Geraud Le Falher, Aristides Gionis, and Michael Mathioudakis. Where is the Soho of Rome? Measures and algorithms for finding similar neighborhoods in cities. In 9th AAAI Conference on Web and Social Media - ICWSM 2015, Oxford, United Kingdom, May 2015.
[8] Yong Liu, Wei Wei, Aixin Sun, and Chunyan Miao. Exploiting geographical neighborhood characteristics for location recommendation. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM '14, pages 739-748, New York, NY, USA, 2014. Association for Computing Machinery.
[9] Hatem Mousselly-Sergieh, Daniel Watzinger, Bastian Huber, Mario Doller, Elood Egyed-Zsigmond, and Harald Kosch. World-wide scale geotagged image dataset for automatic image annotation and reverse geotagging. In Proceedings of the 5th ACM Multimedia Systems Conference, MMSys '14, pages 47-52, New York, NY, USA, 2014. Association for Computing Machinery.
[10] Dingqi Yang, Daqing Zhang, Vincent W. Zheng, and Zhiyong Yu. Modeling user activity preference by leveraging user spatial temporal characteristics in lbsns. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(1):129-142, 2015.
[11] Simple Maps. Basic World Cities Database, 2021. Retrieved from https://simplemaps.com/data/world-cities. Accessed September 3, 2021.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top 20 winner tokens in the longitudinal sample.
https://creativecommons.org/publicdomain/zero/1.0/
Elon Musk is an American business magnate. He was one of the founders of PayPal in the past, and the founder and CEO of SpaceX, Tesla, SolarCity, OpenAI, Neuralink, and The Boring Company in the present. He is known as much for his extremely forward-thinking ideas and huge media presence as he is for his extreme business savvy.
Musk is famously active on Twitter. This dataset contains all tweets made by @elonmusk, his official Twitter handle.
Can you figure out Elon Musk's opinions on various things by studying his Twitter statements? Did Elon Musk's post rate increase, decrease, or stay about the same over time?
This dataset has the following features:
- Date Created
- Number of Likes
- Source of Tweet
- Tweets
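The question about post rate over time can be approached by counting tweets per year. The sketch below is a hedged illustration: it assumes the feature names listed above are the literal CSV headers and that `Date Created` uses the format `%Y-%m-%d %H:%M:%S`, neither of which is confirmed by the description. The sample rows are invented.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Invented rows; header names and the date format are assumptions.
SAMPLE = """Date Created,Number of Likes,Source of Tweet,Tweets
2021-03-01 12:00:00,10,Twitter for iPhone,first example
2021-07-15 08:30:00,20,Twitter for iPhone,second example
2022-01-05 19:45:00,30,Twitter Web App,third example
"""


def tweets_per_year(rows, date_format="%Y-%m-%d %H:%M:%S"):
    """Count tweets per calendar year, returned in ascending year order."""
    counts = Counter()
    for row in rows:
        created = datetime.strptime(row["Date Created"], date_format)
        counts[created.year] += 1
    return dict(sorted(counts.items()))


per_year = tweets_per_year(csv.DictReader(io.StringIO(SAMPLE)))
print(per_year)
```

If the real file uses a different timestamp layout, only the `date_format` argument needs to change.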
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).