24 datasets found

Verified NFT Tweets
kaggle.com
zip
Updated Apr 11, 2022
Cite
adarsh (2022). Verified NFT Tweets [Dataset]. https://www.kaggle.com/datasets/adanai/verified-nft-tweets
Explore at:
Available download formats: zip (12309951 bytes)
Dataset updated
Apr 11, 2022
Authors
adarsh
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Non-Fungible Tokens (NFTs) are a relatively new concept and have been making headlines because of recent events in the space.

The best way to gauge sentiment and get basic stats is to use data from social media. Twitter is a powerful platform for people to express their opinions on any given topic. This dataset collects tweets that include NFT-related hashtags (#).

This dataset may help capture NFT trends from the available data and answer questions about how far NFTs have come.
Sentiment Analysis on Financial Tweets
kaggle.com
zip
Updated Sep 5, 2019
Cite
Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
Explore at:
Available download formats: zip (2538259 bytes)
Dataset updated
Sep 5, 2019
Authors
Vivek Rathi
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.

Everything else in the description was written by David Wallach; using this information, I performed my first ever sentiment analysis.

"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).

I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."

Content

This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith']
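The collection rule described above (keep a tweet only when a monitored account mentions a listed company) might look roughly like the following. This is a minimal sketch, not the StockerBot implementation: the ticker table, the cashtag and company-name matching, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical excerpts: the real inputs would be stocks_cleaned.csv and a live tweet stream.
MONITORED = {"MarketWatch", "jimcramer", "Reuters"}  # subset of the handles listed above
TICKERS = {"AAPL": "Apple", "TSLA": "Tesla"}         # ticker -> company name (assumed layout)


def mentioned_tickers(text, tickers=TICKERS):
    """Return the tickers whose cashtag or company name appears in the tweet text."""
    found = set()
    for ticker, name in tickers.items():
        cashtag = re.search(r"\$" + re.escape(ticker) + r"\b", text, re.IGNORECASE)
        by_name = re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        if cashtag or by_name:
            found.add(ticker)
    return sorted(found)


def keep_tweet(author, text):
    """Keep a tweet only if a monitored account wrote it and it names a listed company."""
    return author in MONITORED and bool(mentioned_tickers(text))


tweets = [
    ("jimcramer", "Buying more $AAPL ahead of earnings"),
    ("jimcramer", "Great dinner tonight"),
    ("someone_else", "Tesla is up again"),
]
kept = [(author, text) for author, text in tweets if keep_tweet(author, text)]
print(kept)
```

Matching on company names as well as cashtags catches more tweets but also more false positives (for example, common words that double as company names), so a real pipeline would probably need a stricter rule.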

Acknowledgements

The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot

Inspiration

I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)
🇺🇸 Charlie Kirk(†) Twitter/ 𝕏 Dataset
kaggle.com
Updated Sep 28, 2025
Cite
BwandoWando (2025). 🇺🇸 Charlie Kirk(†) Twitter/ 𝕏 Dataset [Dataset]. http://doi.org/10.34740/kaggle/ds/8259158
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/ds/8259158
Dataset updated
Sep 28, 2025
Dataset provided by
Kaggle
Authors
BwandoWando
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Who is Charlie Kirk?

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F9ff49a3bb052e339eb85a66dca611f6c%2Fcharlie-kirk-turning-point2-91025-91025-a19b6183557949938f0dc01df2c33a28.jpg?generation=1757731111497297&alt=media

Charles James Kirk (October 14, 1993 – September 10, 2025) was an American conservative political activist, author, and media personality. He co-founded the organization Turning Point USA (TPUSA) in 2012 and was its executive director. He was the chief executive officer of Turning Point Action (TPAction) and a member of the Council for National Policy (CNP). In his later years, he was one of the most prominent voices of the populist MAGA movement and exemplified the growth of Christian nationalism in the Republican Party.

From: https://en.wikipedia.org/wiki/Charlie_Kirk

CBS News' "Who was Charlie Kirk?"

https://www.youtube.com/watch?v=0xngCgJnO5E

Death

On September 10, 2025, while on stage at Utah Valley University in Orem, Utah, for a TPUSA event, "The American Comeback Tour", Kirk was fatally shot in the neck. The shooting took place at 12:23 p.m. MDT (18:23 UTC), around 20 minutes after the event began, in front of an audience of about 3,000 people.

From: https://en.wikipedia.org/wiki/Charlie_Kirk

Coverage of this Dataset

I queried tweets with either #CharlieKirk or "Charlie Kirk" in them within the last 36 hours.

Important Note

All tagged usernames (e.g. @username) and all forms of IDs are obfuscated: each is replaced with a unique hash ID derived from the original value, which preserves data integrity.

Tagged usernames that have been banned, suspended, or deleted from the platform are still obfuscated
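One way such consistent obfuscation could work is a salted, deterministic hash, so that the same username always maps to the same token and references between records still line up. The sketch below is only an illustration: the dataset author does not state the actual method, and the salt, token length, and function names are assumptions.

```python
import hashlib
import re

SALT = "example-salt"  # hypothetical value; a real pipeline would keep this secret


def hash_id(value, salt=SALT, length=12):
    """Map a value to a short, deterministic hex token (case-insensitive)."""
    digest = hashlib.sha256((salt + value.lower()).encode("utf-8")).hexdigest()
    return digest[:length]


def obfuscate_mentions(text):
    """Replace every @username with @<token>, reusing the same token for the same user."""
    return re.sub(r"@(\w+)", lambda m: "@" + hash_id(m.group(1)), text)


first = obfuscate_mentions("thanks @Alice and @bob")
second = obfuscate_mentions("@alice replied")
print(first)
print(second)
```

Because the mapping is deterministic, reply and mention graphs can still be rebuilt from the obfuscated data without exposing the original handles.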

"Well-known" authors

I added a file that identifies users who have posted tweets about the topic and meet either of these criteria:
- Blue-certified accounts with at least 10K followers
- Non-Blue-certified accounts with at least 50K followers

This helps map back to, and add context on, the users who are being tagged or are creating the tweets.
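The two thresholds can be written as a single predicate. This is a small sketch with made-up records; the field names `is_blue` and `followers` are assumptions and may not match the columns in the actual file.

```python
def is_well_known(is_blue, followers):
    """Apply the stated thresholds: 10K followers for Blue accounts, 50K otherwise."""
    threshold = 10_000 if is_blue else 50_000
    return followers >= threshold


# Made-up accounts for illustration only.
users = [
    {"user": "u1", "is_blue": True, "followers": 12_000},
    {"user": "u2", "is_blue": False, "followers": 12_000},
    {"user": "u3", "is_blue": False, "followers": 75_000},
    {"user": "u4", "is_blue": True, "followers": 9_999},
]
well_known = [u["user"] for u in users if is_well_known(u["is_blue"], u["followers"])]
print(well_known)
```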

Source

I signed up for a trial with https://twitterapi.io/ , check it out!

Image

Credit : OLIVIER TOURON/ AFP via Getty
Data from: DISMISS: Database of Indian Social Media Influencers on Twitter
dataverse.harvard.edu
dataone.org
Updated Apr 4, 2022
+ more versions
Cite
Arshia Arya; Soham De; Dibyendu Mishra; Gazal Shekhawat; Ankur Sharma; Anmol Panda; Faisal M Lalani; Parantak Singh; Ramaravind Kommiya Mothilal; Rynaa Grover; Sachita Nishal; Saloni Dash; Shehla Rashid Shora; Syeda Zainab Akbar; Joyojeet Pal (2022). DISMISS: Database of Indian Social Media Influencers on Twitter [Dataset]. http://doi.org/10.7910/DVN/BPY2JY
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/BPY2JY
Dataset updated
Apr 4, 2022
Dataset provided by
Harvard Dataverse
Authors
Arshia Arya; Soham De; Dibyendu Mishra; Gazal Shekhawat; Ankur Sharma; Anmol Panda; Faisal M Lalani; Parantak Singh; Ramaravind Kommiya Mothilal; Rynaa Grover; Sachita Nishal; Saloni Dash; Shehla Rashid Shora; Syeda Zainab Akbar; Joyojeet Pal
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Databases of highly networked individuals have been indispensable in studying narratives and influence on social media. To support studies on Twitter in India, we present a systematically categorized database of accounts of influence on Twitter in India, identified and annotated through an iterative process of friends, networks, and self-described profile information, verified manually. We built an initial set of accounts based on the friend network of a seed set of accounts based on real-world renown in various fields, and then snowballed "friends of friends" multiple times, and rank-ordered individuals based on the number of in-group connections and overall followers. We then manually classified identified accounts under the categories of entertainment, sports, business, government, institutions, journalism, civil society accounts that have independent standing outside of social media, as well as a category of "digital first" referring to accounts that derive their primary influence from online activity. Overall, we annotated 11580 unique accounts across all categories. The database is useful for studying various questions related to the role of influencers in polarisation, misinformation, extreme speech, political discourse, etc.
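The snowballing and ranking steps can be sketched on a toy graph as follows. This is an illustrative reading of the description, not the authors' code: the graph, the account names, and the number of rounds are made up, and the ranking here uses in-group connections only, leaving out the overall follower counts the authors also considered.

```python
# Toy "friend" graph: account -> accounts it follows (all names hypothetical).
FRIENDS = {
    "seed_a": ["x", "y"],
    "seed_b": ["y", "z"],
    "x": ["y"],
    "y": ["z", "w"],
    "z": ["y"],
    "w": [],
}


def snowball(seeds, friends, rounds):
    """Expand the seed set by following friend links for a fixed number of rounds."""
    group = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        # Friends of the current frontier that are not yet in the group.
        frontier = {f for acct in frontier for f in friends.get(acct, [])} - group
        group |= frontier
    return group


def rank_by_in_group(group, friends):
    """Rank accounts by how many group members follow them (in-group connections)."""
    counts = {acct: 0 for acct in group}
    for acct in group:
        for followed in friends.get(acct, []):
            if followed in counts:
                counts[followed] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


group = snowball(["seed_a", "seed_b"], FRIENDS, rounds=2)
ranked = rank_by_in_group(group, FRIENDS)
print(ranked)
```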
US_Congressional_Tweets_Dataset
kaggle.com
zip
Updated Jan 4, 2024
Cite
Oscar Yáñez Feijóo (2024). US_Congressional_Tweets_Dataset [Dataset]. https://www.kaggle.com/datasets/oscaryezfeijo/us-congressional-tweets-dataset
Explore at:
Available download formats: zip (243754786 bytes)
Dataset updated
Jan 4, 2024
Authors
Oscar Yáñez Feijóo
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
United States
Description
The "US Congressional Tweets Dataset" is a comprehensive collection of tweets from US Congressional members spanning from 2008 to 2017. This dataset is valuable for organizations like Lobbyists4America, which aims to gain insights into legislative trends and influences for effective lobbying strategies. The dataset is structured into two primary components: users_df and tweets_df.

Dataset Structure:

users_df: This DataFrame provides detailed information about the Twitter accounts of various congressional members. It includes a range of attributes such as:

Account creation date (created_at), follower and friend counts (followers_count, friends_count).

Profile-related information like description, location, and verification status.

Various Twitter-specific features like contributors_enabled, default_profile, is_translator, etc.

tweets_df: This DataFrame contains the actual tweet data from these congressional accounts. Key columns include:

created_at: The timestamp of the tweet.

favorite_count and retweet_count: Indicators of the tweet's popularity.

text: The text content of the tweet.

Metadata such as user_id, lang (language), and source (device/app used for tweeting).

Other attributes like possibly_sensitive, quoted_status_id, and engagement-related fields.

Analysis Performed:

The dataset is utilized for various analyses, including:

Network Analysis: Exploring the connections and interactions between different congressional members on Twitter, potentially revealing influential figures or groups within Congress.

Sentiment Analysis: Using libraries like TextBlob and NLTK, this analysis assesses the sentiment (positive, negative, neutral) of tweets to understand the general tone and stance of congressional members on various issues.

Correlation Analysis: Investigating relationships between different numerical features in the dataset, such as whether higher tweet frequencies correlate with more followers.

Word Clustering/Topic Modeling: Utilizing NMF (Non-Negative Matrix Factorization) from scikit-learn to cluster words and identify major themes or topics discussed in the tweets.

Time Series Analysis: Observing trends and patterns in tweeting behavior over time, such as increased activity around elections or significant political events.
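As a rough illustration of the correlation and time-series analyses listed above, the sketch below uses only the standard library on a few made-up rows. The column names come from tweets_df as described; the timestamp format and the sample values are assumptions, and the dataset's own analysis is described as using Pandas rather than this hand-rolled version.

```python
import math
from collections import Counter
from datetime import datetime

# Made-up rows using column names from tweets_df; the timestamp format is assumed.
rows = [
    {"created_at": "2016-11-08 14:00:00", "favorite_count": 10, "retweet_count": 4},
    {"created_at": "2016-11-09 09:30:00", "favorite_count": 20, "retweet_count": 8},
    {"created_at": "2017-01-20 12:00:00", "favorite_count": 30, "retweet_count": 12},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Correlation analysis: do favorites and retweets move together?
r = pearson(
    [row["favorite_count"] for row in rows],
    [row["retweet_count"] for row in rows],
)

# Time series analysis: tweet volume per calendar year.
per_year = Counter(
    datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S").year for row in rows
)
print(round(r, 3), dict(per_year))
```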

Python Libraries Used:

Pandas: For data manipulation and analysis.

Matplotlib: For visualizing the data.

TextBlob and NLTK: For processing textual data and performing sentiment analysis.

scikit-learn (sklearn): For machine learning tasks like NMF for topic modeling.

spaCy: An advanced natural language processing library.

NetworkX: For conducting network analysis.

ipywidgets and pytz: For creating interactive elements and handling time zones in the data, respectively.

Conclusion:

The "US Congressional Tweets Dataset" is a rich source for analyzing the digital footprint of US Congressional members. Through the application of various data science techniques, Lobbyists4America can extract meaningful insights about political sentiments, networking patterns, and topical trends among lawmakers. This information is crucial for tailoring lobbying efforts and understanding the legislative landscape.
ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related...
figshare.com
xlsx
Updated Jan 25, 2022
Cite
Praatibh Surana; Mirza Yusuf; Sanjay Singh (2022). ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health [Dataset]. http://doi.org/10.6084/m9.figshare.19029656.v2
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19029656.v2
Dataset updated
Jan 25, 2022
Dataset provided by
Figshare: http://figshare.com/
Authors
Praatibh Surana; Mirza Yusuf; Sanjay Singh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Manipal
Description
Readme file for ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health

Generated on 2021-02-15

Recommended citation for the dataset:
P. Surana, M. Yusuf and S. Singh, "Severity Classification of Mental Health-Related Tweets," 2021 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), 2021, pp. 336-341, DOI: 10.1109/DISCOVER52564.2021.9663651.

******************************
PROJECT INFORMATION
******************************

1. Title of dataset: Mental Health Dataset

2. Author information:
Praatibh Surana, Manipal Institute of Technology
Mirza Yusuf, Manipal Institute of Technology
Sanjay Singh, Manipal Institute of Technology

Principal Investigators
Name: Praatibh Surana
Address: Manipal Institute of Technology
Email: praatibhsurana@gmail.com

Name: Mirza Yusuf
Address: Manipal Institute of Technology
Email: baig.yusuf.cr7@gmail.com

Co-Investigator
Name: Sanjay Singh
Address: Manipal Institute of Technology
Email: sanjay.singh@manipal.edu

3. Date of data collection: Jan 2021 - Feb 2021

************************************
DATA ACCESS INFORMATION
************************************

1. Licences/restrictions placed on access to the dataset: CC BY 4.0

2. Links to publications that use the data:
URL: https://ieeexplore.ieee.org/document/9663651
DOI: 10.1109/DISCOVER52564.2021.9663651

3. Links to a third party or secondary data used in the project (for example, existing datasets, third-party datasets)
Pennington, Jeffrey et al. “GloVe: Global Vectors for Word Representation.” EMNLP (2014).
DOI: https://doi.org/10.3115/v1/d14-1162

*****************************************
METHODS OF DATA COLLECTION
*****************************************

1. Describe the methods for data collection and/or provide links to papers describing data collection methods
Paper DOI :

Our research revolved around correctly classifying tweets based on their severity in a mental health context. An effort was also made to make the models detect sarcasm better, as this was something that many models in the past failed to do. Our dataset consists of tweets labeled as ‘0’, ‘1’, and ‘2’ depending on their classes. The labeling rules followed are given in Table 1.

TABLE 1 - SEVERITY CLASSIFICATION CLASSES AND EXAMPLES
----------------------------------------------------------------------
Class | Rules | Example
0 | Helping / suggestion for mental health awareness / positivity / informative / motivational / questions about mental health | Are you suffering from anxiety? Check out this page for therapy through Skype!
1 | Sarcasm / rant / expression of annoyance | Today was so annoying. If my teacher would have called my name, I swear to God I would have killed myself
2 | Case of slight disturbance / strong indication of disturbance / user outright mentions depression / anxiety / suicide / self-harm | All I am is a burden. I don’t want to live anymore.
----------------------------------------------------------------------

The following steps were performed for data collection:
1) Tweets were extracted with the help of Twitter’s official API using hashtags such as #depression, #mentalhealth, #anxiety, #selfharm, #killmyself, and #kms from users.
2) Around 40,000 tweets were extracted from Twitter between January and February 2021, out of which the final dataset comprised 2460 tweets, distributed equally amongst the three classes (820 tweets each).
3) Two authors manually annotated the dataset and cross-verified it to ensure accurate labeling.

2. Data processing methods:

A. Preprocessing
1) Removal of numbers, URLs, usernames, and special characters: The first step after extraction of the tweets was ensuring that they were suitable for further use. The “preprocessor” Python library was used to eliminate numbers, retweets, URLs, emojis, emoticons, and usernames, followed by removal of duplicate tweets from the dataset.
2) Stopword removal and expansion of standard abbreviations: We made use of Python’s “nltk” library for the removal of common stopwords such as “for,” “the,” “a,” etc. As our data is sourced from Twitter, lots of common internet abbreviations like “lol,” “kms,” “gn,” etc., were used. These were handled by converting the short forms to their corresponding complete forms. Lots of short forms like “wanna” for “want to” and “gonna” for “going to” were used. We ensured that these, too, were taken care of to the best of our abilities.
3) Removal of names, so that anonymity is maintained: names of people, places, and Twitter handles, and anything else that could compromise anonymity, have been removed, and the token ‘[redacted]’ has been used in their place.

*******************************
SUMMARY OF DATA FILE
*******************************

Filename: MentalHealthTweets.csv
Short description: This CSV file contains 2460 tweets annotated ‘0’, ‘1’ or ‘2’ based on the class they belong to.

*******************************************************************
DATA-SPECIFIC INFORMATION FOR
NOTE: This section should be copied and pasted for each file
*******************************************************************

1. Number of variables: 2
2. Number of cases/rows: 2461
3. Missing data codes: NA
4. Variable list

The variables and their properties have been provided in Table 2.

TABLE 2 - VARIABLE DESCRIPTION TABLE
----------------------------------------------------------------------
Variable Name | Variable Description | Variable Type
tweets | Cleaned up tweet | String
label | Annotation for tweet | Integer
----------------------------------------------------------------------
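The preprocessing steps described in the readme above could be approximated with plain regular expressions, as in the sketch below. It is not the authors' pipeline: they report using the “preprocessor” and “nltk” libraries, while the tiny stopword set, the abbreviation map, the lowercasing step, and the function name here are assumptions chosen to keep the example self-contained.

```python
import re

# Tiny illustrative lists; the authors used nltk's stopword list and their own abbreviation map.
STOPWORDS = {"for", "the", "a"}
ABBREVIATIONS = {"gn": "good night", "wanna": "want to", "gonna": "going to"}


def preprocess(tweet):
    """Strip URLs, usernames, numbers and special characters, then expand and filter words."""
    text = tweet.lower()                        # lowercasing is an assumption
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # usernames
    text = re.sub(r"\d+", " ", text)            # numbers
    text = re.sub(r"[^a-z\s']", " ", text)      # special characters, emojis, '#'
    words = []
    for word in text.split():
        # Expand known short forms into their full forms, word by word.
        words.extend(ABBREVIATIONS.get(word, word).split())
    return " ".join(w for w in words if w not in STOPWORDS)


cleaned = preprocess("@sam I wanna sleep for 10 hours gn https://t.co/abc #tired")
print(cleaned)
```

Expanding abbreviations before removing stopwords means that any stopwords introduced by the expansion would also be dropped; the readme does not say which order the authors used.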
iPhone 14 Tweets [July / Sept 2022 +144k English]
kaggle.com
zip
Updated Sep 8, 2022
Cite
Tleonel (2022). iPhone 14 Tweets [July / Sept 2022 +144k English] [Dataset]. https://www.kaggle.com/datasets/tleonel/iphone14-tweets
Explore at:
Available download formats: zip (16821184 bytes)
Dataset updated
Sep 8, 2022
Authors
Tleonel
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
iPhone 14 📱 🐦 Tweets [11 July - Sept 9 2022 - 144k English] 📱 🐦

Updated on Sept 9th. Includes tweets sent after launch.

Photo by Apple: https://store.storeimages.cdn-apple.com/4668/as-images.apple.com/is/iphone-14-pro-finish-unselect-gallery-1-202209_GEO_EMEA?wid=5120&hei=2880&fmt=p-jpg&qlt=80&.v=1660754213188

Trying to do something useful and add a dataset here in Kaggle, and while there are 90+ datasets for Elon, there's none yet for tweets about the upcoming iPhone 14. I'm interested in seeing what Apple is up to this year, so I thought it could be interesting to deep dive into what people have been saying this month before the release, which was announced today by Apple. It will happen on September 7th.

The dataset has 144k tweets created between July 11th and Sept 9th. Tweets are in English. As the new iPhone was just announced, I plan on updating the dataset to include newer examples and maybe a few older ones to increase the number of samples in the dataset, at least until the first week of launch.

Columns Description

[x] date_time - Date and Time tweet was sent

[x] username - Username that sent the tweet

[x] user_location - Location entered in the account location info on Twitter

[x] user_description - Text added to "about" in account

[x] verified - If the user has the "verified by Twitter" blue tick

[x] followers_count - Number of Followers

[x] following_count - Number of accounts followed by the person who sent the tweet

[x] tweet_like_count - How many people liked the tweet

[x] tweet_retweet_count - How many people retweeted the tweet

[x] tweet_reply_count - How many people replied to that tweet

[x] source - Where was the tweet sent from. The link has info if using iPhone, Android and others

[x] tweet_text - Text sent in the tweet

Data and Utilization

Data was scraped from Twitter and uploaded as is; no further data cleaning was performed, but the tweet data is in very good shape. I'd maybe recommend separating date and time and finding a way to change the source from links to the device name or website, depending on what you are interested in using the data for.
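The two clean-up steps recommended above might be done along these lines. This is a hedged sketch: the timestamp format and the HTML-anchor shape of the source column are assumptions about how the file is laid out, and the helper names are made up.

```python
import re
from datetime import datetime


def split_date_time(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a combined timestamp into separate ISO date and time strings."""
    parsed = datetime.strptime(value, fmt)
    return parsed.date().isoformat(), parsed.time().isoformat()


def source_to_device(source):
    """Pull the visible label out of an HTML anchor; return the input unchanged otherwise."""
    match = re.search(r">([^<]+)</a>", source)
    return match.group(1).strip() if match else source


# Made-up row using the date_time and source columns described above.
row = {
    "date_time": "2022-09-07 18:05:00",
    "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
}
date_part, time_part = split_date_time(row["date_time"])
device = source_to_device(row["source"])
print(date_part, time_part, device)
```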

Usage suggestions - Data can be used to perform sentiment analysis, look at the geographical distribution, trends, spam vs. ham identification, and others.
Bitcoin Tweets 2022
kaggle.com
ieee-dataport.org
zip
Updated Sep 23, 2022
+ more versions
Cite
kumari2000 (2022). Bitcoin Tweets 2022 [Dataset]. https://www.kaggle.com/datasets/kumari2000/bitcoin-tweets-2022
Explore at:
Available download formats: zip (58365153 bytes)
Dataset updated
Sep 23, 2022
Authors
kumari2000
Description
Bitcoin(₿) is a cryptocurrency invented in 2008 by an unknown person or group of people using the pseudonym Satoshi Nakamoto. The currency began use in 2009 when its implementation was released as open-source software.

Bitcoin is a blockchain-based decentralized digital currency, without a central bank or single administrator, that can be sent from user to user on the peer-to-peer bitcoin network without the need for intermediaries. Transactions are verified by network nodes through cryptography and recorded in a public distributed ledger called a blockchain. Bitcoins are created as a reward for a process known as mining. They can be exchanged for other currencies, products, and services.

I am sharing the Bitcoin Tweets Dataset with the research community. It contains a large set of tweets collected using Trackmyhashtag: a total of 337,701 tweet IDs, one per tweet about bitcoin posted on Twitter from 15th Sept 2022 to 17th Sept 2022.

The dataset was collected using Trackmyhashtag, an easy & affordable platform.

A lot of international events that affected bitcoin happened during the collection period, which may make this dataset interesting to analyze.

Each tweet contains different types of data:
- Tweet Id
- Tweet URL
- Posted time and date
- Tweet Content
- Other metadata

I hope researchers find it helpful. If you need more datasets, let me know.
Daily Social Media Active Users
kaggle.com
zip
Updated May 5, 2025
Cite
Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users
Explore at:
Available download formats: zip (126814 bytes)
Dataset updated
May 5, 2025
Authors
Shaik Barood Mohammed Umar Adnaan Faiz
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Description:

The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

Dataset Breakdown:

Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.

Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.

Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.

Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.

Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.

Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.

Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.

Context and Use Cases:

This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

Researchers, data scientists, and developers can use this dataset to:

Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.

Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.

Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.

Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.

Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.
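As a small illustration of the engagement analyses listed above, the sketch below averages daily time spent per platform and per verification status. The column names follow the dataset breakdown; the sample rows and the Yes/No encoding of Verified Account are assumptions, and a real analysis would read the downloaded CSV rather than an inline string.

```python
import csv
import io
from collections import defaultdict

# Made-up sample using column names from the dataset breakdown above.
SAMPLE = """Platform,Country,Daily Time Spent (min),Verified Account
Facebook,India,30,Yes
Facebook,Brazil,50,No
TikTok,India,90,No
TikTok,USA,70,Yes
"""


def mean_time_by(rows, key):
    """Average 'Daily Time Spent (min)' for each distinct value of the given column."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [sum of minutes, row count]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += float(row["Daily Time Spent (min)"])
        bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_platform = mean_time_by(rows, "Platform")
by_verified = mean_time_by(rows, "Verified Account")
print(by_platform, by_verified)
```

Since the data is synthetic, results like these are useful for testing tooling and dashboards rather than for drawing conclusions about real user behavior.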

Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

Future Considerations:

As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...
S&P500 Firm and CEOs Verified Twitter Handles
kaggle.com
zip
Updated Mar 5, 2023
Cite
skwolvie (2023). S&P500 Firm and CEOs Verified Twitter Handles [Dataset]. https://www.kaggle.com/datasets/skwolvie/s-and-p500-firm-and-ceos-verified-twitter-handles
Explore at:
Available download formats: zip (53663 bytes)
Dataset updated
Mar 5, 2023
Authors
skwolvie
Description
Dataset

This dataset was created by skwolvie

Contents
Office of Personnel Management (OPM)
datasets.ai
catalog.data.gov
+1more
Updated Nov 10, 2020
Cite
Social Security Administration (2020). Office of Personnel Management (OPM) [Dataset]. https://datasets.ai/datasets/office-of-personnel-management-opm
Explore at:
Dataset updated
Nov 10, 2020
Dataset authored and provided by
Social Security Administration: http://ssa.gov/
Description
The purpose of this agreement is for SSA to verify SSN information for the Office of Personnel Management. OPM will use the SSN verifications in its investigative process to conduct background investigations of members of the military, Federal employees, applicants for Federal employment, and contractors affiliated with Federal agencies.
Global social media subscriptions comparison 2023
statista.com
de.statista.com
+ more versions
Cite
Stacy Jo Dixon, Global social media subscriptions comparison 2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista: http://statista.com/
Authors
Stacy Jo Dixon
Description
Social media companies are starting to offer users the option to subscribe to their platforms in exchange for monthly fees. Until recently, social media has been predominantly free to use, with tech companies relying on advertising as their main revenue generator. However, advertising revenues have been dropping following the COVID-induced boom. As of July 2023, Meta Verified is the most costly of the subscription services, setting users back almost 15 U.S. dollars per month on iOS or Android. Twitter Blue costs between eight and 11 U.S. dollars per month and ensures users will receive the blue check mark, and have the ability to edit tweets and have NFT profile pictures. Snapchat+, drawing in four million users as of the second quarter of 2023, boasts a Story re-watch function, custom app icons, and a Snapchat+ badge.
Partial Subset of (🌇Sunset) 🇺🇦 Ukraine Conflict
kaggle.com
zip
Updated Apr 17, 2024
Cite
Bilal Ahmad (2024). Partial Subset of (🌇Sunset) 🇺🇦 Ukraine Conflict [Dataset]. https://www.kaggle.com/datasets/ahmddbilall/ukraine-conflit-tweets/discussion
Explore at:
zip (2258283464 bytes)
Dataset updated
Apr 17, 2024
Authors
Bilal Ahmad
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Ukraine
Description
This dataset is a subset of the larger (🌇Sunset) 🇺🇦 Ukraine Conflict Twitter Dataset, available on Kaggle, focusing specifically on tweets related to the ongoing conflict between Ukraine and Russia.

Data Source: The data was originally collected from the Twitter API by Creator. The dataset contains tweets spanning a significant timeframe, capturing public sentiment, news updates, and discussions related to the conflict between Ukraine and Russia.

File Size and Format: Given the extensive size of the original dataset (approximately 48GB), we've extracted and curated a smaller subset of approximately 4GB, focusing specifically on tweets relevant to the Ukraine conflict. The files have been renamed for ease of access and loading, making them more manageable for analysis and exploration.

Usage: Researchers, data scientists, and analysts interested in studying the discourse surrounding the Ukraine-Russia conflict, social media sentiment analysis, or geopolitical dynamics may find this dataset particularly valuable. It can be used for tasks such as sentiment analysis, topic modeling, trend analysis, and understanding public perceptions and reactions to unfolding events.

Disclaimer: While efforts have been made to ensure the accuracy and relevance of the data, users are encouraged to exercise caution and verify the information as Twitter data can be subject to biases, noise, and misinformation. Additionally, please adhere to Twitter's terms of service and guidelines when using this dataset for research or analysis purposes.
Coursera IPO - Tweets
kaggle.com
zip
Updated Apr 17, 2021
Cite
Tensor Girl (2021). Coursera IPO - Tweets [Dataset]. https://www.kaggle.com/usharengaraju/coursera-ipo-tweets
Explore at:
zip (1855 bytes)
Dataset updated
Apr 17, 2021
Authors
Tensor Girl
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Coursera was last valued in the private market at $3.6 billion, according to PitchBook.

Founded in 2012 by former Stanford University computer science professors Daphne Koller and Andrew Ng, the Mountain View, California-based company offers individuals access to online courses and degrees from top universities, a business that has boomed throughout the Covid-19 pandemic.

Revenue last year jumped 59% to $293 million. Still, Coursera’s net losses widened to $66.8 million from $46.7 million in 2019 as the company said it added over 12,000 new degrees for students over the last two years. Total registered users grew 65% year over year in 2020.

Source: https://www.cnbc.com/2021/03/31/coursera-ipo-cour-begins-trading-on-the-nyse.html

Content

The dataset contains tweets about the Coursera IPO from verified Twitter accounts.

Acknowledgements

Coursera IPO Tweets are scraped using Twint.

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.

https://pypi.org/project/twint/
Crypto Tweets | 80k in ENG | Aug 2022
kaggle.com
zip
Updated Aug 29, 2022
Cite
Tleonel (2022). Crypto Tweets | 80k in ENG | Aug 2022 [Dataset]. https://www.kaggle.com/datasets/tleonel/crypto-tweets-80k-in-eng-aug-2022
Explore at:
zip (10075792 bytes)
Dataset updated
Aug 29, 2022
Authors
Tleonel
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
🐦 🪙 💸 Crypto Tweets | 80k in English | Aug 2022 🐦 🪙 💸

Continuing this series of datasets of tweets scraped from Twitter, this dataset contains 80k tweets in which the user mentions "crypto". This is a very popular topic, with 80k tweets in English sent in 2 days, between Aug 28 and 29, 2022.


📝 Columns Description

[x] date_time - Date and Time tweet was sent

[x] username - Username that sent the tweet

[x] user_location - Location entered in the account location info on Twitter

[x] user_description - Text added to "about" in account

[x] verified - If the user has the "verified by Twitter" blue tick

[x] followers_count - Number of Followers

[x] following_count - Number of accounts followed by the person who sent the tweet

[x] tweet_like_count - How many people liked the tweet

[x] tweet_retweet_count - How many people retweeted the tweet

[x] tweet_reply_count - How many people replied to that tweet

[x] tweet_quoted_count - How many people quoted the tweet

[x] tweet_text - Text sent in the tweet

💡 Data and Utilization

Data was scraped from Twitter and uploaded as is. No further cleaning was performed, but the tweet data is in very good shape. I'd recommend separating date and time, and finding a way to change the source from links to the device name or website, depending on what you are interested in using the data for.

Usage suggestions - Data can be used to perform sentiment analysis, look at the geographical distribution, trends, spam vs. ham identification, and others.
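
The suggestion to separate date and time can be sketched in a few lines of standard-library Python. This is only an illustration: the description does not state the timestamp format, so the ISO 8601 layout below is an assumption, and `split_date_time` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Tuple


def split_date_time(value: str) -> Tuple[str, str]:
    """Split a combined timestamp string into separate date and time strings."""
    # Assumption: date_time is ISO 8601, e.g. "2022-08-28 14:03:21+00:00".
    parsed = datetime.fromisoformat(value)
    # .time() drops the UTC offset, leaving only the clock time.
    return parsed.date().isoformat(), parsed.time().isoformat()


# Hypothetical row; only the column names come from the list above.
row = {"date_time": "2022-08-28 14:03:21+00:00", "username": "example_user"}
row["date"], row["time"] = split_date_time(row["date_time"])
print(row["date"], row["time"])
```

If the real file uses a different timestamp layout, `datetime.strptime` with an explicit format string would be the safer choice.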
Egypt Fake Tweets Detection Dataset Labeled
kaggle.com
zip
Updated Apr 25, 2025
Cite
Mahmoud Elgendy68 (2025). Egypt Fake Tweets Detection Dataset Labeled [Dataset]. https://www.kaggle.com/datasets/mahmoudelgendy68/egypt-fake-tweets-detection-dataset-labeled/data
Explore at:
zip (1348136 bytes)
Dataset updated
Apr 25, 2025
Authors
Mahmoud Elgendy68
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Area covered
Egypt
Description
This dataset is part of a project focused on detecting fake news and misleading content in Egyptian Arabic text from Twitter and Facebook. It contains 22,906 labeled text samples, with labels representing:

f → Fake or misleading content

r → Real or factual content

idk → Unclear or ambiguous content

🔍 Sources & Labeling

The dataset is based on manually labeled samples and semi-supervised labeling using an XGBoost classifier trained on a small seed set. Over 20,000 examples were confidently pseudo-labeled using probability thresholds.
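
As a rough illustration of pseudo-labeling with probability thresholds, the sketch below assigns a label only when a classifier's predicted probability is confident enough. The threshold values (0.1 and 0.9), the function name, and the choice to leave uncertain samples unlabeled are all assumptions; the description does not state the actual cutoffs or how uncertain samples were handled.

```python
from typing import Optional


def pseudo_label(p_fake: float, low: float = 0.1, high: float = 0.9) -> Optional[str]:
    """Return 'f' or 'r' for confident predictions, or None when unsure.

    p_fake is the classifier's predicted probability that a text is fake.
    The low/high cutoffs are hypothetical, not the dataset author's values.
    """
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must be a probability in [0, 1]")
    if p_fake >= high:
        return "f"  # confidently fake or misleading
    if p_fake <= low:
        return "r"  # confidently real or factual
    return None  # not confident enough; leave for manual review


print([pseudo_label(p) for p in (0.95, 0.03, 0.5)])
```

Samples that come back as `None` would stay out of the pseudo-labeled set until they are reviewed by hand or the model becomes more certain.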

The original texts are in Arabic, with content reflecting real social media discourse in Egypt, making this dataset particularly useful for research on:

Arabic NLP

Fake news detection

Misinformation studies

Social media analysis

🧠 Applications

This dataset can be used for training and evaluating:

Text classification models

Fake news detectors

Sentiment analysis pipelines

Arabic language models

📌 Notes

The dataset will be continuously refined, and future updates will include more manually verified labels. Please cite appropriately and reach out if using it in academic work.
Cryptocurrency extra data - Elon Musk's tweets
kaggle.com
zip
Updated Nov 3, 2021
Cite
Yam Peleg (2021). Cryptocurrency extra data - Elon Musk's tweets [Dataset]. https://www.kaggle.com/datasets/yamqwe/elon-musks-twitter-updated-031121/discussion
Explore at:
zip (1816272 bytes)
Dataset updated
Nov 3, 2021
Authors
Yam Peleg
Description
Warning! This data won't be updated while the private leaderboard is calculated. If you use it in your solution, you are guaranteed to overfit!

Context

This is a bonus dataset to be used in the G-Research crypto forecasting competition, containing the most powerful features for predicting cryptocurrency movements: Elon Musk's Twitter 😂

Introduction

This is a daily updated dataset that automatically collects market data for the G-Research crypto forecasting competition. The data consists of all tweets posted by @elonmusk or mentioning @elonmusk; both retrieval and uploading are fully automated. See the discussion topic.

The Data

For every asset in the competition, the following fields from Twitter are collected, saved, and processed.

Content

Over 13,000 tweets collected using Twitter API keyword search between 01.01.2010 and today. Columns are as follows:

user_name: The user name of the author.
user_location: Location of the author.
user_description: The 'description' on the author's profile.
user_created: When the user account was created.
user_followers: Number of followers the user has.
user_friends: Number of friends the user has.
user_favourites: Number of favourites the user has.
user_verified: Whether the user is verified.
date: Date the tweet was tweeted.
text: Content of the tweet.

Indexing

The dataframe is indexed by date and sorted from oldest to newest. The first row starts at 01.01.2010 and the last one is from the time of the collector's most recent run. [Hopefully today]
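
If rows are loaded in a different order, the oldest-to-newest ordering described here can be restored by sorting on the date column. The sketch below uses plain dictionaries rather than a dataframe; the sample rows are hypothetical, and the ISO-style date strings are an assumption about the file's format.

```python
from datetime import datetime

# Hypothetical rows; only the column names come from the description above.
rows = [
    {"date": "2021-11-02 18:30:00", "user_name": "user_b", "text": "later tweet"},
    {"date": "2010-01-01 09:00:00", "user_name": "user_a", "text": "earliest tweet"},
    {"date": "2015-06-15 12:00:00", "user_name": "user_c", "text": "middle tweet"},
]


def sort_by_date(records):
    """Return records ordered from oldest to newest by their 'date' field."""
    # Assumption: dates parse with datetime.fromisoformat.
    return sorted(records, key=lambda r: datetime.fromisoformat(r["date"]))


ordered = sort_by_date(rows)
print(ordered[0]["date"], "->", ordered[-1]["date"])
```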

License

Thanks to Twitter for providing the free API.

Sources

Elon Musk: https://en.wikipedia.org/wiki/Elon_Musk
Twitter: https://twitter.com
Elon Musk on Twitter: https://twitter.com/elonmusk
Covid Vaccine Tweets
kaggle.com
zip
Updated May 6, 2023
Cite
Kash (2023). Covid Vaccine Tweets [Dataset]. https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets
Explore at:
zip (67025767 bytes)
Dataset updated
May 6, 2023
Authors
Kash
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

COVID-19 is an infectious disease caused by a newly discovered strain of coronavirus, a type of virus known to cause respiratory infections in humans. This new strain was unknown before December 2019, when an outbreak of pneumonia of unidentified cause emerged in Wuhan, China.

Since the start of the COVID-19 pandemic, social media platforms and news sites have been buzzing about the need for a COVID-19 vaccine, as the number of people affected by COVID-19 has increased drastically. This dataset contains tweets posted with the hashtag #CovidVaccine.

Content

The tweets have the #CovidVaccine hashtag. The collection started on 1/8/2020, and will be updated on a daily basis.

Information regarding the data

The data consists of more than 1 lakh (100,000) records with 13 columns. The features are described below.

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has. |
| 8 | user_verified | When true, indicates that the user has a verified account. |
| 9 | date | UTC time and date when the Tweet was created. |
| 10 | text | The actual UTF-8 text of the Tweet. |
| 11 | hashtags | All the other hashtags posted in the tweet along with #CovidVaccine. |
| 12 | source | Utility used to post the Tweet. Tweets from the Twitter website have a source value of web. |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |

Inspiration

You can use this data to dive into the subjects that use this hashtag, look at the geographical distribution, evaluate sentiment, and look at trends.
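
A first pass at trends and geographical distribution could look like the sketch below, which counts tweets per day and per user-defined location. The sample rows are invented purely for illustration, the `YYYY-MM-DD HH:MM:SS` date layout is an assumption, and `summarize` is a hypothetical helper; only the column names come from the dataset description.

```python
import csv
import io
from collections import Counter

# Invented sample rows for illustration; not taken from the dataset.
SAMPLE_CSV = """user_location,user_verified,date,text,is_retweet
"Chennai, India",False,2020-08-01 10:15:00,first example tweet #CovidVaccine,False
,True,2020-08-01 12:40:00,second example tweet #CovidVaccine,False
"London, UK",False,2020-08-02 09:05:00,third example tweet #CovidVaccine,False
"""


def summarize(csv_text):
    """Count tweets per calendar day and per non-empty user location."""
    tweets_per_day = Counter()
    locations = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the first 10 characters of 'date' are YYYY-MM-DD.
        tweets_per_day[row["date"][:10]] += 1
        location = row["user_location"].strip()
        if location:  # user-defined locations are often left blank
            locations[location] += 1
    return tweets_per_day, locations


per_day, per_location = summarize(SAMPLE_CSV)
print(per_day.most_common())
print(per_location.most_common(5))
```

Because `user_location` is free text entered by the user, real data would need normalization before any geographic grouping is meaningful.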
Turkey Earthquake Tweets
kaggle.com
zip
Updated Feb 26, 2023
Cite
Gabriel Preda (2023). Turkey Earthquake Tweets [Dataset]. https://www.kaggle.com/datasets/gpreda/turkey-earthquake-tweets/code
Explore at:
zip (4472599 bytes)
Dataset updated
Feb 26, 2023
Authors
Gabriel Preda
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Türkiye
Description
Context

Tweets about the massive earthquake in Turkey.

Content

This dataset is collected daily using tweepy and the Twitter API. The source of the dataset is public tweets about the massive earthquake in Turkey.

Data columns

The following columns are included:

ID

User name

User location

User description

User created

User followers

User friends

User favorites

User verified

Date

Text

Hashtags

Source

Retweets

Is retweet

Ideas for analysis

You can use this dataset to follow the trends about the news related to this unfortunate event. Use your NLP and data analysis skills to extract relevant information.
COVID-19 Tweets, Vaccination, and Deaths Data
kaggle.com
zip
Updated May 29, 2025
Cite
Arya Gavande (2025). COVID-19 Tweets, Vaccination, and Deaths Data [Dataset]. https://www.kaggle.com/datasets/aryagavande/covid-19-tweets-vaccination-and-deaths-data/code
Explore at:
zip (357725 bytes)
Dataset updated
May 29, 2025
Authors
Arya Gavande
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset merges three distinct data sources to explore the relationship between COVID-19 death rates, vaccination efforts, and public sentiment on Twitter from December 25, 2020 to March 29, 2022. It includes 2,000 cleaned rows with 16 variables, created by combining global health statistics and social media sentiment data.

Sources & Variables:

COVID-19 Deaths Data (scraped from Worldometer - COVID-19 Deaths via BeautifulSoup):

Date: Date of record

daily_increase_percent: % change in deaths from previous day

Season: Derived from date (Winter, Spring, Summer, Fall)

Tweet Sentiment Data: COVID Vaccine Tweets Dataset

Date: Tweet timestamp

text_sentiment: Sentiment label (positive, neutral, negative) from NLTK’s SentimentIntensityAnalyzer

user_verified: Whether the user is verified

user_since_days: Age of the Twitter account (in days)

country: Cleaned user location

Vaccination Data: Vaccination Dataset

Date: Date of record

total_vaccinations_per_hundred: Doses per 100 people

daily_vaccinations: Daily dose count

vaccine_group: Grouped vaccine type (e.g., mRNA, Viral Vector)

country: Country name
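
For the text_sentiment variable listed above, NLTK's SentimentIntensityAnalyzer produces a `compound` score between -1 and 1, which then has to be mapped to a label. The sketch below uses the commonly cited VADER cutoffs of ±0.05; whether this dataset used the same cutoffs is not stated, so treat the threshold and the function name as assumptions.

```python
def sentiment_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is the conventional VADER cutoff, assumed here;
    the dataset description does not say which cutoff was applied.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print([sentiment_label(score) for score in (0.6, 0.0, -0.4)])
```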

Preprocessing Summary:

Merged by Date and country

Cleaned invalid country names (e.g., “moon”, “nowhere”)

Standardized all datetime formats

Removed entries with missing or unreliable values

Created derived variables: Season, user_since_days, vaccine_group
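
Two of the derived variables above, Season and user_since_days, can be sketched with the standard library. The month-to-season mapping assumes Northern Hemisphere meteorological seasons, and measuring account age relative to the tweet date is also an assumption; the description states neither choice explicitly.

```python
from datetime import date


def season_of(day: date) -> str:
    """Derive a season label from a date.

    Assumption: Northern Hemisphere meteorological seasons
    (Dec-Feb Winter, Mar-May Spring, Jun-Aug Summer, Sep-Nov Fall).
    """
    if day.month in (12, 1, 2):
        return "Winter"
    if day.month in (3, 4, 5):
        return "Spring"
    if day.month in (6, 7, 8):
        return "Summer"
    return "Fall"


def user_since_days(account_created: date, tweet_date: date) -> int:
    """Account age in days at the time of the tweet (assumed reference date)."""
    return (tweet_date - account_created).days


print(season_of(date(2020, 12, 25)), season_of(date(2022, 3, 29)))
print(user_since_days(date(2021, 1, 1), date(2021, 1, 31)))
```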

This dataset was used in a final data science project to:

Classify public sentiment toward vaccines using health indicators

Predict daily COVID-19 death counts using sentiment and vaccination data


