24 datasets found
  1. Verified NFT Tweets

    • kaggle.com
    zip
    Updated Apr 11, 2022
    Cite
    adarsh (2022). Verified NFT Tweets [Dataset]. https://www.kaggle.com/datasets/adanai/verified-nft-tweets
    Explore at:
    zip (12309951 bytes)
    Dataset updated
    Apr 11, 2022
    Authors
    adarsh
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Non-Fungible Tokens (NFTs) are a relatively new concept and have been making headlines for the related events happening in the space.

    The best way to gauge the sentiments and get basic level stats is to use data from social media. Twitter is a powerful platform for people to express their opinions on any given topic. The tweets which include hashtags(#) related to NFTs are collected.

    This dataset may help capture NFT trends and answer questions about how far NFTs have come.

  2. Sentiment Analysis on Financial Tweets

    • kaggle.com
    zip
    Updated Sep 5, 2019
    Cite
    Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
    Explore at:
    zip (2538259 bytes)
    Dataset updated
    Sep 5, 2019
    Authors
    Vivek Rathi
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.

    Everything quoted in this description was written by David Wallach; I used that information to perform my first sentiment analysis.

    "I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).

    I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."

    Content

    This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith']

    Acknowledgements

    The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot

    Inspiration

    I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)

  3. 🇺🇸 Charlie Kirk(†) Twitter/ 𝕏 Dataset

    • kaggle.com
    Updated Sep 28, 2025
    Cite
    BwandoWando (2025). 🇺🇸 Charlie Kirk(†) Twitter/ 𝕏 Dataset [Dataset]. http://doi.org/10.34740/kaggle/ds/8259158
    Explore at:
    Dataset updated
    Sep 28, 2025
    Dataset provided by
    Kaggle
    Authors
    BwandoWando
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Who is Charlie Kirk?

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F9ff49a3bb052e339eb85a66dca611f6c%2Fcharlie-kirk-turning-point2-91025-91025-a19b6183557949938f0dc01df2c33a28.jpg?generation=1757731111497297&alt=media

    Charles James Kirk (October 14, 1993 – September 10, 2025) was an American conservative political activist, author, and media personality. He co-founded the organization Turning Point USA (TPUSA) in 2012 and was its executive director. He was the chief executive officer of Turning Point Action (TPAction) and a member of the Council for National Policy (CNP). In his later years, he was one of the most prominent voices of the populist MAGA movement and exemplified the growth of Christian nationalism in the Republican Party.

    From: https://en.wikipedia.org/wiki/Charlie_Kirk

    CBS News' "Who was Charlie Kirk?"

    https://www.youtube.com/watch?v=0xngCgJnO5E

    Death

    On September 10, 2025, while on stage at Utah Valley University in Orem, Utah, for a TPUSA event, "The American Comeback Tour", Kirk was fatally shot in the neck. The shooting took place at 12:23 p.m. MDT (18:23 UTC), around 20 minutes after the event began, in front of an audience of about 3,000 people.

    From: https://en.wikipedia.org/wiki/Charlie_Kirk

    Coverage of this Dataset

    • I queried tweets with either #CharlieKirk or "Charlie Kirk" in them within the last 36 hours.

    Important Note

    • All tagged usernames (e.g., @username) and all forms of IDs are obfuscated: each is replaced with a unique hash ID derived from the original value, which preserves data integrity
    • Tagged usernames that have been banned, suspended, or deleted from the platform are still obfuscated
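
    The obfuscation described above can be sketched as follows. This is a minimal illustration, not the dataset author's actual scheme, which is not documented: the salt, the token length, and the function names are all assumptions. The point it shows is that a deterministic hash maps the same username to the same token everywhere, so mentions remain linkable without exposing the original handle.

```python
import hashlib
import re

# Hypothetical salt; the dataset author's actual hashing scheme is not documented.
SALT = "example-salt"

MENTION = re.compile(r"@(\w+)")


def hash_id(value: str, length: int = 12) -> str:
    """Map a value to a stable hex token: same input, same token."""
    digest = hashlib.sha256((SALT + value.lower()).encode("utf-8")).hexdigest()
    return digest[:length]


def obfuscate_mentions(text: str) -> str:
    """Replace every @username with @<hash>, keeping mentions linkable."""
    return MENTION.sub(lambda m: "@" + hash_id(m.group(1)), text)
```

    Lower-casing before hashing is a design choice made here so that @Alice and @alice resolve to the same token; a real pipeline might instead hash the numeric account ID.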

    "Well-known" authors

    I added a file that identifies users who posted tweets about the topic and meet either of these criteria:

    • Blue-certified accounts with at least 10K followers
    • Non-Blue-certified accounts with at least 50K followers

    This helps map back and add context about the users who are tagged in, or who wrote, the tweets.
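
    The two follower thresholds above amount to a simple rule. The sketch below assumes a hypothetical record layout (the field names `user_hash`, `is_blue`, and `followers` are not taken from the dataset files) and only illustrates the logic.

```python
from dataclasses import dataclass


@dataclass
class Author:
    user_hash: str   # obfuscated id (hypothetical field name)
    is_blue: bool    # Blue-certified flag (hypothetical field name)
    followers: int


def is_well_known(a: Author) -> bool:
    """Blue-certified needs >= 10K followers; everyone else needs >= 50K."""
    threshold = 10_000 if a.is_blue else 50_000
    return a.followers >= threshold
```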

    Source

    I signed up for a trial with https://twitterapi.io/ , check it out!

    Image

    Credit : OLIVIER TOURON/ AFP via Getty

  4. Data from: DISMISS: Database of Indian Social Media Influencers on Twitter

    • dataverse.harvard.edu
    • dataone.org
    Updated Apr 4, 2022
    + more versions
    Cite
    Arshia Arya; Soham De; Dibyendu Mishra; Gazal Shekhawat; Ankur Sharma; Anmol Panda; Faisal M Lalani; Parantak Singh; Ramaravind Kommiya Mothilal; Rynaa Grover; Sachita Nishal; Saloni Dash; Shehla Rashid Shora; Syeda Zainab Akbar; Joyojeet Pal (2022). DISMISS: Database of Indian Social Media Influencers on Twitter [Dataset]. http://doi.org/10.7910/DVN/BPY2JY
    Explore at:
    Dataset updated
    Apr 4, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Arshia Arya; Soham De; Dibyendu Mishra; Gazal Shekhawat; Ankur Sharma; Anmol Panda; Faisal M Lalani; Parantak Singh; Ramaravind Kommiya Mothilal; Rynaa Grover; Sachita Nishal; Saloni Dash; Shehla Rashid Shora; Syeda Zainab Akbar; Joyojeet Pal
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Databases of highly networked individuals have been indispensable in studying narratives and influence on social media. To support studies on Twitter in India, we present a systematically categorized database of accounts of influence on Twitter in India, identified and annotated through an iterative process of friends, networks, and self-described profile information, verified manually. We built an initial set of accounts based on the friend network of a seed set of accounts based on real-world renown in various fields, then snowballed "friends of friends" multiple times, and rank-ordered individuals based on the number of in-group connections and overall followers. We then manually classified identified accounts under the categories of entertainment, sports, business, government, institutions, journalism, and civil society accounts that have independent standing outside of social media, as well as a category of "digital first", referring to accounts that derive their primary influence from online activity. Overall, we annotated 11580 unique accounts across all categories. The database is useful for studying various questions related to the role of influencers in polarisation, misinformation, extreme speech, political discourse, etc.
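
    The snowballing and ranking steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the friend network is a plain dictionary, "in-group connections" is taken to mean the number of group members who list an account as a friend, and the function names are hypothetical.

```python
def snowball(friends: dict, seeds: set, rounds: int) -> set:
    """Expand a seed set by following friend links for a fixed number of rounds."""
    group = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        # Friends of the current frontier that are not in the group yet.
        frontier = {f for a in frontier for f in friends.get(a, set())} - group
        group |= frontier
    return group


def rank(group: set, friends: dict, followers: dict) -> list:
    """Order accounts by in-group connections, then by overall followers."""
    def in_group(account):
        return sum(1 for other in group if account in friends.get(other, set()))

    return sorted(
        group,
        key=lambda a: (in_group(a), followers.get(a, 0)),
        reverse=True,
    )
```

    In the described process the ranked list is only a candidate pool; the categories themselves were assigned manually.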

  5. US_Congressional_Tweets_Dataset

    • kaggle.com
    zip
    Updated Jan 4, 2024
    Cite
    Oscar Yáñez Feijóo (2024). US_Congressional_Tweets_Dataset [Dataset]. https://www.kaggle.com/datasets/oscaryezfeijo/us-congressional-tweets-dataset
    Explore at:
    zip (243754786 bytes)
    Dataset updated
    Jan 4, 2024
    Authors
    Oscar Yáñez Feijóo
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    United States
    Description

    The "US Congressional Tweets Dataset" is a comprehensive collection of tweets from US Congressional members spanning from 2008 to 2017. This dataset is valuable for organizations like Lobbyists4America, which aims to gain insights into legislative trends and influences for effective lobbying strategies. The dataset is structured into two primary components: users_df and tweets_df.

    Dataset Structure:

    1. users_df: This DataFrame provides detailed information about the Twitter accounts of various congressional members. It includes a range of attributes such as:

      • Account creation date (created_at), follower and friend counts (followers_count, friends_count).
      • Profile-related information like description, location, and verification status.
      • Various Twitter-specific features like contributors_enabled, default_profile, is_translator, etc.
    2. tweets_df: This DataFrame contains the actual tweet data from these congressional accounts. Key columns include:

      • created_at: The timestamp of the tweet.
      • favorite_count and retweet_count: Indicators of the tweet's popularity.
      • text: The text content of the tweet.
      • Metadata such as user_id, lang (language), and source (device/app used for tweeting).
      • Other attributes like possibly_sensitive, quoted_status_id, and engagement-related fields.

    Analysis Performed:

    The dataset is utilized for various analyses, including:

    1. Network Analysis: Exploring the connections and interactions between different congressional members on Twitter, potentially revealing influential figures or groups within Congress.

    2. Sentiment Analysis: Using libraries like TextBlob and NLTK, this analysis assesses the sentiment (positive, negative, neutral) of tweets to understand the general tone and stance of congressional members on various issues.

    3. Correlation Analysis: Investigating relationships between different numerical features in the dataset, such as whether higher tweet frequencies correlate with more followers.

    4. Word Clustering/Topic Modeling: Utilizing NMF (Non-Negative Matrix Factorization) from scikit-learn to cluster words and identify major themes or topics discussed in the tweets.

    5. Time Series Analysis: Observing trends and patterns in tweeting behavior over time, such as increased activity around elections or significant political events.
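
    As a small illustration of the time series analysis in item 5, the sketch below counts tweets per month using only the standard library. It assumes the created_at values are ISO 8601 strings; the actual format in tweets_df may differ and would need its own parser. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def tweets_per_month(created_at_values):
    """Count tweets per (year, month) from ISO 8601 timestamp strings."""
    counts = Counter()
    for raw in created_at_values:
        ts = datetime.fromisoformat(raw)
        counts[(ts.year, ts.month)] += 1
    # Sort chronologically so the result reads as a time series.
    return dict(sorted(counts.items()))
```

    Spikes in the resulting counts can then be compared against election dates or other political events.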

    Python Libraries Used:

    • Pandas: For data manipulation and analysis.
    • Matplotlib: For visualizing the data.
    • TextBlob and NLTK: For processing textual data and performing sentiment analysis.
    • scikit-learn (sklearn): For machine learning tasks like NMF for topic modeling.
    • spaCy: An advanced natural language processing library.
    • NetworkX: For conducting network analysis.
    • ipywidgets and pytz: For creating interactive elements and handling time zones in the data, respectively.

    Conclusion:

    The "US Congressional Tweets Dataset" is a rich source for analyzing the digital footprint of US Congressional members. Through the application of various data science techniques, Lobbyists4America can extract meaningful insights about political sentiments, networking patterns, and topical trends among lawmakers. This information is crucial for tailoring lobbying efforts and understanding the legislative landscape.

  6. ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related...

    • figshare.com
    xlsx
    Updated Jan 25, 2022
    Cite
    Praatibh Surana; Mirza Yusuf; Sanjay Singh (2022). ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health [Dataset]. http://doi.org/10.6084/m9.figshare.19029656.v2
    Explore at:
    xlsx
    Dataset updated
    Jan 25, 2022
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Praatibh Surana; Mirza Yusuf; Sanjay Singh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Manipal
    Description

    Readme file for ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health

    Generated on 2021-02-15

    Recommended citation for the dataset:
    P. Surana, M. Yusuf and S. Singh, "Severity Classification of Mental Health-Related Tweets," 2021 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), 2021, pp. 336-341, DOI: 10.1109/DISCOVER52564.2021.9663651.

    PROJECT INFORMATION

    1. Title of dataset: Mental Health Dataset

    2. Author information:
    Praatibh Surana, Manipal Institute of Technology
    Mirza Yusuf, Manipal Institute of Technology
    Sanjay Singh, Manipal Institute of Technology

    Principal Investigators
    Name: Praatibh Surana
    Address: Manipal Institute of Technology
    Email: praatibhsurana@gmail.com

    Name: Mirza Yusuf
    Address: Manipal Institute of Technology
    Email: baig.yusuf.cr7@gmail.com

    Co-Investigator
    Name: Sanjay Singh
    Address: Manipal Institute of Technology
    Email: sanjay.singh@manipal.edu

    3. Date of data collection: Jan 2021 - Feb 2021

    DATA ACCESS INFORMATION

    1. Licences/restrictions placed on access to the dataset: CC BY 4.0

    2. Links to publications that use the data:
    URL: https://ieeexplore.ieee.org/document/9663651
    DOI: 10.1109/DISCOVER52564.2021.9663651

    3. Links to a third party or secondary data used in the project (for example, existing datasets, third-party datasets)
    Pennington, Jeffrey et al. “GloVe: Global Vectors for Word Representation.” EMNLP (2014). DOI: https://doi.org/10.3115/v1/d14-1162

    METHODS OF DATA COLLECTION

    1. Describe the methods for data collection and/or provide links to papers describing data collection methods
    Paper DOI :
    Our research revolved around correctly classifying tweets based on their severity in a mental health context. An effort was also made to make the models detect sarcasm better, as this was something that many models in the past failed to do. Our dataset consists of tweets labeled as ‘0’, ‘1’, and ‘2’ depending on their classes. The labeling rules followed are given in Table 1.

    TABLE 1 - SEVERITY CLASSIFICATION CLASSES AND EXAMPLES

    Class | Rules | Example
    0 | Helping / suggestion for mental health awareness / positivity / informative / motivational / questions about mental health | Are you suffering from anxiety? Check out this page for therapy through Skype!
    1 | Sarcasm / rant / expression of annoyance | Today was so annoying. If my teacher would have called my name, I swear to God I would have killed myself
    2 | Case of slight disturbance / strong indication of disturbance / user outright mentions depression / anxiety / suicide / self-harm | All I am is a burden. I don’t want to live anymore.

    The following steps were performed for data collection:
    1) Tweets were extracted with the help of Twitter’s official API using hashtags such as #depression, #mentalhealth, #anxiety, #selfharm, #killmyself, and #kms from users.
    2) Around 40,000 tweets were extracted from Twitter between January and February 2021, out of which the final dataset comprised of 2460 tweets; 820 tweets were distributed equally amongst the three classes.
    3) Two authors manually annotated the dataset and cross-verified it to ensure accurate labeling.

    2. Data processing methods:

    A. Preprocessing
    1) Removal of numbers, URLs, usernames, and special characters: The first step after extraction of the tweets was ensuring that they were suitable for further use. The “preprocessor” uses the Python library to eliminate numbers, retweets, URLs, emojis, emoticons, and usernames, followed by duplicate tweets removal from the dataset.
    2) Stopword removal and expansion of standard abbreviations: We made use of Python’s “nltk” library for the removal of common stopwords such as “for,” “the,” “a,” etc. As our data is sourced from Twitter, lots of common internet abbreviations like “lol,” “kms,” “gn,” etc., were used. It was taken care of by converting these short forms to their corresponding complete forms. Lots of short forms like “wanna” for “want to” and “gonna” for “going to” were used. We ensured that these, too, were taken care of to the best of our abilities.
    3) Removal of names, so that anonymity is maintained. Names of people, places, twitter handles anything that could compromise the anonymity has been removed, a token named as ‘[redacted]’ has been used in their place instead.

    SUMMARY OF DATA FILE

    Filename: MentalHealthTweets.csv
    Short description: This CSV File contains 2460 tweets annotated ‘0’, ‘1’ or ‘2’ based on the class they belong to.

    DATA-SPECIFIC INFORMATION FOR
    NOTE: This section should be copied and pasted for each file

    1. Number of variables: 2
    2. Number of cases rows: 2461
    3. Missing data codes: NA
    4. Variable list
    The variables and their properties have been provided in Table 2.

    TABLE 2 - VARIABLE DESCRIPTION TABLE

    Variable Name | Variable Description | Variable Type
    tweets | Cleaned up tweet | String
    label | Annotation for tweet | Integer
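
    The preprocessing steps in the readme (removing URLs, usernames, numbers, and special characters, expanding abbreviations, and removing stopwords) can be sketched with the standard library alone. This is a rough illustration, not the authors' code: they report using a preprocessor library and nltk, whereas the abbreviation map and stopword set below are tiny hypothetical stand-ins, and the function name is an assumption.

```python
import re

# Illustrative lists only; the authors' abbreviation map is not published here
# and the nltk stopword list is much larger.
ABBREVIATIONS = {
    "lol": "laughing out loud",
    "gn": "good night",
    "wanna": "want to",
    "gonna": "going to",
}
STOPWORDS = {"for", "the", "a", "an", "to", "of", "is"}


def clean_tweet(text: str) -> str:
    """Strip noise, expand short forms, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # usernames
    text = re.sub(r"[^a-z\s]", " ", text)      # numbers, emojis, special characters
    words = []
    for word in text.split():
        # Expand a known short form into its full words, else keep the word.
        words.extend(ABBREVIATIONS.get(word, word).split())
    return " ".join(w for w in words if w not in STOPWORDS)
```

    Duplicate removal and the ‘[redacted]’ name replacement mentioned in the readme are left out of this sketch.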

  7. iPhone 14 Tweets [July / Sept 2022 +144k English]

    • kaggle.com
    zip
    Updated Sep 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tleonel (2022). iPhone 14 Tweets [July / Sept 2022 +144k English] [Dataset]. https://www.kaggle.com/datasets/tleonel/iphone14-tweets
    Explore at:
    zip (16821184 bytes)
    Dataset updated
    Sep 8, 2022
    Authors
    Tleonel
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    iPhone 14 📱 🐦 Tweets [11 July - Sept 9 2022 - 144k English] 📱 🐦

    Updated on Sept 9th. Includes tweets sent after launch.

    https://store.storeimages.cdn-apple.com/4668/as-images.apple.com/is/iphone-14-pro-finish-unselect-gallery-1-202209_GEO_EMEA?wid=5120&hei=2880&fmt=p-jpg&qlt=80&.v=1660754213188 (Photo by Apple)

    I wanted to do something useful and add a dataset here on Kaggle: while there are more than 90 datasets about Elon, there is none yet for tweets about the upcoming iPhone 14. I'm interested in seeing what Apple is up to this year, so I thought it would be interesting to dive into what people have been saying in the month before the release event, which Apple announced today. It will happen on September 7th.

    The dataset has 144k tweets created between July 11th and Sept 9th. Tweets are in English. As the new iPhone was just announced, I plan on updating the dataset to include newer examples and maybe a few older ones to increase the number of samples in the dataset, at least until the first week of launch.

    Columns Description

    • date_time - Date and time the tweet was sent
    • username - Username that sent the tweet
    • user_location - Location entered in the account location info on Twitter
    • user_description - Text added to "about" in account
    • verified - Whether the user has the "verified by Twitter" blue tick
    • followers_count - Number of followers
    • following_count - Number of accounts followed by the person who sent the tweet
    • tweet_like_count - How many people liked the tweet
    • tweet_retweet_count - How many people retweeted the tweet
    • tweet_reply_count - How many people replied to that tweet
    • source - Where the tweet was sent from. The link shows whether iPhone, Android, or another client was used
    • tweet_text - Text sent in the tweet

    Data and Utilization

    Data was scraped from Twitter and uploaded as is; no further data cleaning was performed, but the tweet data is in very good shape. I'd recommend separating date and time, and finding a way to convert the source from links to the device name or website, depending on what you intend to use the data for.
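
    The two clean-up steps recommended above might look like the sketch below. It rests on assumptions that should be checked against the actual file: that date_time holds ISO 8601 timestamps, and that source holds an HTML anchor whose visible text names the client. The function names are hypothetical.

```python
import re
from datetime import datetime

# Captures the visible text of an HTML anchor, e.g. ">Twitter for iPhone</a>".
ANCHOR_TEXT = re.compile(r">([^<]+)</a>")


def split_date_time(value: str):
    """Split an ISO 8601 timestamp into separate date and time strings."""
    ts = datetime.fromisoformat(value)
    return ts.date().isoformat(), ts.time().isoformat()


def source_name(source: str) -> str:
    """Return the visible label of an HTML anchor, or the input if none is found."""
    match = ANCHOR_TEXT.search(source)
    return match.group(1).strip() if match else source
```

    Returning the input unchanged when no anchor is found keeps rows that already contain a plain client name usable.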

    Usage suggestions: the data can be used for sentiment analysis, geographical distribution, trend analysis, spam vs. ham identification, and more.

  8. Bitcoin Tweets 2022

    • kaggle.com
    • ieee-dataport.org
    zip
    Updated Sep 23, 2022
    + more versions
    Cite
    kumari2000 (2022). Bitcoin Tweets 2022 [Dataset]. https://www.kaggle.com/datasets/kumari2000/bitcoin-tweets-2022
    Explore at:
    zip (58365153 bytes)
    Dataset updated
    Sep 23, 2022
    Authors
    kumari2000
    Description

    Bitcoin(₿) is a cryptocurrency invented in 2008 by an unknown person or group of people using the pseudonym Satoshi Nakamoto. The currency began use in 2009 when its implementation was released as open-source software.

    Bitcoin is a blockchain-based decentralized digital currency, without a central bank or single administrator, that can be sent from user to user on the peer-to-peer bitcoin network without the need for intermediaries. Transactions are verified by network nodes through cryptography and recorded in a public distributed ledger called a blockchain. Bitcoins are created as a reward for a process known as mining. They can be exchanged for other currencies, products, and services.

    I am sharing the Bitcoin Tweets Dataset with the research community. It contains a large set of tweets collected using Trackmyhashtag: a total of 337,701 tweet IDs, one for each tweet about bitcoin posted on Twitter from 15th Sept 2022 to 17th Sept 2022.

    The dataset was collected using Trackmyhashtag, an easy & affordable platform.

    A lot of international events that affected bitcoin happened during the collecting time period, which may make this dataset interesting to analyze.

    Each Tweet contains different types of data:

    • Tweet Id
    • Tweet URL
    • Posted time and date
    • Tweet Content
    • Other metadata

    I hope researchers find it helpful. If you need more datasets, let me know.

  9. Daily Social Media Active Users

    • kaggle.com
    zip
    Updated May 5, 2025
    Cite
    Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users
    Explore at:
    zip (126814 bytes)
    Dataset updated
    May 5, 2025
    Authors
    Shaik Barood Mohammed Umar Adnaan Faiz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

    Dataset Breakdown:

    • Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.

    • Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.

    • Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.

    • Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.

    • Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.

    • Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.

    • Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
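
    To make the idea of a simulated dataset concrete, here is a minimal sketch of how rows with the fields listed above could be generated. It is not the dataset author's generator, which is not described: the value pools, ranges, the 5% verified rate, and the start date are all assumptions, and the Primary Usage field is omitted for brevity.

```python
import random
from datetime import date, timedelta

# Hypothetical value pools; the platform-owner pairs follow the examples in the text.
PLATFORMS = {
    "Facebook": "Meta",
    "Instagram": "Meta",
    "WhatsApp": "Meta",
    "YouTube": "Google",
    "TikTok": "ByteDance",
}
COUNTRIES = ["India", "Brazil", "Germany", "Japan", "United States"]


def make_rows(n: int, seed: int = 0) -> list:
    """Generate n synthetic user rows; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    start = date(2015, 1, 1)  # assumed earliest join date
    rows = []
    for _ in range(n):
        platform = rng.choice(list(PLATFORMS))
        rows.append({
            "Platform": platform,
            "Owner": PLATFORMS[platform],
            "Country": rng.choice(COUNTRIES),
            "Daily Time Spent (min)": rng.randint(5, 300),
            "Verified Account": rng.random() < 0.05,
            "Date Joined": (start + timedelta(days=rng.randrange(3650))).isoformat(),
        })
    return rows
```

    Because every value is drawn independently, data produced this way carries no real relationship between, say, country and time spent; any pattern a model finds in it reflects the generator's settings.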

    Context and Use Cases:

    • This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

    Researchers, data scientists, and developers can use this dataset to:

    • Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.

    • Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.

    • Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.

    • Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.

    • Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.

    • Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

    The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

    Future Considerations:

    As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

    By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...

  10. S&P500 Firm and CEOs Verified Twitter Handles

    • kaggle.com
    zip
    Updated Mar 5, 2023
    Cite
    skwolvie (2023). S&P500 Firm and CEOs Verified Twitter Handles [Dataset]. https://www.kaggle.com/datasets/skwolvie/s-and-p500-firm-and-ceos-verified-twitter-handles
    Explore at:
    zip (53663 bytes)
    Dataset updated
    Mar 5, 2023
    Authors
    skwolvie
    Description

    Dataset

    This dataset was created by skwolvie

    Contents

  11. Office of Personnel Management (OPM)

    • datasets.ai
    • catalog.data.gov
    • +1 more
    Updated Nov 10, 2020
    Cite
    Social Security Administration (2020). Office of Personnel Management (OPM) [Dataset]. https://datasets.ai/datasets/office-of-personnel-management-opm
    Explore at:
    Dataset updated
    Nov 10, 2020
    Dataset authored and provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The purpose of this agreement is for SSA to verify SSN information for the Office of Personnel Management. OPM will use the SSN verifications in its investigative process to conduct background investigations of members of the military, Federal employees, applicants for Federal employment, and contractors affiliated with Federal agencies.

  12. Global social media subscriptions comparison 2023

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Global social media subscriptions comparison 2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Social media companies are starting to offer users the option to subscribe to their platforms in exchange for monthly fees. Until recently, social media has been predominantly free to use, with tech companies relying on advertising as their main revenue generator. However, advertising revenues have been dropping following the COVID-induced boom. As of July 2023, Meta Verified is the most costly of the subscription services, setting users back almost 15 U.S. dollars per month on iOS or Android. Twitter Blue costs between eight and 11 U.S. dollars per month and ensures users will receive the blue check mark, and have the ability to edit tweets and have NFT profile pictures. Snapchat+, drawing in four million users as of the second quarter of 2023, boasts a Story re-watch function, custom app icons, and a Snapchat+ badge.

  13. Partial Subset of (🌇Sunset) 🇺🇦 Ukraine Conflict

    • kaggle.com
    zip
    Updated Apr 17, 2024
    Cite
    Bilal Ahmad (2024). Partial Subset of (🌇Sunset) 🇺🇦 Ukraine Conflict [Dataset]. https://www.kaggle.com/datasets/ahmddbilall/ukraine-conflit-tweets/discussion
    Explore at:
    Available download formats: zip (2258283464 bytes)
    Dataset updated
    Apr 17, 2024
    Authors
    Bilal Ahmad
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Ukraine
    Description

    This dataset is a subset of the larger (🌇Sunset) 🇺🇦 Ukraine Conflict Twitter Dataset, available on Kaggle, focusing specifically on tweets related to the ongoing conflict between Ukraine and Russia.

    Data Source: The data was originally collected from the Twitter API by Creator. The dataset contains tweets spanning a significant timeframe, capturing public sentiment, news updates, and discussions related to the conflict between Ukraine and Russia.

    File Size and Format: Given the extensive size of the original dataset (approximately 48GB), we've extracted and curated a smaller subset of approximately 4GB, focusing specifically on tweets relevant to the Ukraine conflict. The files have been renamed for ease of access and loading, making them more manageable for analysis and exploration.

    Usage: Researchers, data scientists, and analysts interested in studying the discourse surrounding the Ukraine-Russia conflict, social media sentiment analysis, or geopolitical dynamics may find this dataset particularly valuable. It can be used for tasks such as sentiment analysis, topic modeling, trend analysis, and understanding public perceptions and reactions to unfolding events.

    Disclaimer: While efforts have been made to ensure the accuracy and relevance of the data, users are encouraged to exercise caution and verify the information as Twitter data can be subject to biases, noise, and misinformation. Additionally, please adhere to Twitter's terms of service and guidelines when using this dataset for research or analysis purposes.

  14. Coursera IPO - Tweets

    • kaggle.com
    zip
    Updated Apr 17, 2021
    Cite
    Tensor Girl (2021). Coursera IPO - Tweets [Dataset]. https://www.kaggle.com/usharengaraju/coursera-ipo-tweets
    Explore at:
    Available download formats: zip (1855 bytes)
    Dataset updated
    Apr 17, 2021
    Authors
    Tensor Girl
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Coursera was last valued in the private market at $3.6 billion, according to PitchBook.

    Founded in 2012 by former Stanford University computer science professors Daphne Koller and Andrew Ng, the Mountain View, California-based company offers individuals access to online courses and degrees from top universities, a business that has boomed throughout the Covid-19 pandemic.

    Revenue last year jumped 59% to $293 million. Still, Coursera’s net losses widened to $66.8 million from $46.7 million in 2019 as the company said it added over 12,000 new degrees for students over the last two years. Total registered users grew 65% year over year in 2020.

    Source : https://www.cnbc.com/2021/03/31/coursera-ipo-cour-begins-trading-on-the-nyse.html

    Content

    The dataset contains tweets regarding the Coursera IPO from verified Twitter accounts.

    Acknowledgements

    Coursera IPO Tweets are scraped using Twint.

    Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.

    https://pypi.org/project/twint/

  15. Crypto Tweets | 80k in ENG | Aug 2022

    • kaggle.com
    zip
    Updated Aug 29, 2022
    Cite
    Tleonel (2022). Crypto Tweets | 80k in ENG | Aug 2022 [Dataset]. https://www.kaggle.com/datasets/tleonel/crypto-tweets-80k-in-eng-aug-2022
    Explore at:
    Available download formats: zip (10075792 bytes)
    Dataset updated
    Aug 29, 2022
    Authors
    Tleonel
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🐦 🪙 💸 Crypto Tweets | 80k in English | Aug 2022 🐦 🪙 💸

    Continuing this series of datasets of tweets scraped from Twitter, this dataset contains 80k tweets in which the user mentions "crypto". This is a very popular topic, with 80k tweets in English sent in 2 days, between Aug 28 and 29, 2022.

    📝 Columns Description

    • date_time - Date and time the tweet was sent
    • username - Username that sent the tweet
    • user_location - Location entered in the account location info on Twitter
    • user_description - Text added to "about" in account
    • verified - If the user has the "verified by Twitter" blue tick
    • followers_count - Number of followers
    • following_count - Number of accounts followed by the person who sent the tweet
    • tweet_like_count - How many people liked the tweet
    • tweet_retweet_count - How many people retweeted the tweet
    • tweet_reply_count - How many people replied to that tweet
    • tweet_quoted_count - How many people quoted the tweet
    • tweet_text - Text sent in the tweet

    💡 Data and Utilization

    Data was scraped from Twitter and uploaded as is; no further data cleaning was performed, but the data from the tweets is in very good shape. I'd maybe recommend separating date and time and finding a way to change the source from links to the device name or website, depending on what you are interested in using the data for.
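
    As a rough illustration of the suggested clean-up, the sketch below splits a combined date_time value into separate date and time fields. The timestamp layout in FORMAT, the helper names, and the sample row are assumptions made for illustration; the actual CSV may use a different layout, so adjust FORMAT accordingly.

```python
from datetime import datetime

# Assumed timestamp layout (hypothetical); change this to match the real CSV.
FORMAT = "%Y-%m-%d %H:%M:%S"


def split_date_time(value, fmt=FORMAT):
    """Return (date, time) as ISO strings parsed from one date_time value."""
    parsed = datetime.strptime(value, fmt)
    return parsed.date().isoformat(), parsed.time().isoformat()


def add_date_and_time(rows):
    """Copy each row dict, adding separate 'date' and 'time' keys."""
    result = []
    for row in rows:
        date_part, time_part = split_date_time(row["date_time"])
        result.append({**row, "date": date_part, "time": time_part})
    return result


# Hypothetical sample row, not taken from the dataset.
sample = [{"date_time": "2022-08-28 14:05:09", "username": "example_user"}]
print(add_date_and_time(sample))
```

    Keeping the original date_time key alongside the new fields makes it easy to check the split against the raw value.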

    Usage suggestions - Data can be used to perform sentiment analysis, look at the geographical distribution and trends, identify spam vs. ham, and more.

  16. Egypt Fake Tweets Detection Dataset Labeled

    • kaggle.com
    zip
    Updated Apr 25, 2025
    Cite
    Mahmoud Elgendy68 (2025). Egypt Fake Tweets Detection Dataset Labeled [Dataset]. https://www.kaggle.com/datasets/mahmoudelgendy68/egypt-fake-tweets-detection-dataset-labeled/data
    Explore at:
    Available download formats: zip (1348136 bytes)
    Dataset updated
    Apr 25, 2025
    Authors
    Mahmoud Elgendy68
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    Egypt
    Description

    This dataset is part of a project focused on detecting fake news and misleading content in Egyptian Arabic text from Twitter and Facebook. It contains 22,906 labeled text samples, with labels representing:

    f → Fake or misleading content

    r → Real or factual content

    idk → Unclear or ambiguous content

    🔍 Sources & Labeling

    The dataset is based on manually labeled samples and semi-supervised labeling using an XGBoost classifier trained on a small seed set. Over 20,000 examples were confidently pseudo-labeled using probability thresholds.
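
    Threshold-based pseudo-labeling of this kind can be sketched as follows. This is only an illustration: the threshold values (0.9 and 0.1), the function names, and the choice to leave low-confidence samples unlabeled (None) are assumptions, not details given by the dataset author. The input is assumed to be the classifier's predicted probability that a text is fake.

```python
def pseudo_label(p_fake, fake_threshold=0.9, real_threshold=0.1):
    """Map a predicted probability of 'fake' to a confident label, or None.

    Returns "f" (fake) or "r" (real) only when the probability clears the
    corresponding threshold; otherwise returns None so the sample can be
    left for manual review. The thresholds are illustrative assumptions.
    """
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must be a probability in [0, 1]")
    if p_fake >= fake_threshold:
        return "f"
    if p_fake <= real_threshold:
        return "r"
    return None


def confident_pseudo_labels(texts, probabilities, **thresholds):
    """Keep only the (text, label) pairs that were confidently labeled."""
    labeled = []
    for text, p_fake in zip(texts, probabilities):
        label = pseudo_label(p_fake, **thresholds)
        if label is not None:
            labeled.append((text, label))
    return labeled


# Hypothetical probabilities, standing in for classifier output.
print(confident_pseudo_labels(["a", "b", "c"], [0.97, 0.50, 0.03]))
```

    Tightening the thresholds trades coverage for label quality: fewer samples are pseudo-labeled, but the ones that are should be more reliable.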

    The original texts are in Arabic, with content reflecting real social media discourse in Egypt, making this dataset particularly useful for research on:

    • Arabic NLP
    • Fake news detection
    • Misinformation studies
    • Social media analysis

    🧠 Applications

    This dataset can be used for training and evaluating:

    • Text classification models
    • Fake news detectors
    • Sentiment analysis pipelines
    • Arabic language models

    📌 Notes

    The dataset will be continuously refined, and future updates will include more manually verified labels. Please cite appropriately and reach out if using it in academic work.

  17. Cryptocurrency extra data - Elon Musk's tweets

    • kaggle.com
    zip
    Updated Nov 3, 2021
    Cite
    Yam Peleg (2021). Cryptocurrency extra data - Elon Musk's tweets [Dataset]. https://www.kaggle.com/datasets/yamqwe/elon-musks-twitter-updated-031121/discussion
    Explore at:
    Available download formats: zip (1816272 bytes)
    Dataset updated
    Nov 3, 2021
    Authors
    Yam Peleg
    Description

    Warning!! This data won't be updated while the private leaderboard is calculated! If you use it in your solution, you are guaranteed to overfit!

    Context

    This is a bonus dataset for the G-Research crypto forecasting competition, containing the most powerful features for predicting cryptocurrency movements: Elon Musk's Twitter 😂

    Introduction

    This is a daily updated dataset that automatically collects market data for the G-Research crypto forecasting competition. The data consists of all tweets posted by @elonmusk or mentioning @elonmusk; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Twitter are collected, saved, and processed.

    Content

    Over 13,000 tweets collected using Twitter API keyword search between 01.01.2010 and today. Columns are as follows:

    user_name: The user name of the author.
    user_location: Location of the author.
    user_description: The 'description' on the author's profile.
    user_created: When the user account was created.
    user_followers: Number of followers the user has.
    user_friends: Number of friends the user has.
    user_favourites: Number of favourites the user has.
    user_verified: Is this user verified?
    date: Date the tweet was tweeted.
    text: Content of the tweet.

    Indexing

    The dataframe is indexed by date and sorted from oldest to newest. The first row starts at 01.01.2010 and the last one is at the time of the most recent run of the collector. [Hopefully today]

    License

    Thanks to Twitter for providing the free API.

    Sources

    Elon Musk: https://en.wikipedia.org/wiki/Elon_Musk
    Twitter: https://twitter.com
    Elon Musk on Twitter: https://twitter.com/elonmusk

  18. Covid Vaccine Tweets

    • kaggle.com
    zip
    Updated May 6, 2023
    Cite
    Kash (2023). Covid Vaccine Tweets [Dataset]. https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets
    Explore at:
    Available download formats: zip (67025767 bytes)
    Dataset updated
    May 6, 2023
    Authors
    Kash
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    COVID-19 is an infectious disease caused by a newly discovered strain of coronavirus, a type of virus known to cause respiratory infections in humans. This new strain was unknown before December 2019, when an outbreak of pneumonia of unidentified cause emerged in Wuhan, China.

    Ever since the COVID-19 pandemic began, there has been quite a buzz on social media platforms and news sites about the need for a COVID-19 vaccine, as the number of people affected by COVID-19 has been increasing drastically. This dataset contains tweets made with the hashtag #CovidVaccine.

    Content

    The tweets have #CovidVaccine hashtag. The collection started on 1/8/2020, and will be updated on a daily basis.

    Information regarding the data

    The data consists of 1 lakh+ (100,000+) records with 13 columns. The features are described below.

    | No | Column | Description |
    | -- | -- | -- |
    | 1 | user_name | The name of the user, as they’ve defined it. |
    | 2 | user_location | The user-defined location for this account’s profile. |
    | 3 | user_description | The user-defined UTF-8 string describing their account. |
    | 4 | user_created | Time and date when the account was created. |
    | 5 | user_followers | The number of followers an account currently has. |
    | 6 | user_friends | The number of friends an account currently has. |
    | 7 | user_favourites | The number of favorites an account currently has. |
    | 8 | user_verified | When true, indicates that the user has a verified account. |
    | 9 | date | UTC time and date when the Tweet was created. |
    | 10 | text | The actual UTF-8 text of the Tweet. |
    | 11 | hashtags | All the other hashtags posted in the tweet along with #CovidVaccine. |
    | 12 | source | Utility used to post the Tweet; Tweets from the Twitter website have a source value of web. |
    | 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |

    Inspiration

    You can use this data to dive into the subjects that use this hashtag, look at the geographical distribution, evaluate sentiments, and look at trends.

  19. Turkey Earthquake Tweets

    • kaggle.com
    zip
    Updated Feb 26, 2023
    Cite
    Gabriel Preda (2023). Turkey Earthquake Tweets [Dataset]. https://www.kaggle.com/datasets/gpreda/turkey-earthquake-tweets/code
    Explore at:
    Available download formats: zip (4472599 bytes)
    Dataset updated
    Feb 26, 2023
    Authors
    Gabriel Preda
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Türkiye
    Description

    Context

    Tweets about the massive earthquake in Turkey.

    Content

    This dataset is collected daily using tweepy and the Twitter API. The source of the dataset is public tweets about the massive earthquake in Turkey.

    Data columns

    The following columns are included:

    • ID
    • User name
    • User location
    • User description
    • User created
    • User followers
    • User friends
    • User favorites
    • User verified
    • Date
    • Text
    • Hashtags
    • Source
    • Retweets
    • Is retweet

    Ideas for analysis

    You can use this dataset to follow the trends about the news related to this unfortunate event. Use your NLP and data analysis skills to extract relevant information.

  20. COVID-19 Tweets, Vaccination, and Deaths Data

    • kaggle.com
    zip
    Updated May 29, 2025
    Cite
    Arya Gavande (2025). COVID-19 Tweets, Vaccination, and Deaths Data [Dataset]. https://www.kaggle.com/datasets/aryagavande/covid-19-tweets-vaccination-and-deaths-data/code
    Explore at:
    Available download formats: zip (357725 bytes)
    Dataset updated
    May 29, 2025
    Authors
    Arya Gavande
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    This dataset merges three distinct data sources to explore the relationship between COVID-19 death rates, vaccination efforts, and public sentiment on Twitter from December 25, 2020 to March 29, 2022. It includes 2,000 cleaned rows with 16 variables, created by combining global health statistics and social media sentiment data.

    Sources & Variables:

    1. COVID-19 Deaths Data (scraped from Worldometer - COVID-19 Deaths via BeautifulSoup):

      • Date: Date of record
      • daily_increase_percent: % change in deaths from previous day
      • Season: Derived from date (Winter, Spring, Summer, Fall)
    2. Tweet Sentiment Data: COVID Vaccine Tweets Dataset

      • Date: Tweet timestamp
      • text_sentiment: Sentiment label (positive, neutral, negative) from NLTK’s SentimentIntensityAnalyzer
      • user_verified: Whether the user is verified
      • user_since_days: Age of the Twitter account (in days)
      • country: Cleaned user location
    3. Vaccination Data: Vaccination Dataset

      • Date: Date of record
      • total_vaccinations_per_hundred: Doses per 100 people
      • daily_vaccinations: Daily dose count
      • vaccine_group: Grouped vaccine type (e.g., mRNA, Viral Vector)
      • country: Country name
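
    The text_sentiment labels above come from NLTK's SentimentIntensityAnalyzer, whose polarity_scores(text) method returns a dictionary that includes a "compound" score between -1 and 1. The sketch below shows one common way to turn that score into a three-way label. The cutoff of 0.05 is the widely used VADER convention and is an assumption here, since the description does not say which cutoffs the author applied; the function name is hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The default threshold follows the usual VADER convention; it is an
    assumption, not a documented property of this dataset.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, standing in for analyzer output.
for score in (0.62, 0.0, -0.41):
    print(score, label_from_compound(score))
```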

    Preprocessing Summary:

    • Merged by Date and country
    • Cleaned invalid country names (e.g., “moon”, “nowhere”)
    • Standardized all datetime formats
    • Removed entries with missing or unreliable values
    • Created derived variables: Season, user_since_days, vaccine_group
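
    The preprocessing steps above can be sketched roughly as follows, using plain dictionaries for rows. Everything here is illustrative: the function names are hypothetical, the month-to-season mapping assumes Northern Hemisphere meteorological seasons, and the join keeps only rows whose Date and country appear in both inputs, which may differ from what the author actually did.

```python
from datetime import date

# Assumed mapping: Northern Hemisphere meteorological seasons.
SEASONS = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def season_of(day):
    """Derive the Season variable from a date."""
    return SEASONS[day.month]


def user_since_days(account_created, as_of):
    """Age of an account in days on the given reference date."""
    return (as_of - account_created).days


def merge_on_date_country(tweets, vaccinations):
    """Inner-join two lists of row dicts on the (Date, country) pair.

    If several vaccination rows share a key, the last one wins.
    """
    by_key = {(row["Date"], row["country"]): row for row in vaccinations}
    merged = []
    for row in tweets:
        match = by_key.get((row["Date"], row["country"]))
        if match is not None:
            merged.append({**match, **row})
    return merged


# Hypothetical rows, not taken from the dataset.
tweets = [{"Date": date(2021, 1, 15), "country": "India", "text_sentiment": "positive"}]
vaccinations = [{"Date": date(2021, 1, 15), "country": "India", "daily_vaccinations": 100}]
for row in merge_on_date_country(tweets, vaccinations):
    print(row["country"], season_of(row["Date"]), row["daily_vaccinations"])
```

    Storing Date as a date object rather than a string is one way to satisfy the "standardized all datetime formats" step before joining, since mismatched string formats would otherwise silently drop rows from the merge.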

    This dataset was used in a final data science project to:

    • Classify public sentiment toward vaccines using health indicators
    • Predict daily COVID-19 death counts using sentiment and vaccination data
Cite
adarsh (2022). Verified NFT Tweets [Dataset]. https://www.kaggle.com/datasets/adanai/verified-nft-tweets

Verified NFT Tweets

Tweets related to NFTs from verified Twitter users
