Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Non-Fungible Tokens (NFTs) are a relatively new concept and have been making headlines because of recent events in the space.
A good way to gauge sentiment and get basic statistics is to use data from social media. Twitter is a powerful platform for people to express their opinions on any given topic. This dataset collects tweets that include hashtags (#) related to NFTs.
The dataset may help capture NFT trends and answer questions about how far NFTs have come.
http://opendatacommons.org/licenses/dbcl/1.0/
The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.
Everything in the description below was written by David Wallach; I used this information to perform my first ever sentiment analysis.
"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).
I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith']
The data used here was gathered from a project I developed: https://github.com/dwallach1/StockerBot
I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Charles James Kirk (October 14, 1993 – September 10, 2025) was an American conservative political activist, author, and media personality. He co-founded the organization Turning Point USA (TPUSA) in 2012 and was its executive director. He was the chief executive officer of Turning Point Action (TPAction) and a member of the Council for National Policy (CNP). In his later years, he was one of the most prominent voices of the populist MAGA movement and exemplified the growth of Christian nationalism in the Republican Party.
From: https://en.wikipedia.org/wiki/Charlie_Kirk
https://www.youtube.com/watch?v=0xngCgJnO5E
On September 10, 2025, while on stage at Utah Valley University in Orem, Utah, for a TPUSA event, "The American Comeback Tour", Kirk was fatally shot in the neck. The shooting took place at 12:23 p.m. MDT (18:23 UTC), around 20 minutes after the event began, in front of an audience of about 3,000 people.
From: https://en.wikipedia.org/wiki/Charlie_Kirk
I added a file that flags users who have posted tweets about the topic and who meet either of these criteria:
Blue-certified accounts with at least 10K followers
Non-Blue-certified accounts with at least 50K followers
This helps map back to, and add context about, the users who are being tagged or who are creating the tweets.
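As a rough illustration of the two follower thresholds described above, the sketch below flags an account when it meets either rule. The `User` class and its field names are hypothetical and are not taken from the dataset's actual columns.

```python
from dataclasses import dataclass

# Thresholds taken from the description above.
BLUE_MIN_FOLLOWERS = 10_000
NON_BLUE_MIN_FOLLOWERS = 50_000


@dataclass
class User:
    # Hypothetical field names; the real file may use different columns.
    handle: str
    is_blue: bool
    followers: int


def is_flagged(user: User) -> bool:
    """Return True when the account meets the rule for its certification status."""
    threshold = BLUE_MIN_FOLLOWERS if user.is_blue else NON_BLUE_MIN_FOLLOWERS
    return user.followers >= threshold


# Made-up accounts, for illustration only.
users = [
    User("blue_small", True, 9_999),
    User("blue_big", True, 10_000),
    User("plain_mid", False, 49_999),
    User("plain_big", False, 50_000),
]
flagged = [u.handle for u in users if is_flagged(u)]
```

With the made-up accounts above, only `blue_big` and `plain_big` end up in `flagged`.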
I signed up for a trial with https://twitterapi.io/ , check it out!
Credit : OLIVIER TOURON/ AFP via Getty
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Databases of highly networked individuals have been indispensable in studying narratives and influence on social media. To support studies on Twitter in India, we present a systematically categorized database of influential accounts on Twitter in India, identified and annotated through an iterative process using friends, networks, and self-described profile information, and verified manually. We built an initial set of accounts from the friend network of a seed set of accounts chosen for real-world renown in various fields, then snowballed "friends of friends" multiple times and rank-ordered individuals by number of in-group connections and overall followers. We then manually classified the identified accounts under the categories of entertainment, sports, business, government, institutions, journalism, and civil society for accounts that have independent standing outside of social media, as well as a category of "digital first" for accounts that derive their primary influence from online activity. Overall, we annotated 11580 unique accounts across all categories. The database is useful for studying various questions related to the role of influencers in polarisation, misinformation, extreme speech, political discourse, etc.
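The snowballing and ranking procedure described above can be sketched roughly as follows. This is one possible reading, not the authors' code: the friend graph is a plain dictionary, "in-group connections" is interpreted as the number of group members who list an account as a friend, and all account names and follower counts are made up.

```python
from collections import Counter


def snowball(friends, seeds, rounds):
    """Grow a seed set by repeatedly adding the friends of current members."""
    group = set(seeds)
    for _ in range(rounds):
        group |= {f for account in group for f in friends.get(account, ())}
    return group


def rank_accounts(friends, group, followers):
    """Order accounts by in-group connections, then by overall followers."""
    in_group = Counter()
    for account in group:
        for friend in friends.get(account, ()):
            if friend in group:
                in_group[friend] += 1
    return sorted(group, key=lambda a: (-in_group[a], -followers.get(a, 0), a))


# Toy friend graph: each key lists the accounts it follows (hypothetical data).
friends = {"seed": ["a", "b"], "a": ["b", "c"], "b": ["c"], "c": []}
followers = {"seed": 100, "a": 50, "b": 500, "c": 10}

group = snowball(friends, ["seed"], rounds=2)
ranked = rank_accounts(friends, group, followers)
```

In the described workflow, the ranked list would then go to manual classification; that step is not automated here.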
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The "US Congressional Tweets Dataset" is a comprehensive collection of tweets from US Congressional members spanning from 2008 to 2017. This dataset is valuable for organizations like Lobbyists4America, which aims to gain insights into legislative trends and influences for effective lobbying strategies. The dataset is structured into two primary components: users_df and tweets_df.
users_df: This DataFrame provides detailed information about the Twitter accounts of various congressional members. It includes a range of attributes such as:
created_at), follower and friend counts (followers_count, friends_count).
contributors_enabled, default_profile, is_translator, etc.
tweets_df: This DataFrame contains the actual tweet data from these congressional accounts. Key columns include:
created_at: The timestamp of the tweet.
favorite_count and retweet_count: Indicators of the tweet's popularity.
text: The text content of the tweet.
user_id, lang (language), and source (device/app used for tweeting).
possibly_sensitive, quoted_status_id, and engagement-related fields.
The dataset is utilized for various analyses, including:
Network Analysis: Exploring the connections and interactions between different congressional members on Twitter, potentially revealing influential figures or groups within Congress.
Sentiment Analysis: Using libraries like TextBlob and NLTK, this analysis assesses the sentiment (positive, negative, neutral) of tweets to understand the general tone and stance of congressional members on various issues.
Correlation Analysis: Investigating relationships between different numerical features in the dataset, such as whether higher tweet frequencies correlate with more followers.
Word Clustering/Topic Modeling: Utilizing NMF (Non-Negative Matrix Factorization) from scikit-learn to cluster words and identify major themes or topics discussed in the tweets.
Time Series Analysis: Observing trends and patterns in tweeting behavior over time, such as increased activity around elections or significant political events.
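To make the correlation analysis mentioned above concrete, here is a minimal sketch of a Pearson correlation between tweet counts and follower counts, written in plain Python. The numbers are invented for illustration and do not come from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up per-member totals: number of tweets and followers_count.
tweet_counts = [10, 20, 30, 40]
follower_counts = [1000, 2000, 3000, 4000]
r = pearson(tweet_counts, follower_counts)
```

A value of `r` near 1 or -1 indicates a strong linear relationship; it says nothing about which variable drives the other.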
The "US Congressional Tweets Dataset" is a rich source for analyzing the digital footprint of US Congressional members. Through the application of various data science techniques, Lobbyists4America can extract meaningful insights about political sentiments, networking patterns, and topical trends among lawmakers. This information is crucial for tailoring lobbying efforts and understanding the legislative landscape.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Readme file for ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health
Generated on 2021-02-15

Recommended citation for the dataset:
P. Surana, M. Yusuf and S. Singh, "Severity Classification of Mental Health-Related Tweets," 2021 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), 2021, pp. 336-341, DOI: 10.1109/DISCOVER52564.2021.9663651.

******************************
PROJECT INFORMATION
******************************
1. Title of dataset: Mental Health Dataset
2. Author information:
Praatibh Surana, Manipal Institute of Technology
Mirza Yusuf, Manipal Institute of Technology
Sanjay Singh, Manipal Institute of Technology

Principal Investigators
Name: Praatibh Surana
Address: Manipal Institute of Technology
Email: praatibhsurana@gmail.com

Name: Mirza Yusuf
Address: Manipal Institute of Technology
Email: baig.yusuf.cr7@gmail.com

Co-Investigator
Name: Sanjay Singh
Address: Manipal Institute of Technology
Email: sanjay.singh@manipal.edu

3. Date of data collection: Jan 2021 - Feb 2021

************************************
DATA ACCESS INFORMATION
************************************
1. Licences/restrictions placed on access to the dataset: CC BY 4.0
2. Links to publications that use the data:
URL: https://ieeexplore.ieee.org/document/9663651
DOI: 10.1109/DISCOVER52564.2021.9663651
3. Links to a third party or secondary data used in the project (for example, existing datasets, third-party datasets):
Pennington, Jeffrey et al. "GloVe: Global Vectors for Word Representation." EMNLP (2014). DOI: https://doi.org/10.3115/v1/d14-1162

*****************************************
METHODS OF DATA COLLECTION
*****************************************
1. Describe the methods for data collection and/or provide links to papers describing data collection methods
Paper DOI :
Our research revolved around correctly classifying tweets based on their severity in a mental health context. An effort was also made to make the models detect sarcasm better, as this was something that many models in the past failed to do. Our dataset consists of tweets labeled as '0', '1', and '2' depending on their classes. The labeling rules followed are given in Table 1.

TABLE 1 - SEVERITY CLASSIFICATION CLASSES AND EXAMPLES
----------------------------------------------------------------------
Class | Rules | Example
0 | Helping / suggestion for mental health awareness / positivity / informative / motivational / questions about mental health | Are you suffering from anxiety? Check out this page for therapy through Skype!
1 | Sarcasm / rant / expression of annoyance | Today was so annoying. If my teacher would have called my name, I swear to God I would have killed myself
2 | Case of slight disturbance / strong indication of disturbance / user outright mentions depression / anxiety / suicide / self-harm | All I am is a burden. I don't want to live anymore.
----------------------------------------------------------------------

The following steps were performed for data collection:
1) Tweets were extracted with the help of Twitter's official API using hashtags such as #depression, #mentalhealth, #anxiety, #selfharm, #killmyself, and #kms from users.
2) Around 40,000 tweets were extracted from Twitter between January and February 2021, out of which the final dataset comprised 2460 tweets, distributed equally amongst the three classes (820 tweets each).
3) Two authors manually annotated the dataset and cross-verified it to ensure accurate labeling.

2. Data processing methods:
A. Preprocessing
1) Removal of numbers, URLs, usernames, and special characters: The first step after extraction of the tweets was ensuring that they were suitable for further use. The Python library "preprocessor" was used to eliminate numbers, retweets, URLs, emojis, emoticons, and usernames, followed by removal of duplicate tweets from the dataset.
2) Stopword removal and expansion of standard abbreviations: We made use of Python's "nltk" library for the removal of common stopwords such as "for," "the," "a," etc. As our data is sourced from Twitter, lots of common internet abbreviations like "lol," "kms," "gn," etc., were used. These were handled by converting the short forms to their corresponding complete forms. Lots of short forms like "wanna" for "want to" and "gonna" for "going to" were used. We ensured that these, too, were taken care of to the best of our abilities.
3) Removal of names, so that anonymity is maintained: Names of people, places, Twitter handles, and anything else that could compromise anonymity have been removed, and the token '[redacted]' has been used in their place.

*******************************
SUMMARY OF DATA FILE
*******************************
Filename: MentalHealthTweets.csv
Short description: This CSV file contains 2460 tweets annotated '0', '1' or '2' based on the class they belong to.

*******************************************************************
DATA-SPECIFIC INFORMATION FOR MentalHealthTweets.csv
*******************************************************************
1. Number of variables: 2
2. Number of cases/rows: 2461
3. Missing data codes: NA
4. Variable list
The variables and their properties have been provided in Table 2.

TABLE 2 - VARIABLE DESCRIPTION TABLE
----------------------------------------------------------------------
Variable Name | Variable Description | Variable Type
tweets | Cleaned up tweet | String
label | Annotation for tweet | Integer
----------------------------------------------------------------------
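The preprocessing steps described in the README above can be approximated with a short sketch. This is not the authors' code: it substitutes plain regular expressions for the "preprocessor" library and a tiny hand-written stopword set for the NLTK list, and the abbreviation table is a hypothetical stand-in for whatever mapping the authors used.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumptions for illustration).
ABBREVIATIONS = {
    "lol": "laughing out loud",
    "gn": "good night",
    "wanna": "want to",
    "gonna": "going to",
}
STOPWORDS = {"for", "the", "a"}  # stand-in for the NLTK stopword list


def clean_tweet(text):
    """Strip URLs, usernames, numbers and symbols, expand short forms, drop stopwords."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # usernames
    text = re.sub(r"\d+", " ", text)            # numbers
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # special characters and emojis
    words = []
    for word in text.lower().split():
        # Replace a known short form with its full form, word by word.
        words.extend(ABBREVIATIONS.get(word, word).split())
    return " ".join(w for w in words if w not in STOPWORDS)


cleaned = clean_tweet("@sam I wanna sleep for 10 hours lol https://t.co/xyz")
```

Name redaction (step 3 in the README) is not covered here, since it was driven by manual judgment rather than a fixed rule.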
https://creativecommons.org/publicdomain/zero/1.0/
Updated on Sept 9th. Includes tweets sent after launch.
I wanted to do something useful and add a dataset to Kaggle: while there are 90+ datasets about Elon, there is none yet for tweets about the upcoming iPhone 14. I'm interested in seeing what Apple is up to this year, so I thought it would be interesting to dive into what people have been saying in the month before the release event, which Apple announced today. It will happen on September 7th.
The dataset has 144k tweets created between July 11th and Sept 9th. Tweets are in English. As the new iPhone was just announced, I plan on updating the dataset to include newer examples and maybe a few older ones to increase the number of samples in the dataset, at least until the first week of launch.
Data was scraped from Twitter and uploaded as is. No further data cleaning was performed, but the tweet data is in very good shape. I'd recommend separating date and time, and finding a way to change the source field from links to the device name or website, depending on what you are interested in using the data for.
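The two cleanup suggestions above could look roughly like this. Both helpers rest on assumptions not confirmed by the dataset description: that timestamps look like `2022-09-07 17:05:30` and that the source field holds an HTML link such as `<a href="...">Twitter for iPhone</a>`. Adjust the format string and pattern to whatever the file actually contains.

```python
import re
from datetime import datetime


def split_timestamp(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into separate ISO date and time strings."""
    parsed = datetime.strptime(value, fmt)
    return parsed.date().isoformat(), parsed.time().isoformat()


def source_label(source):
    """Return the visible text of an HTML link, or the input if it is not a link."""
    match = re.search(r">([^<]+)</a>", source)
    return match.group(1).strip() if match else source.strip()


date_part, time_part = split_timestamp("2022-09-07 17:05:30")
label = source_label(
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
)
```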
Usage suggestions: the data can be used for sentiment analysis, geographical distribution, trend analysis, spam vs. ham identification, and more.
Bitcoin (₿) is a cryptocurrency invented in 2008 by an unknown person or group of people using the pseudonym Satoshi Nakamoto. The currency began use in 2009 when its implementation was released as open-source software.
Bitcoin is a blockchain-based decentralized digital currency, without a central bank or single administrator, that can be sent from user to user on the peer-to-peer bitcoin network without the need for intermediaries. Transactions are verified by network nodes through cryptography and recorded in a public distributed ledger called a blockchain. Bitcoins are created as a reward for a process known as mining. They can be exchanged for other currencies, products, and services.
I am sharing the Bitcoin Tweets Dataset, a large collection of tweets gathered using Trackmyhashtag, with the research community. The dataset consists of 337,701 tweet IDs, one for each tweet about bitcoin posted on Twitter from 15th Sept 2022 to 17th Sept 2022.
The dataset was collected using Trackmyhashtag, an easy & affordable platform.
A lot of international events that affected bitcoin happened during the collecting time period, which may make this dataset interesting to analyze.
Each tweet contains different types of data:
Tweet Id
Tweet URL
Posted time and date
Tweet Content
Other metadata
I hope researchers find it helpful. If you need more datasets, let me know.
https://creativecommons.org/publicdomain/zero/1.0/
Description:
The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.
Dataset Breakdown:
Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.
Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.
Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.
Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.
Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.
Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.
Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
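As a small example of working with the fields listed above, the sketch below averages "Daily Time Spent (min)" per "Platform" using only the standard library. The column headers are assumed to match the field names in the breakdown, and the inline CSV rows are invented stand-ins for the real file.

```python
import csv
import io
from collections import defaultdict


def mean_minutes_by_platform(rows):
    """Average the 'Daily Time Spent (min)' column for each 'Platform' value."""
    totals = defaultdict(lambda: [0.0, 0])  # platform -> [sum of minutes, row count]
    for row in rows:
        entry = totals[row["Platform"]]
        entry[0] += float(row["Daily Time Spent (min)"])
        entry[1] += 1
    return {platform: total / count for platform, (total, count) in totals.items()}


# Invented sample rows; in practice, open the dataset's CSV file instead.
sample = io.StringIO(
    "Platform,Daily Time Spent (min)\n"
    "TikTok,90\n"
    "TikTok,30\n"
    "Reddit,45\n"
)
averages = mean_minutes_by_platform(csv.DictReader(sample))
```

Because the data is synthetic, such averages describe the generator's assumptions rather than real user behavior.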
Context and Use Cases:
Researchers, data scientists, and developers can use this dataset to:
Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.
Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.
Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.
Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.
Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.
Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.
The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.
Future Considerations:
As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.
By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...
This dataset was created by skwolvie
The purpose of this agreement is for SSA to verify SSN information for the Office of Personnel Management. OPM will use the SSN verifications in its investigative process to conduct background investigations of members of the military, Federal employees, applicants for Federal employment, and contractors affiliated with Federal agencies.
Social media companies are starting to offer users the option to subscribe to their platforms in exchange for monthly fees. Until recently, social media has been predominantly free to use, with tech companies relying on advertising as their main revenue generator. However, advertising revenues have been dropping following the COVID-induced boom. As of July 2023, Meta Verified is the most costly of the subscription services, setting users back almost 15 U.S. dollars per month on iOS or Android. Twitter Blue costs between eight and 11 U.S. dollars per month and ensures users will receive the blue check mark, and have the ability to edit tweets and have NFT profile pictures. Snapchat+, drawing in four million users as of the second quarter of 2023, boasts a Story re-watch function, custom app icons, and a Snapchat+ badge.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a subset of the larger (🌇Sunset) 🇺🇦 Ukraine Conflict Twitter Dataset, available on Kaggle, focusing specifically on tweets related to the ongoing conflict between Ukraine and Russia.
Data Source: The data was originally collected from the Twitter API by Creator. The dataset contains tweets spanning a significant timeframe, capturing public sentiment, news updates, and discussions related to the conflict between Ukraine and Russia.
File Size and Format: Given the extensive size of the original dataset (approximately 48GB), we've extracted and curated a smaller subset of approximately 4GB, focusing specifically on tweets relevant to the Ukraine conflict. The files have been renamed for ease of access and loading, making them more manageable for analysis and exploration.
Usage: Researchers, data scientists, and analysts interested in studying the discourse surrounding the Ukraine-Russia conflict, social media sentiment analysis, or geopolitical dynamics may find this dataset particularly valuable. It can be used for tasks such as sentiment analysis, topic modeling, trend analysis, and understanding public perceptions and reactions to unfolding events.
Disclaimer: While efforts have been made to ensure the accuracy and relevance of the data, users are encouraged to exercise caution and verify the information as Twitter data can be subject to biases, noise, and misinformation. Additionally, please adhere to Twitter's terms of service and guidelines when using this dataset for research or analysis purposes.
https://creativecommons.org/publicdomain/zero/1.0/
Coursera was last valued in the private market at $3.6 billion, according to PitchBook.
Founded in 2012 by former Stanford University computer science professors Daphne Koller and Andrew Ng, the Mountain View, California-based company offers individuals access to online courses and degrees from top universities, a business that has boomed throughout the Covid-19 pandemic.
Revenue last year jumped 59% to $293 million. Still, Coursera’s net losses widened to $66.8 million from $46.7 million in 2019 as the company said it added over 12,000 new degrees for students over the last two years. Total registered users grew 65% year over year in 2020.
Source : https://www.cnbc.com/2021/03/31/coursera-ipo-cour-begins-trading-on-the-nyse.html
The dataset contains tweets about the Coursera IPO from verified Twitter accounts.
Coursera IPO Tweets are scraped using Twint.
Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.
https://creativecommons.org/publicdomain/zero/1.0/
Continuing this series of datasets of tweets scraped from Twitter, this dataset contains 80k tweets in which the user mentions "crypto". This is a very popular topic, with 80k tweets in English sent in 2 days, between Aug 28 and 29, 2022.
Data was scraped from Twitter and uploaded as is. No further data cleaning was performed, but the tweet data is in very good shape. I'd recommend separating date and time, and finding a way to change the source field from links to the device name or website, depending on what you are interested in using the data for.
Usage suggestions: the data can be used for sentiment analysis, geographical distribution, trend analysis, spam vs. ham identification, and more.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is part of a project focused on detecting fake news and misleading content in Egyptian Arabic text from Twitter and Facebook. It contains 22,906 labeled text samples, with labels representing:
f → Fake or misleading content
r → Real or factual content
idk → Unclear or ambiguous content
🔍 Sources & Labeling
The dataset is based on manually labeled samples and semi-supervised labeling using an XGBoost classifier trained on a small seed set. Over 20,000 examples were confidently pseudo-labeled using probability thresholds.
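The thresholding step of pseudo-labeling mentioned above might look like the sketch below. The 0.1 and 0.9 cut-offs are hypothetical (the description does not state the values used), and the probabilities are made-up stand-ins for a classifier's predicted probability that a text is fake.

```python
def pseudo_label(prob_fake, low=0.1, high=0.9):
    """Assign 'f' or 'r' only when the classifier is confident; otherwise None."""
    if prob_fake >= high:
        return "f"
    if prob_fake <= low:
        return "r"
    return None  # not confident enough to pseudo-label


# Made-up predicted probabilities for five unlabeled texts.
probabilities = [0.97, 0.03, 0.55, 0.90, 0.12]
labels = [pseudo_label(p) for p in probabilities]
confident = [(i, label) for i, label in enumerate(labels) if label is not None]
```

Examples that return `None` would stay unlabeled or go to manual review; how the dataset's `idk` label was assigned is not specified in the description.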
The original texts are in Arabic, with content reflecting real social media discourse in Egypt, making this dataset particularly useful for research on:
Arabic NLP
Fake news detection
Misinformation studies
Social media analysis
🧠 Applications
This dataset can be used for training and evaluating:
Text classification models
Fake news detectors
Sentiment analysis pipelines
Arabic language models
📌 Notes
The dataset will be continuously refined, and future updates will include more manually verified labels. Please cite appropriately and reach out if using it in academic work.
Warning!! This data won't be updated while the private leaderboard is calculated! If you use it in your solution, you are guaranteed to overfit!
This is a bonus dataset to be used on the G-Research crypto forecasting competition containing the most powerful features for predicting cryptocurrencies movement: Elon Musk's Twitter 😂
This is a daily updated dataset that automatically collects market data for the G-Research crypto forecasting competition. The data consists of all tweets posted by @elonmusk or mentioning @elonmusk; both retrieval and uploading are fully automated. See the discussion topic.
For every asset in the competition, the following fields from Twitter are collected, saved, and processed.
Content
Over 13,000 tweets collected using Twitter API keyword search between 01.01.2010 and today. Columns are as follows:
user_name: The user name of the author.
user_location: Location of the author.
user_description: The 'description' on the author's profile.
user_created: When was the user created.
user_followers: Number of followers the user has.
user_friends: Number of friends the user has.
user_favourites: Number of favourites the user has.
user_verified: Is this user verified?
date: Date the tweet was tweeted.
text: Content of the tweet
The dataframe is indexed by date and sorted from oldest to newest.
The first row starts at 01.01.2010 and the last one is at the time of the most recent run of the collector (hopefully today).
Thanks to Twitter for providing the free API.
Elon Musk: https://en.wikipedia.org/wiki/Elon_Musk Twitter: https://twitter.com Elon Musk on Twitter: https://twitter.com/elonmusk
https://creativecommons.org/publicdomain/zero/1.0/
COVID-19 is an infectious disease caused by a newly discovered strain of coronavirus, a type of virus known to cause respiratory infections in humans. This new strain was unknown before December 2019, when an outbreak of pneumonia of unidentified cause emerged in Wuhan, China.
Ever since the COVID-19 pandemic began, there has been quite a buzz on social media platforms and news sites regarding the need for a COVID-19 vaccine, as the number of people affected by COVID-19 has been increasing drastically. This dataset brings you the tweets made with the hashtag #CovidVaccine.
The tweets have #CovidVaccine hashtag. The collection started on 1/8/2020, and will be updated on a daily basis.
The data consists of 100,000+ (1 lakh+) records with 13 columns. The description of the features is given below.

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they've defined it. |
| 2 | user_location | The user-defined location for this account's profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has. |
| 8 | user_verified | When true, indicates that the user has a verified account. |
| 9 | date | UTC time and date when the Tweet was created. |
| 10 | text | The actual UTF-8 text of the Tweet. |
| 11 | hashtags | All the other hashtags posted in the tweet along with #CovidVaccine. |
| 12 | source | Utility used to post the Tweet; Tweets from the Twitter website have a source value of web. |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |

You can use this data to dive into the subjects that use this hashtag, look at the geographical distribution, evaluate sentiments, and look at trends.
https://creativecommons.org/publicdomain/zero/1.0/
Tweets about the massive earthquake in Turkey.
This dataset is collected daily using tweepy and Twitter API. The source of the dataset is: public tweets about the massive earthquake in Turkey.
The following columns are included:
You can use this dataset to follow the trends about the news related to this unfortunate event. Use your NLP and data analysis skills to extract relevant information.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset merges three distinct data sources to explore the relationship between COVID-19 death rates, vaccination efforts, and public sentiment on Twitter from December 25, 2020 to March 29, 2022. It includes 2,000 cleaned rows with 16 variables, created by combining global health statistics and social media sentiment data.
COVID-19 Deaths Data (scraped from Worldometer - COVID-19 Deaths via BeautifulSoup):
Date: Date of record
daily_increase_percent: % change in deaths from previous day
Season: Derived from date (Winter, Spring, Summer, Fall)
Tweet Sentiment Data: COVID Vaccine Tweets Dataset
Date: Tweet timestamp
text_sentiment: Sentiment label (positive, neutral, negative) from NLTK's SentimentIntensityAnalyzer
user_verified: Whether the user is verified
user_since_days: Age of the Twitter account (in days)
country: Cleaned user location
Vaccination Data: Vaccination Dataset
Date: Date of record
total_vaccinations_per_hundred: Doses per 100 people
daily_vaccinations: Daily dose count
vaccine_group: Grouped vaccine type (e.g., mRNA, Viral Vector)
country: Country name
Date and country
Season, user_since_days, vaccine_group
This dataset was used in a final data science project to: