License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier that distinguishes negative tweets so that they can be blocked. Can you build a strong classifier model to make this prediction?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
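A minimal sketch of the parsing and span-checking steps, assuming `train.csv` has at least the `text` and `selected_text` columns described above (the `textID` column in the toy sample is an assumption). Python's `csv` module already handles standard CSV quoting; `strip_outer_quotes` is a hypothetical helper that removes one leftover pair of wrapping quotes, and `selected_span` returns the character offsets of the phrase so that commas and spaces inside the span are preserved.

```python
import csv
import io
from typing import Optional, Tuple


def strip_outer_quotes(value: str) -> str:
    """Remove one pair of quote characters wrapping the whole field, if present."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def load_rows(csv_text: str) -> list:
    """Parse CSV text, then strip leftover wrapping quotes from the text field."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["text"] = strip_outer_quotes(row["text"])
        rows.append(row)
    return rows


def selected_span(text: str, selected: str) -> Optional[Tuple[int, int]]:
    """Character offsets (start, end) of selected inside text, or None if absent."""
    start = text.find(selected)
    if start == -1:
        return None
    return start, start + len(selected)


# Toy sample: the tripled quotes decode to a field that still carries wrapping quotes.
sample = 'textID,text,selected_text,sentiment\n1,"""I love this!""",love this,positive\n'
rows = load_rows(sample)
print(rows[0]["text"], selected_span(rows[0]["text"], rows[0]["selected_text"]))
```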
The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset provides an extensive analysis of Twitter retweet activities, focusing on various attributes that can influence and describe the nature of retweets. It consists of multiple rows of data, each representing a unique Twitter retweet instance with detailed information on its characteristics.
Weekday: The day of the week when the retweet occurred.
Hour: The hour of the day when the retweet was made, in 24-hour format.
Day: The day of the month when the retweet was posted.
Lang: The language code of the tweet that was retweeted.
Reach: The estimated number of users who have seen the retweet.
RetweetCount: The number of times the retweeted tweet has been retweeted further.
Likes: The number of likes received by the retweeted tweet.
Klout: The Klout score of the user who posted the original tweet, which is a measure of their influence on social media.
Sentiment: The sentiment score of the retweeted tweet, indicating the overall emotional tone.
LocationID: A numerical identifier representing the geographical location of the user who posted the retweet.
This dataset can be utilized for various analyses, including:
- Identifying peak times for retweets
- Analyzing the influence of tweet attributes on retweet rates
- Sentiment analysis of popular retweets
- Geographical distribution of retweet activity
- Correlating Klout scores with retweet reach and engagement
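As an illustration of the first analysis, peak retweet times can be found by summing `RetweetCount` per `Hour` (or per `Weekday`). This is a sketch, assuming each row is a dict of strings keyed by the column names listed above, as produced by `csv.DictReader`; the three sample rows are invented for demonstration.

```python
from collections import defaultdict


def total_by(rows, key_col, value_col):
    """Sum value_col (parsed as int) grouped by the value of key_col."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_col]] += int(row[value_col])
    return dict(totals)


def peak(rows, key_col="Hour", value_col="RetweetCount"):
    """Return the key_col value with the largest total value_col."""
    totals = total_by(rows, key_col, value_col)
    return max(totals, key=totals.get)


# Invented example rows; real data would come from csv.DictReader.
rows = [
    {"Weekday": "Monday", "Hour": "9", "RetweetCount": "3"},
    {"Weekday": "Friday", "Hour": "21", "RetweetCount": "10"},
    {"Weekday": "Monday", "Hour": "9", "RetweetCount": "4"},
]
print(total_by(rows, "Hour", "RetweetCount"))
print(peak(rows), peak(rows, key_col="Weekday"))
```

Summing is one choice; averaging per hour would instead favour hours with fewer but larger retweet bursts.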
Researchers, marketers, and social media analysts can use this dataset to gain insights into Twitter retweet behavior, optimize social media strategies, and understand the factors contributing to the virality of tweets.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Advertising makes up 89% of Twitter's total revenue, and data licensing makes up about 11%.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
"Unleashing Social Sentiments: A Twitter Analysis" appears to be a study that uses a Twitter dataset to explore the sentiment and opinions of Twitter users toward a particular topic or set of topics. Without more information about the study, a detailed description is not possible; based on the title and the use of a Twitter dataset, however, the study most likely applies sentiment analysis techniques to the opinions and sentiment expressed in the dataset.
The use of Twitter data for sentiment analysis has become increasingly popular in recent years due to the massive volume of data available and the ease with which opinions and sentiment can be expressed on the platform. By analyzing Twitter data, researchers can gain insights into public opinion and sentiment on a wide range of topics, from politics to consumer products to social issues.
To conduct a Twitter analysis, researchers typically collect a dataset of tweets related to a particular topic or set of topics. This dataset may include features such as the Twitter username, the tweet content, the time and date of the tweet, and any associated metadata such as hashtags or mentions. The dataset can then be processed using NLP or sentiment analysis techniques to classify the sentiment expressed in each tweet as positive, negative, or neutral.
The dataset contains tweets from the Twitter API that were scraped for seven hashtags:
#Messi: This hashtag refers to the Argentine soccer superstar Lionel Messi, and is commonly used by fans and followers to discuss his performances, accomplishments, and news related to his career.
#FIFAWorldCup: This hashtag is used during the FIFA World Cup, a quadrennial international soccer tournament. Tweets with this hashtag may discuss news, scores, or analysis related to the tournament.
#DeleteFacebook: This hashtag is used by people who advocate for deleting or boycotting Facebook, often in response to controversies related to data privacy, political advertising, or other issues related to the social media giant.
#MeToo: This hashtag is used in the context of the Me Too movement, a social movement against sexual harassment and assault, particularly in the workplace. Tweets with this hashtag may share personal stories, express support for the movement, or discuss related news and events.
#BlackLivesMatter: This hashtag is used in the context of the Black Lives Matter movement, a movement against police brutality and systemic racism towards Black people. Tweets with this hashtag may express support for the movement, share news and updates, or discuss related issues.
#NeverAgain: This hashtag is used in the context of the Never Again movement, which advocates for gun control and other measures to prevent school shootings and other acts of gun violence.
#BarCamp: This hashtag refers to BarCamp, an international network of unconferences (participant-driven conferences that are open and free to attend). Tweets with this hashtag may discuss upcoming BarCamp events, share insights or learnings from past events, or express support for the BarCamp community.
The sentiment score was generated using a pre-trained sentiment analysis model and represents the overall sentiment of the tweet (positive, negative, or neutral).
The data can be used to gain insights into how people are discussing and reacting to these topics on Twitter, and how the sentiment towards these hashtags may have evolved over time. Researchers and analysts can use this dataset for sentiment analysis, natural language processing, and machine learning applications.
Some potential analyses that can be performed on the data include sentiment trend analysis over time, geographical distribution of sentiments, and topic modeling to identify themes and topics that emerge from the tweets.
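A sentiment trend analysis over time can start from monthly label counts per hashtag. The sketch below assumes hypothetical column names `hashtag`, `date` (an ISO date string such as `2018-03-14`) and `sentiment` (one of `positive`, `negative`, `neutral`); the actual column names in the dataset may differ.

```python
from collections import Counter, defaultdict


def monthly_sentiment_counts(rows):
    """Count sentiment labels per (hashtag, "YYYY-MM") pair."""
    counts = defaultdict(Counter)
    for row in rows:
        month = row["date"][:7]  # assumes ISO dates, e.g. "2018-03-14" -> "2018-03"
        counts[(row["hashtag"], month)][row["sentiment"]] += 1
    return counts


def label_share(counts, hashtag, month, label="negative"):
    """Fraction of a hashtag's tweets in a month that carry the given label."""
    month_counts = counts.get((hashtag, month), Counter())
    total = sum(month_counts.values())
    return month_counts[label] / total if total else 0.0


# Invented example rows for illustration only.
rows = [
    {"hashtag": "#MeToo", "date": "2018-03-01", "sentiment": "negative"},
    {"hashtag": "#MeToo", "date": "2018-03-09", "sentiment": "negative"},
    {"hashtag": "#MeToo", "date": "2018-03-20", "sentiment": "positive"},
    {"hashtag": "#MeToo", "date": "2018-04-02", "sentiment": "neutral"},
]
counts = monthly_sentiment_counts(rows)
print(label_share(counts, "#MeToo", "2018-03"))
```

Plotting `label_share` month by month for each hashtag gives a simple trend line.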
Overall, the dataset provides a rich resource for researchers and analysts interested in studying social and political issues on social media.
This dataset contains tweets and embeddings of the top 1000 Twitter celebrity accounts.
Tweets -
Embeddings -
NB:
- Almost 10% of the Twitter accounts were private, had changed their username, or were suspended, so 915 users remain.
- Some celebrity accounts are unofficial (e.g., twitter.com/sonunigam) and have very few tweets; these users can be filtered out by tweet count. A relevant research paper on this topic is "25 Tweets to Know You: A New Model to Predict Personality with Social Media".
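Filtering by tweet count can be sketched as follows, assuming each tweet record is a dict with a hypothetical `username` field. The threshold `min_tweets` is purely illustrative; the description does not specify a cutoff.

```python
from collections import Counter


def active_users(tweets, min_tweets=50):
    """Usernames with at least min_tweets tweets (min_tweets is an illustrative threshold)."""
    counts = Counter(tweet["username"] for tweet in tweets)
    return {user for user, n in counts.items() if n >= min_tweets}


def keep_active(tweets, min_tweets=50):
    """Drop tweets from users below the threshold."""
    keep = active_users(tweets, min_tweets)
    return [tweet for tweet in tweets if tweet["username"] in keep]


tweets = [{"username": "a"}, {"username": "a"}, {"username": "a"}, {"username": "b"}]
print(active_users(tweets, min_tweets=2))
```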
Kaggle API command:
!kaggle datasets download -d ahmedshahriarsakib/top-1000-twitter-celebrity-tweets-embeddings
The scraped tweets are all publicly available, and the dataset is intended for educational purposes only.
Cover image credit: bestfunquiz, "Which Celebrity On Twitter Should Follow You"
By Krystal Jensen [source]
The dataset Twitter Data: Tweets and User Interactions provides comprehensive information about tweets and user interactions on the popular social media platform Twitter. The dataset includes various attributes that shed light on the characteristics and engagement metrics of tweets, allowing for in-depth analysis of user behavior and content performance.
One of the key variables in this dataset is the Klout score, which represents the influence and reputation of the Twitter users who posted the tweets. This numeric metric helps assess the impact a user has on their audience and provides insights into their social media presence.
Another essential attribute is the text content of each tweet. By examining this textual data, analysts can uncover valuable information about trending topics, opinions, sentiments, conversations, or news shared by users. It serves as a primary source for understanding what people share publicly on Twitter.
The dataset Twitter+data+in+sheets.csv serves as a reliable resource for conducting research or performing analytics that require detailed information about Twitter activity. It covers aspects such as tweet characteristics (including length and language), engagement metrics (such as retweets and favorites), sentiment analysis (revealing positive or negative emotions expressed), as well as individual user details.
By utilizing this extensive dataset, researchers can gain valuable insights into patterns of online communication within Twitter's vast network. They can identify influential individuals with high Klout scores who have substantial reach among their followers or communities. Additionally, they can analyze various aspects related to tweet content such as sentiment analysis to understand public opinion trends or measure engagement levels through counts like retweets and favorites.
Overall, this dataset is a valuable resource for anyone interested in comprehensively analyzing tweet characteristics, exploring how users interact with tweets across dimensions such as popularity or sentiment, or examining correlations between Klout scores and other factors that influence engagement levels, such as posting time.
Welcome to the Twitter Data: Tweets and User Interactions dataset! This dataset provides valuable insights into tweet characteristics and user engagement on Twitter. Here is a useful guide on how to make the most out of this dataset:
Understanding the Columns: There are two main columns in this dataset:
- Klout Score (Numeric): The Klout score indicates the influence of the user who posted the tweet. A higher Klout score suggests greater influence and reach.
- Text Content of Tweet (Text): This column contains the actual text content of each tweet.
Analyzing Tweet Characteristics: The text content column will help you understand various aspects of tweets, such as language, sentiment, trending topics, or specific keywords used by users. You can perform text analysis techniques like word frequency analysis or sentiment analysis to gain insights into tweet characteristics.
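Word frequency analysis on the tweet text column can be sketched with a regular-expression tokenizer and a `Counter`. The tokenizer (which keeps hashtags and @mentions as single tokens) and the tiny stop-word list are illustrative assumptions, not part of the dataset.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "in", "for", "on"}


def word_frequencies(texts, top_n=10):
    """Most common lowercase tokens across texts, keeping #hashtags and @mentions intact."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[#@]?\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)


texts = ["The game is on #WorldCup", "#worldcup final tonight", "Final score"]
print(word_frequencies(texts, top_n=2))
```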
Examining User Engagement: The Klout score provides a measure of user influence on Twitter. By analyzing this column, you can identify highly influential users who generate higher engagement rates with their tweets. You can further explore interactions (likes, retweets, replies) between these influential users and other Twitter users mentioned in their tweets.
Identifying Trends and Patterns: With this dataset's rich information about tweet content and user engagement, you can identify popular trends or patterns among highly engaged tweets or influential users over different time periods.
Please use this data responsibly for any analysis or research, adhering to ethical considerations related to privacy rights and to the data usage policies set by both Kaggle platform rules and any relevant privacy regulations.
- Analyzing the relationship between Klout score and the content of tweets: This dataset can be used to investigate whether there is a correlation between a user's Klout score (a measure of their social media influence) and the characteristics of their tweets. By examining factors such as tweet length, sentiment, and engagement metrics, researchers can gain...
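One way to start on this question is a Pearson correlation between Klout score and tweet length. This sketch assumes hypothetical column names `klout` and `text`; the real CSV headers may differ. Pearson's r is computed by hand so the example needs only the standard library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def klout_vs_length(rows):
    """Correlate Klout score with tweet length in characters."""
    klout = [float(row["klout"]) for row in rows]
    lengths = [len(row["text"]) for row in rows]
    return pearson(klout, lengths)


rows = [{"klout": "10", "text": "a"}, {"klout": "20", "text": "ab"}, {"klout": "30", "text": "abc"}]
print(klout_vs_length(rows))
```

A correlation near zero would suggest tweet length has little linear relationship with influence; it says nothing about causation.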
The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach a new peak of 85.08 million users in 2028. Notably, the number of Twitter users has been increasing continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
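The forecast figures can be checked for internal consistency: subtracting the 4.3 million increase from the 2028 figure gives the implied 2024 base, and the increase relative to that base should match the stated +5.32 percent. This is only arithmetic on the numbers quoted above.

```python
forecast_2028 = 85.08  # million users in 2028 (from the text)
increase = 4.3         # million users added between 2024 and 2028 (from the text)

base_2024 = forecast_2028 - increase         # implied 2024 user base, in millions
growth_pct = increase / base_2024 * 100      # relative increase over the period

print(f"Implied 2024 base: {base_2024:.2f} million users")
print(f"Growth 2024-2028: {growth_pct:.2f} %")
```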
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
By Twitter [source]
This dataset provides a unique opportunity to unravel the intricacies of conversational exchanges on social media platforms by exploring the complex interplay between retweets, likes, mentions and replies. Greekgodx is an immensely popular Twitch streamer and YouTuber whose tweets offer insights into how people interact with each other on social media networks. Through this dataset we can gain an understanding of user engagement levels and of the influence of certain topics or interests on conversations, and explore new techniques for measuring sentiment in social media conversations. With these tools in hand, we will be better equipped to interpret popular conversations occurring online and to make decisions more confidently based on insights gleaned from our analysis.
How to use this dataset
This dataset is a useful resource for those wanting to explore and analyze the conversational dynamics that occur on social media platforms. It includes tweets from popular Twitch streamer and YouTuber, Greekgodx, whose content often inspires engagement from his followers as well as other online users. Here you will find various columns that provide an opportunity to investigate this data in a number of ways, such as investigating any retweets or likes he receives in response to his tweets or the mentions he gets from other users.
The data included here consists of six columns: id, tweet_text, timestamp, retweets_count, likes_count and mentions. Together these features help you examine different elements of the interaction between Greekgodx and other Twitter users: when particular tweets were published (timestamp), how many people engaged with them (retweets_count/likes_count), and what kind of people are talking about him (mentions). The id column provides an identifier for each tweet that can be used for further analysis if needed.
To work effectively with this dataset, one could first use basic visualization techniques such as histograms or bar plots to identify initial trends, for example how often Greekgodx is retweeted or liked within certain periods of time, or which Twitter users mention him most frequently. More advanced techniques such as network analysis can also be used to gain more detailed insights into relationships between different members of the platform. These could suggest which individuals are most influential in terms of replicating content posted by Greekgodx, or who is most active in engaging with him in public conversations on Twitter.
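Before plotting histograms, engagement can be bucketed by month. This sketch uses the column names listed above and assumes `timestamp` is an ISO 8601 string such as `2021-03-05T14:00:00` that `datetime.fromisoformat` can parse; other timestamp formats would need a different parser. The sample rows are invented.

```python
from collections import defaultdict
from datetime import datetime


def engagement_by_month(rows):
    """Total retweets, likes and tweet count per "YYYY-MM" month."""
    totals = defaultdict(lambda: {"retweets": 0, "likes": 0, "tweets": 0})
    for row in rows:
        month = datetime.fromisoformat(row["timestamp"]).strftime("%Y-%m")
        bucket = totals[month]
        bucket["retweets"] += int(row["retweets_count"])
        bucket["likes"] += int(row["likes_count"])
        bucket["tweets"] += 1
    return dict(totals)


rows = [
    {"timestamp": "2021-03-05T14:00:00", "retweets_count": "5", "likes_count": "20"},
    {"timestamp": "2021-03-20T09:30:00", "retweets_count": "1", "likes_count": "4"},
    {"timestamp": "2021-04-01T00:00:00", "retweets_count": "2", "likes_count": "7"},
]
for month, stats in sorted(engagement_by_month(rows).items()):
    print(month, stats)
```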
- Analyzing the Impact of Tweets on Popularity: This dataset can be used to analyze how Greekgodx’s tweets are affecting his popularity and viewership, by looking at engagement metrics such as retweets, likes and mentions over time.
- Exploring Network Dynamics: The dataset can be used to explore the network dynamics of conversations taking place on Twitter, by examining relationships between replies, retweets, likes and mentions over time.
- Investigating Sentiment Analysis of Tweets: This dataset provides an opportunity to study sentiment analysis on social media platforms by analyzing the sentiment of Greekgodx’s tweets with natural language processing (NLP) techniques and examining how it affects his engagement with followers through retweets, likes, mentions, etc.
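For the network dynamics idea, a weighted edge list is a common starting point for graph tools. This sketch assumes the `mentions` column holds a comma-separated string of handles such as `@user1, @user2` (the real format is not documented) and treats every edge as pointing from Greekgodx to the mentioned account.

```python
from collections import Counter


def mention_edges(rows, source="Greekgodx"):
    """Weighted edges (source, mentioned_handle) -> number of tweets containing that mention."""
    edges = Counter()
    for row in rows:
        for handle in row.get("mentions", "").split(","):
            handle = handle.strip().lstrip("@").lower()  # handles are case-insensitive
            if handle:
                edges[(source, handle)] += 1
    return edges


rows = [{"mentions": "@a, @b"}, {"mentions": "@A"}, {"mentions": ""}]
for (src, dst), weight in mention_edges(rows).most_common():
    print(src, "->", dst, weight)
```

The resulting edges can be loaded into any graph library to compute measures such as degree or centrality.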
If you use this dataset in your research, please credit the original authors. Data Source
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
If you use this dataset in your research, please also credit Twitter.
License: Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.
Everything in the description below was written by David Wallach; using this information, I performed my first-ever sentiment analysis.
"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my Raspberry Pi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, the tweet was written to the database. Currently, the file only contains tweets from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).
I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
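The matching step (does a tweet mention a company from `stocks_cleaned.csv`?) can be sketched as below. This is a Python illustration, not the author's NodeJS implementation; it assumes companies are given as `(ticker, name)` pairs and counts a match when the tweet contains the cashtag `$TICKER` or the company name as a whole word, case-insensitively.

```python
import re


def companies_mentioned(tweet, companies):
    """Tickers whose cashtag ($TICKER) or whole-word company name appears in the tweet."""
    found = set()
    for ticker, name in companies:
        cashtag = re.search(r"\$" + re.escape(ticker) + r"\b", tweet, re.IGNORECASE)
        by_name = re.search(r"\b" + re.escape(name) + r"\b", tweet, re.IGNORECASE)
        if cashtag or by_name:
            found.add(ticker)
    return found


# Hypothetical rows standing in for stocks_cleaned.csv.
companies = [("AAPL", "Apple"), ("TSLA", "Tesla"), ("F", "Ford")]
print(companies_mentioned("Apple beats estimates; $TSLA slides", companies))
```

Whole-word matching avoids false hits such as "Ford" inside "Fordham", but names that end in punctuation (for example "Inc.") would need a looser pattern.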
This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv file. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'
The data used here was gathered from a project I developed: https://github.com/dwallach1/StockerBot
I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency).
As of October 2025, social network X (formerly known as Twitter) was most popular in the United States, with an audience reach of approximately 99.04 million users. Japan ranked second, recording more than 71 million users on the platform.

Global Twitter usage
As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama.

X/Twitter and politics
X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. During an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the key Twitter user statistics that you need to know.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.
Terms: https://sqmagazine.co.uk/privacy-policy/
In early 2025, something fascinating happened at a small community center in suburban Ohio. A town hall meeting about local road closures suddenly went viral, not because of the topic, but because a 74-year-old attendee live-tweeted the entire event using her iPad. Within hours, her posts racked up thousands of...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.
Between July and December 2024, over 335 million accounts on X (formerly Twitter) were suspended for spam or platform manipulation. User-informed labels were added to 66 million posts after they were reported for spam.
Please cite the following paper when using this dataset: N. Thakur, "Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions," Preprints, 2022, DOI: 10.20944/preprints202206.0383.v1

Abstract
Exoskeleton technology has been advancing rapidly in the recent past due to its multitude of applications and use cases in assisted living, the military, healthcare, firefighting, and industry. With the projected increase in the diverse uses of exoskeletons in these application domains and beyond over the next few years, it is crucial to study, interpret, and analyze user perspectives, public opinion, reviews, and feedback related to exoskeletons, for which a dataset is necessary. Today's Internet of Everything era, in which people spend more time on the Internet than ever before, makes it possible to develop such a dataset by mining relevant web behavior data from social media communications, which have increased exponentially in the last few years. Twitter, one such social media platform, is highly popular among all age groups, who communicate via tweets on diverse topics including, but not limited to, news, current events, politics, emerging technologies, family, relationships, and career opportunities, sharing their views, opinions, perspectives, and feedback. Therefore, this work presents a dataset of about 140,000 Tweets related to exoskeletons that were mined over a 5-year period from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communication and conversation that convey user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons.

Instructions
This dataset contains about 140,000 Tweets related to exoskeletons that were mined over a 5-year period from May 21, 2017, to May 21, 2022.
The tweets contain diverse forms of communication and conversation that convey user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons. In keeping with Twitter's terms and conditions for redistributing Twitter data for research purposes, the dataset contains only tweet identifiers (Tweet IDs), which need to be hydrated before use. Hydration is the process of retrieving a tweet's complete information (such as the text of the tweet, username, user ID, date and time) from its ID. The Hydrator application (download: https://github.com/DocNow/hydrator/releases; step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) or any similar application may be used to hydrate this dataset.

Data Description
This dataset consists of 7 .txt files. The following shows the number of Tweet IDs and the date range of the associated tweets in each file:
- Exoskeleton_TweetIDs_Set1.txt: 22945 Tweet IDs, July 20, 2021 to May 21, 2022
- Exoskeleton_TweetIDs_Set2.txt: 19416 Tweet IDs, Dec 1, 2020 to July 19, 2021
- Exoskeleton_TweetIDs_Set3.txt: 16673 Tweet IDs, April 29, 2020 to Nov 30, 2020
- Exoskeleton_TweetIDs_Set4.txt: 16208 Tweet IDs, Oct 5, 2019 to Apr 28, 2020
- Exoskeleton_TweetIDs_Set5.txt: 17983 Tweet IDs, Feb 13, 2019 to Oct 4, 2019
- Exoskeleton_TweetIDs_Set6.txt: 34009 Tweet IDs, Nov 9, 2017 to Feb 12, 2019
- Exoskeleton_TweetIDs_Set7.txt: 11351 Tweet IDs, May 21, 2017 to Nov 8, 2017
The last date is May 21, 2022 because it was the most recent date at the time of data collection. The dataset will be updated soon to incorporate more recent tweets.
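Before hydration it can help to confirm that each downloaded file holds the expected number of IDs. This sketch assumes one Tweet ID per line (blank lines ignored), which the description does not state explicitly. The documented per-file counts sum to 138,585, consistent with "about 140,000". IDs are kept as strings so that tools which parse numbers as floating point do not lose precision.

```python
from pathlib import Path

# Per-file Tweet ID counts as listed in the data description.
DOCUMENTED_COUNTS = {
    "Exoskeleton_TweetIDs_Set1.txt": 22945,
    "Exoskeleton_TweetIDs_Set2.txt": 19416,
    "Exoskeleton_TweetIDs_Set3.txt": 16673,
    "Exoskeleton_TweetIDs_Set4.txt": 16208,
    "Exoskeleton_TweetIDs_Set5.txt": 17983,
    "Exoskeleton_TweetIDs_Set6.txt": 34009,
    "Exoskeleton_TweetIDs_Set7.txt": 11351,
}


def parse_tweet_ids(lines):
    """One Tweet ID per non-blank line (assumed layout), kept as strings."""
    return [line.strip() for line in lines if line.strip()]


def check_counts(folder):
    """Map each file name to (IDs found, IDs documented)."""
    report = {}
    for name, expected in DOCUMENTED_COUNTS.items():
        with (Path(folder) / name).open(encoding="utf-8") as f:
            report[name] = (len(parse_tweet_ids(f)), expected)
    return report


print(sum(DOCUMENTED_COUNTS.values()))
```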
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
- This dataset was gathered by crawling Twitter's REST API using the Python library tweepy 3. It contains the tweets of the 20 most popular Twitter users (those with the most followers), excluding retweets. These accounts belong to public figures such as Katy Perry and Barack Obama, to platforms such as YouTube and Instagram, and to television channels and shows such as CNN Breaking News and The Ellen Show.
- Consequently, the dataset contains a mix of relatively structured tweets, tweets written in a formal and informative manner, and completely unstructured tweets written in a colloquial style. Unfortunately, geocoordinates were not available for these tweets.
- This dataset has been used in a research paper titled "Machine Learning Techniques for Anomalies Detection in Post Arrays".
- Crawled attributes are: Author (Twitter User), Content (Tweet), Date_Time, id (Twitter User ID), language (Tweet Language), Number_of_Likes, Number_of_Shares.
- Overall: 52543 tweets of the top 20 users on Twitter (screen name: number of tweets, time span in days):
  - TheEllenShow: 3,147 tweets, 662 days
  - jimmyfallon: 3,123 tweets, 1231 days
  - ArianaGrande: 3,104 tweets, 613 days
  - YouTube: 3,077 tweets, 411 days
  - KimKardashian: 2,939 tweets, 603 days
  - katyperry: 2,924 tweets, 1,598 days
  - selenagomez: 2,913 tweets, 2,266 days
  - rihanna: 2,877 tweets, 1,557 days
  - BarackObama: 2,863 tweets, 849 days
  - britneyspears: 2,776 tweets, 1,548 days
  - instagram: 2,577 tweets, 456 days
  - shakira: 2,530 tweets, 1,850 days
  - Cristiano: 2,507 tweets, 2,407 days
  - jtimberlake: 2,478 tweets, 2,491 days
  - ladygaga: 2,329 tweets, 894 days
  - Twitter: 2,290 tweets, 2,593 days
  - ddlovato: 2,217 tweets, 741 days
  - taylorswift13: 2,029 tweets, 2,091 days
  - justinbieber: 2,000 tweets, 664 days
  - cnnbrk: 1,842 tweets, 183 days
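Because the time spans differ widely, raw tweet counts hide how active each account is. Dividing tweets by time span gives an average posting rate; the sketch below uses three rows copied from the list of top users and could be extended to all 20.

```python
# (screen name, number of tweets, time span in days), copied from the list of top users.
ACCOUNTS = [
    ("TheEllenShow", 3147, 662),
    ("cnnbrk", 1842, 183),
    ("Twitter", 2290, 2593),
]


def tweets_per_day(accounts):
    """Average tweets per day for each account, highest rate first."""
    rates = [(name, tweets / days) for name, tweets, days in accounts]
    return sorted(rates, key=lambda item: item[1], reverse=True)


for name, rate in tweets_per_day(ACCOUNTS):
    print(f"{name}: {rate:.2f} tweets/day")
```

By this measure cnnbrk posts far more often than its total count suggests, while the Twitter account's tweets are spread over the longest span.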
Rights statement: In Copyright (http://rightsstatements.org/vocab/InC/1.0/)
This dataset comprises a set of Twitter accounts in Singapore that are used for social bot profiling research conducted by the Living Analytics Research Centre (LARC) at Singapore Management University (SMU). Here a bot is defined as a Twitter account that generates content and/or interacts with other users automatically (at least according to human judgment). In this research, Twitter bots have been categorized into three major types:
- Broadcast bot. This bot aims to disseminate information to a general audience by providing, e.g., benign links to news, blogs or sites. Such a bot is often managed by an organization or a group of people (e.g., bloggers).
- Consumption bot. The main purpose of this bot is to aggregate content from various sources and/or provide update services (e.g., horoscope reading, weather updates) for personal consumption or use.
- Spam bot. This type of bot posts malicious content (e.g., to trick people by hijacking certain accounts or redirecting them to malicious sites), or aggressively promotes harmless but invalid/irrelevant content.
This categorization is general enough to cater for new, emerging types of bots (e.g., chatbots can be viewed as a special type of broadcast bot). The dataset was collected from 1 January to 30 April 2014 via the Twitter REST and streaming APIs. Starting from popular seed users (i.e., users having many followers), their follow, retweet, and user mention links were crawled. The data collection proceeded by adding those followers/followees, retweet sources, and mentioned users who state Singapore in their profile location. Using this procedure, a total of 159,724 accounts were collected. To identify bots, the first step was to select active accounts that tweeted at least 15 times in April 2014. These accounts were then manually checked and labelled, and 589 bots were found. Because many more human users are expected in the Twitter population, the remaining accounts were randomly sampled and manually checked, identifying 1,024 human accounts. In total, this results in 1,613 labelled accounts. Related Publication: R. J. Oentaryo, A. Murdopo, P. K. Prasetyo, and E.-P. Lim. (2016). On profiling bots in social media. Proceedings of the International Conference on Social Informatics (SocInfo’16), 92-109. Bellevue, WA. https://doi.org/10.1007/978-3-319-47880-7_6
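The two candidate-selection steps (active accounts with at least 15 tweets in April 2014, then a random sample of the remaining accounts) can be sketched as follows. The input layout, a dict mapping each account to a list of its tweet dates, and the fixed random seed are assumptions for illustration; the manual labelling itself is not reproduced.

```python
import random
from datetime import date


def active_accounts(tweet_dates, year=2014, month=4, min_tweets=15):
    """Accounts with at least min_tweets tweets in the given month."""
    return {
        account
        for account, dates in tweet_dates.items()
        if sum(1 for d in dates if d.year == year and d.month == month) >= min_tweets
    }


def sample_remaining(tweet_dates, active, k, seed=0):
    """Randomly sample up to k accounts that are not in the active set."""
    remaining = sorted(set(tweet_dates) - active)
    return random.Random(seed).sample(remaining, min(k, len(remaining)))


tweet_dates = {
    "a": [date(2014, 4, d) for d in range(1, 16)],                           # 15 April tweets
    "b": [date(2014, 4, d) for d in range(1, 15)] + [date(2014, 3, 1)] * 5,  # only 14 in April
    "c": [],
}
active = active_accounts(tweet_dates)
print(active, sample_remaining(tweet_dates, active, k=5))
```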
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The platform is male-dominated, with 68.1% of all Twitter users being male. Just 31.9% of Twitter users are female.
Although counts of tweets citing academic papers are used as an informal indicator of interest, little is known about who tweets academic papers and who uses Twitter to find scholarly information. Without knowing this, it is difficult to draw useful conclusions from a publication being frequently tweeted. This study surveyed 1,912 users that have tweeted journal articles to ask about their scholarly-related Twitter uses. Almost half of the respondents (45%) did not work in academia, despite the sample probably being biased towards academics. Twitter was used most by people with a social science or humanities background. People tend to leverage social ties on Twitter to find information rather than searching for relevant tweets. Twitter is used in academia to acquire and share real-time information and to develop connections with others. Motivations for using Twitter vary by discipline, occupation, and employment sector, but not much by gender. These factors also influence the sharing of different types of academic information. This study provides evidence that Twitter plays a significant role in the discovery of scholarly information and cross-disciplinary knowledge spreading. Most importantly, the large numbers of non-academic users support the claims of those using tweet counts as evidence for the non-academic impacts of scholarly research.