Social network X/Twitter is particularly popular in the United States, and as of February 2025, the microblogging service had an audience reach of 103.9 million users in the country. Japan and India were ranked second and third with more than 70 million and 25 million users respectively.
Global Twitter usage
As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama.
X/Twitter and politics
X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump was a prolific Twitter user before the platform permanently suspended his account in January 2021. In an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These Twitter user statistics will give you the complete story of where Twitter is at today and what the future looks like for the social media company.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Author: Víctor Yeste. Universitat Politècnica de Valencia.
This work is an exploratory, quantitative, non-experimental study with inductive inference and a longitudinal follow-up. It analyzes movie data and tweets published by users using the official Twitter hashtags of movie premieres the week before, the same week, and the week after each release date.
The scope of the study is the collection of movies released in February 2022 in the USA, and the object of the study includes them and the tweets that refer to each film in the 3 weeks closest to its premiere date. The collected tweets were classified by the week in which they were published, a time dimension called timepoint. The week before the release date is designated timepoint 1, the week of the release date is timepoint 2, and the week immediately afterward is timepoint 3. Another dimension considered is whether the movie is a domestic production: if one of the countries of origin is the United States, the movie is designated as domestic.
The chosen variables are organized in two data tables, one for the movies and one for the collected tweets.
Variables related to the movies:
id: Internal id of the movie
name: Title of the movie
hashtag: Official hashtag of the movie
countries: List of countries of the movie, separated by a semicolon
mpaa: Film ratings system by the Motion Picture Association of America. It is a completely voluntary rating system and ratings have no legal standing. The current ratings include G (general audiences), PG (parental guidance suggested), PG-13 (parents strongly cautioned), R (restricted, under 17 requires accompanying parent or adult guardian) and NC-17 (no one 17 and under admitted) (Film Ratings - Motion Picture Association, n.d.)
genres: List of genres of the movie, e.g., Action or Thriller, separated by a semicolon
release_date: Release date of the movie in the format YYYY-MM-DD
opening_grosses: Amount in US dollars that the movie earned on the opening date (the first week after the release date)
opening_theaters: Number of US theaters that showed the movie on the opening date (the first week after the release date)
rating_avg: Average rating of the movie
Variables related to the tweets:
id: Internal id of the tweet
status_id: Twitter id of the tweet
movie_id: Internal id of the movie
timepoint: Week number, relative to the movie premiere, in which the tweet was published. “1” is the week before the movie release, “2” is the week of the movie release, and “3” is the week after the movie release.
author_id: Twitter id of the author of the tweet
created_at: Date and time of the tweet, with format “YYYY-MM-DD HH:MM:SS”
quote_count: Number of the tweet’s quotes
reply_count: Number of the tweet’s replies
retweet_count: Number of the tweet’s retweets
like_count: Number of the tweet’s likes
sentiment: Sentiment analysis of the tweet’s content with a range from -1 (negative) to 1 (positive)
This dataset has contributed to the following book chapters:
Yeste, Víctor; Calduch-Losa, Ángeles (2022). Genre classification of movie releases in the USA: Exploring data with Twitter hashtags. In Narrativas emergentes para la comunicación digital (pp. 1012-1044). Dykinson, S. L.
Yeste, Víctor; Calduch-Losa, Ángeles (2022). Exploratory Twitter hashtag analysis of movie premieres in the USA. In Desafíos audiovisuales de la tecnología y los contenidos en la cultura digital (pp. 169-187). McGraw-Hill Interamericana de España S.L.
Yeste, Víctor; Calduch-Losa, Ángeles (2022). ANOVA to study movie premieres in the USA and online conversation on Twitter. The case of rating average using data from official Twitter hashtags. In El mapa y la brújula. Navegando por las metodologías de investigación en comunicación (pp. 151-168). Editorial Fragua.
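The timepoint and domestic dimensions described for this dataset could be derived roughly as in the sketch below. It is only an illustration, not the author's procedure: it assumes that each "week" is a seven-day window counted from the release date, and that the United States appears in the countries field as the literal string "United States". The function names timepoint and is_domestic are hypothetical.

```python
from datetime import date, datetime


def timepoint(created_at: str, release_date: str):
    """Return 1, 2 or 3 for the week before, of, or after the release.

    Assumption: weeks are seven-day windows counted from the release date.
    Tweets outside the three windows yield None.
    """
    tweet_day = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").date()
    release_day = date.fromisoformat(release_date)
    offset = (tweet_day - release_day).days
    if -7 <= offset < 0:
        return 1
    if 0 <= offset < 7:
        return 2
    if 7 <= offset < 14:
        return 3
    return None


def is_domestic(countries: str, home: str = "United States") -> bool:
    """True if the semicolon-separated countries field contains `home`.

    Assumption: the exact spelling used in the data is "United States".
    """
    return home in [country.strip() for country in countries.split(";")]
```

For example, timepoint("2022-02-10 12:00:00", "2022-02-11") falls in the week before the release, and is_domestic("United Kingdom; United States") marks a co-production as domestic.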
As of February 2025, 37.5 percent of X’s (formerly Twitter) global audience was aged between 25 and 34 years. The second-largest age group on the platform was users aged between 18 and 24 years, with a share of 32.1 percent. Users aged less than 18 years accounted for two percent of users, while those aged 50 or older accounted for roughly 7.3 percent.
X is a male-dominated platform
As of January 2024, more than 60 percent of X users were male. Although most mainstream social media platforms tend to have a slightly male-skewing audience, X stands out above Instagram, Snapchat, TikTok, and Facebook when it comes to user gender demographics. Overall, Pinterest is the only mainstream platform to have a higher share of female users.
X Blue for you
It is now not uncommon for social media users to have the chance to subscribe to their chosen online networks for a monthly fee. X Blue is a subscription service from X that gives users special benefits and features. A blue verification mark, edit post functionality, fewer ads, priority ranking in chats, and longer video upload times are some of the perks offered.
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
This dataset consists of a few thousand Twitter user reviews (input text) and emotions (output labels) for learning how to train a text model for emotion analysis. The dataset was created with the Twitter API by searching for keywords. The idea is that the dataset is more than a toy (real business data on a reasonable scale) but a model can still be trained on it in minutes on a modest laptop.
This file has four columns: Sl no, Tweets, Search key, Feeling.
Description of columns in the file:
Tweets - text of the review
Search key - keyword used
Feeling - emotion classified using the keyword; this column contains 6 emotions, i.e., Happy, Sad, Surprise, Fear, Disgust, Angry
This would help an organization understand customer reviews and feedback.
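As a quick sanity check before training, the Feeling column can be tallied and validated against the six emotions listed in the column description. The sketch below assumes the CSV header uses exactly the column names given in the description; the three sample rows are made up for illustration and are not taken from the dataset, and count_feelings is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# The six emotion labels named in the column description.
EMOTIONS = {"Happy", "Sad", "Surprise", "Fear", "Disgust", "Angry"}


def count_feelings(csv_file):
    """Count rows per emotion and reject labels outside the six listed."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        feeling = row["Feeling"].strip()
        if feeling not in EMOTIONS:
            raise ValueError(f"unexpected emotion label: {feeling!r}")
        counts[feeling] += 1
    return counts


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "Sl no,Tweets,Search key,Feeling\n"
    "1,what a great day,happy,Happy\n"
    "2,i miss you,sad,Sad\n"
    "3,best day ever,happy,Happy\n"
)
feeling_counts = count_feelings(sample)
```

With the real file, the same function could be called on an open file handle to see how balanced the six classes are.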
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These are the key Twitter user statistics that you need to know.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The purpose of this data collection was to test a scale for detecting verbal violence in Tweets. Workers at Mechanical Turk were first asked to complete a qualification test and then invited to code additional Tweets according to our scale. The qualification test involved a detailed explanation of each item of the scale, a walkthrough of a tweet that we had coded according to all 14 scale items, a practice exercise, and a test. In the practice exercise, potential coders attempted to code a tweet on their own using our scale. After submitting their ratings, they were shown our own ratings for the same tweet and explanations for each of our ratings. The test component consisted of another coding task, in which coders were asked to code another tweet that we had already coded ourselves. Workers whose test ratings agreed with our ratings of that tweet on at least 11 of the 14 items “passed” the test, earning the qualification that allowed them to participate in future coding tasks. Variables in the data include the ID of the Tweet (so that you may find it on Twitter; Twitter Terms of Service prohibit us from sharing the Tweets), the ID number we assigned to the coder, the rating that coder provided for each of the 14 items on our scale, the gender and age of the coder, and any comments the coder provided.
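The pass rule of the qualification test can be written down as a small check. This is a minimal sketch that assumes "agreement" means an exact match between the coder's rating and the reference rating on an item; the description does not say whether near-matches counted. The name passes_qualification is hypothetical.

```python
def passes_qualification(coder_ratings, reference_ratings, min_agreement=11):
    """Return True if the coder matches the reference on enough items.

    Assumption: an item counts as agreement only when both ratings are equal.
    """
    if len(coder_ratings) != len(reference_ratings):
        raise ValueError("both rating lists must cover the same items")
    agreements = sum(
        coder == reference
        for coder, reference in zip(coder_ratings, reference_ratings)
    )
    return agreements >= min_agreement
```

With 14 items, a coder who matches on 11 passes, while a coder who matches on 10 does not.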
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
We are sharing a dataset that contains a collection of tweets generated as reactions to the release of 50 different movies. The dataset can be used for gaining useful insights into the conversation generated around a particular movie. It is particularly suitable for sentiment analysis and other NLP techniques. The dataset contains approximately 2.5 million tweets with their related metadata and covers 50 movies. For each movie, its IMDb rating is included. The movies are the 25 releases with the highest number of votes in each of 2020 and 2021. The collected tweets represent the reactions of the Twitter community during the first week after the US release date of each movie. The number of tweets per movie ranges from 1,000 to approximately 200,000, with an average of 50,000 per release. We used The Internet Archive Wayback Machine to retrieve the IMDb movie rating one week after the US release date. The tweets and related metadata were collected using the Tweet Downloader tool. Contact at Tilburg University: Francesco Lelli
Context
The objective of this task is to build a model based on pre-processed tweets. For the sake of simplicity, we say a tweet contains a negative review if it has a racist or hate sentiment associated with it. The task is to build a model and then predict the labels on the test dataset.
Content
The dataset provides labelled training data, where label '0' denotes that the tweet is positive 😊 and label '1' denotes that the tweet is negative ☹️
Acknowledgements
The dataset is provided by Analytics Vidhya
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Advertising makes up 89% of Twitter's total revenue, and data licensing makes up about 11%.
Use cases and applications possible with the data:
Customer feedback analysis: Analyzing customer feedback helps businesses keep customers happy and loyal to the brand, and identify areas to improve.
Social media monitoring: With sentiment analysis, companies can monitor what's being said about them on social media and use that to figure out how people feel about their products and services and track any new trends.
Market research: Sentiment analysis can be used to analyze market trends and consumer preferences, which can help companies make informed business decisions and develop effective marketing strategies.
Financial analysis: You can use sentiment analysis to determine what people say about the stock market through news and social media, which can help you make investing decisions.
For e-commerce (Amazon, Best Buy, Home Depot and many more), the following data fields can be included:
Title
Price
Vendor Name
Ratings
Reviews
Brand
ASIN
URL
Sentiment analysis for each review
Other fields, as per request
This statistic shows a ranking of the estimated number of Twitter users in 2020 in Asia, differentiated by country. The user numbers have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in more than *** countries and regions worldwide. All input data are sourced from international institutions, national statistical offices, and trade associations. All data have been processed to generate comparable datasets (see supplementary notes under details for more information).
https://creativecommons.org/publicdomain/zero/1.0/
Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to distinguish negative tweets and block them. Can you build a strong classifier model to predict the same?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
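The parsing advice above can be sketched with Python's csv module, which already removes the quotes that enclose a field; the extra strip call is only a safeguard for stray quote characters. The column names text, selected_text and sentiment follow the description, while textID and the sample row are assumptions added for illustration. The is_span flag records whether selected_text really is a character span of the tweet.

```python
import csv
import io


def load_rows(csv_file):
    """Parse rows, drop enclosing quotes, and flag valid selected_text spans."""
    rows = []
    for row in csv.DictReader(csv_file):
        # csv.DictReader strips the enclosing quotes; .strip('"') removes
        # any leftover quote characters at either end.
        text = row["text"].strip().strip('"')
        selected = row["selected_text"].strip().strip('"')
        rows.append(
            {
                "text": text,
                "selected_text": selected,
                "sentiment": row["sentiment"],
                "is_span": selected in text,
            }
        )
    return rows


# Made-up row; the textID column name is an assumption.
sample = io.StringIO(
    "textID,text,selected_text,sentiment\n"
    'a1,"I love this, really!","love this, really",positive\n'
)
rows = load_rows(sample)
```

Note that the selected span keeps its comma and spaces, as the task description requires.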
The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
This dataset was created by Francesco Di Serio
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The platform is male-dominated with 68.1% of all Twitter users being male. Just 31.9% of Twitter users are female.
The pornographic actress Vicky Vette had the most followed Twitter account in Norway as of January 2021, with over 766 thousand followers. Second in the ranking was the politician and NATO Secretary General Jens Stoltenberg, with 541.7 thousand followers. Third was the Norwegian musician Kygo, who had nearly 474 thousand followers on Twitter. A survey conducted during the third quarter of 2019 found that the largest share of respondents (48 percent) reported entertainment as the leading purpose of their Twitter usage.
How often do Norwegians use Twitter?
Another survey, conducted from 2019 to 2020, investigated how often Norwegians used Twitter. Overall, the frequency rates were steady. As of the third quarter of 2020, 36 percent of the respondents used the platform monthly or less often, while eight percent reported never having used it. The share of Norwegians who never used Twitter was lowest in the first quarter of 2019, when it was only five percent.
Twitter users’ ages
Twitter was most popular among Norwegians aged between 18 and 29 in 2020: 40 percent of individuals in that age group used the platform. Respondents who were 60 years old or older used Twitter the least, with only 15 percent of them using it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset features the training models, emotion classifications and emotion patterns before and after events, related to the paper:
F. Kunneman, M. van Mulken and A. Van den Bosch, Anticipointment detection in event tweets (under review)
Abstract of the study:
We developed a system to detect positive expectation, disappointment, and satisfaction in tweets that refer to events automatically discovered in the Twitter stream. The emotional content shared on Twitter when referring to public events can provide insights into the presumed and experienced quality of the event. We expected to find a connection between positive expectation and disappointment, a succession that is sometimes referred to as anticipointment. The application of computational approaches makes it possible to detect the presence and strength of this hypothetical relation for a large number of events. We extracted events from a longitudinal data set of Dutch Twitter posts, and modeled classifiers to recognize emotion in the tweets related to those events by means of hashtag-labeled training data. After classifying all tweets before and after the events in our data set, we summarized the collective emotions by calculating the percentage of tweets classified with an emotion as well as ranking tweets based on the classifier confidence score for an emotion and selecting the 90th percentile. Only a weak correlation of around 0.2 was found between positive expectation and disappointment, while a higher correlation of 0.6 was found between positive expectation and satisfaction. The most anticipointing events were events with a clear loss, such as a canceled event or when the favored sports team had lost. We conclude that senders of Twitter posts might be more inclined to share satisfaction than disappointment after a much anticipated event.
Subject period: January 1st 2011 until October 31st 2015
Date: start=2015-11-01; end=2016-02-28 (data collection)
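The two summaries mentioned in the abstract (the percentage of tweets classified with an emotion, and the 90th percentile of the classifier confidence scores) could be computed per event along the following lines. This is a hedged sketch, not the paper's code: the nearest-rank percentile definition, the (label, confidence) input format and the function name summarize_emotion are all assumptions.

```python
def summarize_emotion(tweets, emotion):
    """Summarize one emotion over the tweets of a single event.

    tweets: list of (predicted_label, confidence_for_emotion) pairs.
    Returns (percentage classified as `emotion`, 90th-percentile confidence).
    Assumption: the percentile uses the nearest-rank definition.
    """
    n = len(tweets)
    if n == 0:
        raise ValueError("no tweets to summarize")
    share = 100.0 * sum(1 for label, _ in tweets if label == emotion) / n
    scores = sorted(score for _, score in tweets)
    rank = (90 * n + 99) // 100  # integer ceiling of 0.9 * n
    return share, scores[rank - 1]


# Made-up predictions: five tweets per label, confidences 0.1 .. 1.0.
example = [("disappointment", s / 10) for s in range(1, 6)]
example += [("satisfaction", s / 10) for s in range(6, 11)]
share, p90 = summarize_emotion(example, "disappointment")
```

Computing these two numbers per event, before and after it takes place, gives the kind of per-event values that the reported correlations could be calculated over.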
As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from that website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
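Because the rows are ordered by class, a reproducible shuffle is worth doing before any train/test split. The sketch below uses only the standard library; the seed value and the tiny made-up rows are illustrative assumptions, and shuffle_rows is a hypothetical helper name.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows, leaving the input order untouched."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps runs repeatable
    return shuffled


# Made-up stand-in for the file: negatives (0) first, then positives (1).
rows = [("bad film", 0)] * 3 + [("good film", 1)] * 3
shuffled = shuffle_rows(rows)
```

Splitting the unshuffled file by position would otherwise put only one class in each part.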
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
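The step from a polarity score to the categorical division label might look like the sketch below. TextBlob polarity scores lie in the range -1 to 1, but the thresholds and the label names used here ("positive", "neutral", "negative") are assumptions, since the dataset description does not state them; the function name division simply mirrors the column name.

```python
def division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    Assumption: scores above neutral_band are positive, scores below
    -neutral_band are negative, and everything in between is neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

Widening neutral_band (for example to 0.1) would treat weakly polar reviews as neutral, which changes the class balance and is worth checking against the division column in the file.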
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine learning model for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)