Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Muhammad Asif
Released under Apache 2.0

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Amin Aslami
Released under Apache 2.0

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact.
Figure: Example Analysis - Inbound Volume for the Top 20 Brands (https://i.imgur.com/nTv3Iuu.png)
Natural language remains the densest encoding of human experience we have, and innovation in NLP has accelerated to power understanding of that data, but the datasets driving this innovation don't match the real language in use today. The Customer Support on Twitter dataset offers a large corpus of modern, mostly English conversations between consumers and customer support agents on Twitter, and has three important advantages over other conversational text datasets:
The size and breadth of this dataset inspire many interesting questions:
Dataset built with PointScrape.
The dataset is a CSV, where each row is a tweet. The columns are described below. Every conversation included has at least one request from a consumer and at least one response from a company. Company user IDs can be identified using the inbound field.
- tweet_id: A unique, anonymized ID for the Tweet. Referenced by response_tweet_id and in_response_to_tweet_id.
- author_id: A unique, anonymized user ID. @s in the dataset have been replaced with their associated anonymized user ID.
- inbound: Whether the tweet is "inbound" to a company doing customer support on Twitter. This feature is useful when re-organizing data for training conversational models.
- created_at: Date and time when the tweet was sent.
- text: Tweet content. Sensitive information like phone numbers and email addresses is replaced with mask values like _email_.
- response_tweet_id: IDs of tweets that are responses to this tweet, comma-separated.
- in_response_to_tweet_id: ID of the tweet this tweet is in response to, if any.
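The column layout above can be used to pair each customer request with the company reply that answers it. The following is a minimal sketch using only the Python standard library; the sample rows, the `created_at` format, the `True`/`False` spelling of the inbound flag, and the helper names are all assumptions made for illustration, not values taken from the dataset.

```python
import csv
import io

# Tiny made-up sample in the column layout described above (not real data).
SAMPLE_CSV = """tweet_id,author_id,inbound,created_at,text,response_tweet_id,in_response_to_tweet_id
1,115712,True,Tue Oct 31 22:10:47 +0000 2017,@company_a my phone is not working,2,
2,company_a,False,Tue Oct 31 22:11:45 +0000 2017,@115712 Please send us a DM.,,1
3,115713,True,Tue Oct 31 22:12:00 +0000 2017,@company_a thanks anyway,,
"""


def load_tweets(handle):
    """Read rows into a dict keyed by tweet_id, parsing the inbound flag."""
    tweets = {}
    for row in csv.DictReader(handle):
        # Assumption: the flag is stored as the text "True" or "False".
        row["inbound"] = row["inbound"].strip().lower() == "true"
        tweets[row["tweet_id"]] = row
    return tweets


def request_response_pairs(tweets):
    """Return (customer_text, company_text) for each company reply to an inbound tweet."""
    pairs = []
    for tweet in tweets.values():
        parent = tweets.get(tweet["in_response_to_tweet_id"])
        if parent and parent["inbound"] and not tweet["inbound"]:
            pairs.append((parent["text"], tweet["text"]))
    return pairs


def company_ids(tweets):
    """Authors of non-inbound tweets are treated as company accounts."""
    return {t["author_id"] for t in tweets.values() if not t["inbound"]}


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
pairs = request_response_pairs(tweets)
companies = company_ids(tweets)
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle. Multi-turn conversations would need the chain of `in_response_to_tweet_id` links to be followed further than one step.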
Know of other brands the dataset should include? Found something that needs to be fixed? Start a discussion, or email me directly at $FIRSTNAME@$LASTNAME.com!
A huge thank you to my friends who helped bootstrap the list of companies that do customer support on Twitter! There are many rocks that would have been left un-turned were it not for your suggestions!
For commercial applications and use of full dataset, please contact stuart@thoughtvector.io.

License: https://creativecommons.org/publicdomain/zero/1.0/
Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we can help by creating a strong NLP-based classifier to identify negative tweets so that they can be blocked. Can you build a strong classifier model to predict the same?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
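The two instructions above (strip the surrounding quotes, then treat the answer as a character span inside the tweet) can be sketched as follows. This is an illustration only: the helper names and the example tweet are made up, and a real CSV parser may already remove the outer quotes for you.

```python
def strip_outer_quotes(value):
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def span_of(text, selected_text):
    """Return (start, end) character offsets of selected_text in text, or None."""
    start = text.find(selected_text)
    if start == -1:
        return None
    return start, start + len(selected_text)


# Hypothetical example row, not taken from the dataset.
raw_text = '"I love this phone, really!"'
text = strip_outer_quotes(raw_text)
selected = "love this phone,"
span = span_of(text, selected)
```

The span includes the trailing comma, matching the requirement that all characters within the selected phrase are kept.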
The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains Twitter support conversations collected from various company accounts. It includes customer inquiries and corresponding support responses. The data is useful for training AI chatbots, analyzing customer service trends, and developing sentiment analysis models.
This dataset contains customer support interactions on Twitter. It includes the following columns:
- tweet_id: A unique identifier for each tweet.
- author_id: The unique ID of the user who posted the tweet.
- inbound: A boolean value indicating whether the tweet is from a customer (True) or from the support team (False).
- created_at: The timestamp of when the tweet was posted (in UTC format).
- text: The content of the tweet.
- response_tweet_id: The unique ID of the response tweet, if applicable.
- in_response_to_tweet_id: The ID of the original tweet to which this tweet is responding.
How this data can be used:
- Training a chatbot: Helps in generating automated support responses.
- Sentiment analysis: Can analyze whether tweets are complaints, queries, or feedback.
- Conversation tracking: By linking response tweets with original messages.
Original author: MANORAMA
Source: https://www.kaggle.com/datasets/manovirat/aspect/data
Note: This dataset is shared for educational and research purposes only.

Context
This dataset is part of our research work titled "Opinion Mining of Customer Reviews Using Supervised Learning Algorithms". If you use this dataset, please cite our work. You can find the article at https://ieeexplore.ieee.org/document/9733435
Content
Nowadays, many people express their opinions on various topics using social networking sites. Twitter has become a popular social networking site where people express their opinions concisely, which makes it a great source for opinion mining. In this research, the goal was to train and build a model that can automatically and accurately categorize the opinions in customer tweet reviews about popular cell phone brands. We used the Python TextBlob library to obtain the polarity values of all the tweet reviews in the dataset. We also used Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Decision Tree and Random Forest algorithms, each combined separately with Bag of Words and TF-IDF vectorizers, to train and build the model. We investigated the opinions using five classes: Strongly Positive, Positive, Neutral, Negative and Strongly Negative.
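As an illustration of how a polarity score could be turned into the five classes named above, here is a minimal sketch. TextBlob reports polarity as a float between -1.0 and 1.0; the ±0.5 cut-offs and the function name below are assumptions chosen for this example and are not the thresholds used in the paper.

```python
def polarity_to_class(polarity):
    """Map a polarity score in [-1.0, 1.0] to one of five opinion classes.

    The 0.5 and -0.5 cut-offs are illustrative assumptions only.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be between -1.0 and 1.0")
    if polarity > 0.5:
        return "Strongly Positive"
    if polarity > 0.0:
        return "Positive"
    if polarity == 0.0:
        return "Neutral"
    if polarity >= -0.5:
        return "Negative"
    return "Strongly Negative"


# Hypothetical polarity values, e.g. as produced by a polarity scorer.
labels = [polarity_to_class(p) for p in (0.8, 0.3, 0.0, -0.2, -0.9)]
```

The resulting labels would then serve as training targets for the supervised classifiers listed in the description.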
When referencing this dataset, please cite the paper below.
BibTeX:
@inproceedings{arif2021opinion,
  title={Opinion Mining of Customer Reviews Using Supervised Learning Algorithms},
  author={Arif, Shibbir Ahmed and Hossain, Taslima Binte},
  booktitle={2021 5th International Conference on Electrical Information and Communication Technology (EICT)},
  pages={1--6},
  year={2021},
  organization={IEEE}
}

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This data consists of customer inquiries collected from several customer care accounts.
- "fullText": The full-text content of the tweet.
- "lang": The language in which the tweet is written.
- "viewsCount": The number of views or impressions the tweet has received.
- "bookmarkCount": The number of times the tweet has been bookmarked by users.
- "favoriteCount": The number of times the tweet has been favorited by users.
- "replyCount": The number of replies the tweet has received.
- "retweetCount": The number of times the tweet has been retweeted by users.
- "quoteCount": The number of times the tweet has been quoted by users.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Galal Qassas
Released under MIT

License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains tweets about Apple products and can be used for sentiment analysis. It has three columns:
- tweet_text
- emotion_in_tweet_is_directed_at
- is_there_an_emotion_directed_at_a_brand_or_product

There are a total of 20 CSV files containing tweets related to COVID-19 from 20 March 2020 to 08 April 2020.
For each file, the following columns are included. Columns: coordinates, created_at, hashtags, media, urls, favorite count, id, in_reply_to_screen_name, in_reply_to_status_id, in_reply_to_user_id.
See the relevant sections of Twitter's Terms of Service [1] and Developer Agreement [2]. According to Twitter's Developer Policy §6 [3]: "If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs" and "any Content provided to third parties via non-automated file download remains subject to this Policy".
[1] https://twitter.com/tos?lang=en
[2] https://dev.twitter.com/overview/terms/agreement
[3] https://dev.twitter.com/overview/terms/policy#6.Update_Be_a_Good_Partner_to_Twitter

If you use this dataset, please cite it accordingly; see the reference below.
Ogunleye, B. O. (2021). Statistical learning approaches to sentiment analysis in the Nigerian banking context (Doctoral dissertation, Sheffield Hallam University).

Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Public dataset that everyone can use. Creating a dataframe from the tweets list above.
E-mail supervision
In order to keep a regular discussion going, it is useful to use e-mail. There are many distance discussions that take place by e-mail or fax, backed up with some visits for full-blown supervisions. If the supervisor is in another country, then e-mail contact is essential, as the face-to-face supervisory contacts will be condensed into the periods when you can both be in the same country. Make e-mail contacts lucid, short and precise, with some friendly tone to establish a personal touch. Try not to get involved in excessively chatty discussions but concentrate on asking questions, seeking information and reporting on findings for comment. E-mail is quite an insistent medium. If you make contact too frequently, the supervisor will feel harassed. If you make contact too infrequently, the supervisor will feel guilty (and so will you), wondering what you are up to. Regular brief contact with some very full discussions on work in progress at regular intervals will maintain a sense of a working relationship over time and space.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset description
Users assessed tweets related to various brands and products, providing evaluations on whether the sentiment conveyed was positive, negative, or neutral. Additionally, if the tweet conveyed any sentiment, contributors identified the specific brand or product targeted by that emotion.
Train dataset: 8589 rows x 3 columns
Test dataset: 504 rows x 1 column

License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains tweets related to major US airlines and is widely used for NLP and sentiment analysis tasks. Each record includes the tweet text, timestamp, airline name, and sentiment label (positive, negative, neutral). This uploaded version is prepared to support advanced text processing, machine learning, and anomaly detection experiments.
This dataset is used in a machine learning workflow focused on:
- sentiment analysis
- embedding generation (transformers)
- dimensionality reduction (PCA, UMAP)
- clustering and visualization
- unsupervised anomaly detection using Isolation Forest
It is especially suited for exploring changes in public sentiment, event detection, and contextual analysis in social media data.
Originally derived from the Twitter US Airline Sentiment dataset on Kaggle.
This uploaded version is intended for educational, analytical, and research purposes.
If you're using this dataset in a notebook, ensure you update your file path accordingly:
```python
import pandas as pd

df = pd.read_csv("/kaggle/input/twitter-airline-sentiment-dataset/Tweets.csv")
```

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains drug-related text entries structured to resemble tweets. It was generated from the drugsComTest_raw.csv dataset, which originally included patient reviews of medications.
Source: Extracted from patient-submitted reviews on Drugs.com.
Format: CSV file with two columns:
- drugName: the name of the drug mentioned.
- tweet: the review text reformatted to simulate a tweet-like message.
Purpose:
- To support Natural Language Processing (NLP) tasks such as sentiment analysis, drug-effect classification, and social media mining.
- To act as a proxy dataset for training or testing models on drug-related discussions, where actual Twitter data collection is restricted or unavailable.
Limitations:
- Not real Twitter data, but synthetic tweets generated from formal drug reviews.
- May differ in tone and structure compared to actual tweets.

License: http://www.gnu.org/licenses/fdl-1.3.html
The "Famous Keyword Twitter Replies Dataset" is a comprehensive collection of Twitter data that focuses on popular keywords and their associated replies. This dataset contains five essential columns that provide valuable insights into the Twitter conversation dynamics:
Keyword: This column represents the specific keyword or topic of interest that generated the original tweet. It helps identify the context or subject matter around which the conversation revolves.
Main_tweet: The main_tweet column contains the original tweet related to the keyword. It serves as the starting point or focal point of the conversation and often provides essential information or opinions on the given topic.
Main_likes: This column provides the number of likes received by the main_tweet. Likes serve as a measure of engagement and indicate the level of popularity or resonance of the original tweet within the Twitter community.
Reply: The reply column consists of the replies or responses to the main_tweet. These replies may include comments, opinions, additional information, or discussions related to the keyword or the original tweet itself. The replies help capture the diverse perspectives and conversations that emerge in response to the main_tweet.
Reply_likes: This column records the number of likes received by each reply. Similar to the main_likes column, the reply_likes column measures the level of engagement and popularity of individual replies. It enables the identification of particularly noteworthy or well-received replies within the dataset.
By analyzing this "Famous Keyword Twitter Replies Dataset," researchers, analysts, and data scientists can gain valuable insights into how popular keywords spark discussions on Twitter and how these discussions evolve through replies.
The dataset's information on likes allows for the evaluation of tweet and reply popularity, helping to identify influential or impactful content.
This dataset serves as a valuable resource for various applications, including sentiment analysis, trend identification, opinion mining, and understanding social media dynamics.
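One way to use the like counts described above is to pick the most-liked reply for each main tweet. The sketch below assumes the column names are spelled exactly as in the headings above (`Keyword`, `Main_tweet`, `Main_likes`, `Reply`, `Reply_likes`) and that like counts have already been converted to integers; the rows themselves are invented for illustration.

```python
# Made-up rows in the five-column layout described above (not real data).
rows = [
    {"Keyword": "coffee", "Main_tweet": "Best coffee in town?", "Main_likes": 120,
     "Reply": "Try the corner cafe.", "Reply_likes": 15},
    {"Keyword": "coffee", "Main_tweet": "Best coffee in town?", "Main_likes": 120,
     "Reply": "Make it at home.", "Reply_likes": 42},
    {"Keyword": "football", "Main_tweet": "What a match!", "Main_likes": 300,
     "Reply": "That last goal was unreal.", "Reply_likes": 7},
]


def top_reply_per_tweet(rows):
    """Return the most-liked reply row for each (keyword, main tweet) pair."""
    best = {}
    for row in rows:
        key = (row["Keyword"], row["Main_tweet"])
        if key not in best or row["Reply_likes"] > best[key]["Reply_likes"]:
            best[key] = row
    return best


top = top_reply_per_tweet(rows)
top_coffee_reply = top[("coffee", "Best coffee in town?")]["Reply"]
```

Ties keep the first reply encountered; a different tie-breaking rule may suit other analyses better.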
Figure: Number of tweets for each tweet/reply pair.
The dataset has 17255 tweet/reply pairs in total.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 120 tweets from 8 popular celebrities. Each tweet is labeled as either real (actually posted by the celebrity) or fake (AI-generated to mimic their tone). It was originally built for a hackathon project called TweetLike, but was later cleaned and restructured to support machine learning projects.
The goal is to help explore how writing style, tone, and voice can be modeled — and how convincingly AI can imitate real people online. The dataset is ideal for NLP experiments like author prediction, fake vs real classification, and stylistic analysis.
To keep the dataset balanced and ML-friendly, we ensured that each celebrity has exactly 15 tweets. In some cases, controlled oversampling was used to meet this count — this is intentional to support fair training of models across classes.

The "Sentiment with 16 million tweets with locations" dataset is a collection of tweets with their respective geographical location information and sentiment labels. The dataset includes 16 million tweets from various locations around the world, spanning a period of several years. The sentiment labels for each tweet are binary, indicating whether the sentiment expressed in the tweet is positive or negative.
This dataset can be used for sentiment analysis and natural language processing tasks, such as training machine learning models to classify the sentiment of text data. Researchers and developers can use this dataset to analyze trends in sentiment across different locations and time periods, as well as to develop new algorithms and models for sentiment analysis.
Please note that this dataset is intended for research purposes only and should not be used for any commercial or legal applications. The dataset may also contain offensive or inappropriate language, and users should exercise caution when working with this data.
Context
Some context that is relevant to the "Sentiment with 16 million tweets with locations" dataset:
Sentiment analysis can be challenging due to the complexity and ambiguity of language, as well as the variability of individual expression and context.
Large datasets like this one are important for developing accurate and robust sentiment analysis models, as they provide a diverse and representative sample of real-world text data.
Content
It contains the following 7 fields:
Sentiment Target: The polarity of the tweet, indicated by a numeric value of 0 (negative), 2 (neutral), or 4 (positive).
Tweet ID: The unique identifier of the tweet.
Date: The date and time the tweet was posted in Coordinated Universal Time (UTC) format.
Query Flag: The keyword or phrase used to filter the tweets. If no query was used, the value is NO_QUERY.
User: The username of the Twitter account that posted the tweet.
Text: The actual text content of the tweet.
Location: The location of the tweet.
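The seven fields above can be read and the numeric sentiment target decoded with a few lines of standard-library Python. This is a sketch under stated assumptions: the file is taken to have no header row, the columns are taken to appear in the order listed above, and the sample rows and date format are invented for illustration.

```python
import csv
import io

# Assumed column order, following the field list above.
FIELDS = ["target", "tweet_id", "date", "query", "user", "text", "location"]
# Polarity codes as described for the Sentiment Target field.
LABELS = {"0": "negative", "2": "neutral", "4": "positive"}

# Made-up sample rows (not real data).
SAMPLE = '''"0","1001","Mon Apr 06 22:19:45 UTC 2009","NO_QUERY","user_a","my flight got delayed, again","Chicago"
"4","1002","Mon Apr 06 22:20:10 UTC 2009","NO_QUERY","user_b","great coffee this morning","Lisbon"
'''


def read_labeled(handle):
    """Parse rows into dicts and add a human-readable sentiment label."""
    records = []
    for values in csv.reader(handle):
        record = dict(zip(FIELDS, values))
        record["sentiment"] = LABELS.get(record["target"], "unknown")
        records.append(record)
    return records


records = read_labeled(io.StringIO(SAMPLE))
```

Using `csv.reader` rather than splitting on commas keeps tweets that contain commas intact, as in the first sample row.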

License: http://opendatacommons.org/licenses/dbcl/1.0/
This is a dataset of Twitter stock prices covering 9 years, from November 2013 to October 2022. The data is in tabular CSV format and can be loaded quickly.
The dataset can be used for:
There are 7 columns in this dataset.
Note: The currency is USD ($).
Image credits: IndiaTimes

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Muhammad Asif
Released under Apache 2.0