Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact.
[Figure: Example Analysis - Inbound Volume for the Top 20 Brands (https://i.imgur.com/nTv3Iuu.png)]
Natural language remains the densest encoding of human experience we have, and innovation in NLP has accelerated to power understanding of that data, but the datasets driving this innovation don't match the real language in use today. The Customer Support on Twitter dataset offers a large corpus of modern English (mostly) conversations between consumers and customer support agents on Twitter, and has three important advantages over other conversational text datasets:
The size and breadth of this dataset inspires many interesting questions:
Dataset built with PointScrape.
The dataset is a CSV, where each row is a tweet. The different columns are described below. Every conversation included has at least one request from a consumer and at least one response from a company. Which user IDs are company user IDs can be calculated using the inbound field.
- tweet_id: A unique, anonymized ID for the tweet. Referenced by response_tweet_id and in_response_to_tweet_id.
- author_id: A unique, anonymized user ID. @s in the dataset have been replaced with their associated anonymized user ID.
- inbound: Whether the tweet is "inbound" to a company doing customer support on Twitter. This feature is useful when re-organizing data for training conversational models.
- created_at: Date and time when the tweet was sent.
- text: Tweet content. Sensitive information like phone numbers and email addresses is replaced with mask values like _email_.
- response_tweet_id: IDs of tweets that are responses to this tweet, comma-separated.
- in_response_to_tweet_id: ID of the tweet this tweet is in response to, if any.
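The column layout above can be used to rebuild request/reply pairs. The following is a minimal sketch using only the Python standard library; the sample rows, the `ExampleSupport` author ID, the date format, and the helper names (`load_tweets`, `request_reply_pairs`, `company_ids`, `response_ids`) are illustrative assumptions, not part of the dataset or its tooling.

```python
import csv
import io

# Hypothetical three-row sample in the column layout described above.
SAMPLE = """tweet_id,author_id,inbound,created_at,text,response_tweet_id,in_response_to_tweet_id
1,115712,True,Tue Oct 31 22:10:47 +0000 2017,My order never arrived.,2,
2,ExampleSupport,False,Tue Oct 31 22:11:45 +0000 2017,Sorry to hear that! Please DM us.,3,1
3,115712,True,Tue Oct 31 22:12:30 +0000 2017,Sent. Thanks!,,2
"""

def load_tweets(handle):
    """Read rows into a dict keyed by tweet_id, parsing the inbound flag to a bool."""
    tweets = {}
    for row in csv.DictReader(handle):
        row["inbound"] = row["inbound"].strip().lower() == "true"
        tweets[row["tweet_id"]] = row
    return tweets

def response_ids(tweet):
    """Split the comma-separated response_tweet_id field into a list of IDs."""
    return [tid for tid in tweet["response_tweet_id"].split(",") if tid]

def request_reply_pairs(tweets):
    """Return (customer_text, company_text) for each company reply to an inbound tweet."""
    pairs = []
    for tweet in tweets.values():
        parent = tweets.get(tweet["in_response_to_tweet_id"])
        if parent is not None and parent["inbound"] and not tweet["inbound"]:
            pairs.append((parent["text"], tweet["text"]))
    return pairs

def company_ids(tweets):
    """Author IDs that posted at least one non-inbound tweet are treated as companies."""
    return {t["author_id"] for t in tweets.values() if not t["inbound"]}

tweets = load_tweets(io.StringIO(SAMPLE))
pairs = request_reply_pairs(tweets)
companies = company_ids(tweets)
```

For the full CSV, the same functions could be applied to an open file handle instead of the in-memory sample.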
Know of other brands the dataset should include? Found something that needs to be fixed? Start a discussion, or email me directly at $FIRSTNAME@$LASTNAME.com!
A huge thank you to my friends who helped bootstrap the list of companies that do customer support on Twitter! There are many rocks that would have been left un-turned were it not for your suggestions!
For commercial applications and use of full dataset, please contact stuart@thoughtvector.io.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Amin Aslami
Released under Apache 2.0

The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact. The dataset includes replies of companies like Apple, Amazon, Uber, Delta, Spotify and others.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Muhammad Asif
Released under Apache 2.0

CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to post hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier that identifies negative tweets so they can be blocked. Can you build such a classifier?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
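As a minimal sketch of the quote handling described above: Python's `csv` module already removes standard CSV quoting when it parses a field, so an explicit strip is only needed if rows were split by hand or a field carries an extra layer of quotes. The sample row and the helper name `strip_outer_quotes` are assumptions for illustration.

```python
import csv
import io

# Hypothetical one-row sample using the columns named in the text.
SAMPLE = 'text,selected_text,sentiment\n"I love this, really!","love this",positive\n'

def strip_outer_quotes(value):
    """Remove one pair of surrounding double quotes, if present; leave other text untouched."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value

# csv.DictReader handles the standard quoting; the helper is a safeguard for leftovers.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
texts = [strip_outer_quotes(row["text"]) for row in rows]
```

The helper deliberately does not trim whitespace, since spaces inside the span count as part of the text.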
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

By Krystal Jensen
The dataset Twitter Data: Tweets and User Interactions provides comprehensive information about tweets and user interactions on the popular social media platform Twitter. The dataset includes various attributes that shed light on the characteristics and engagement metrics of tweets, allowing for in-depth analysis of user behavior and content performance.
One of the key variables in this dataset is the Klout score, which represents the influence and reputation of the Twitter users who posted the tweets. This numeric metric helps assess the impact a user has on their audience and provides insights into their social media presence.
Another essential attribute is the text content of each tweet. By examining this textual data, analysts can uncover valuable information about trending topics, opinions, sentiments, conversations, or news shared by users. It serves as a primary source for understanding what people share publicly on Twitter.
The dataset Twitter+data+in+sheets.csv serves as a reliable resource for conducting research or performing analytics that require detailed information about Twitter activity. It covers aspects such as tweet characteristics (including length and language), engagement metrics (such as retweets and favorites), sentiment analysis (revealing positive or negative emotions expressed), as well as individual user details.
By utilizing this extensive dataset, researchers can gain valuable insights into patterns of online communication within Twitter's vast network. They can identify influential individuals with high Klout scores who have substantial reach among their followers or communities. Additionally, they can analyze various aspects related to tweet content such as sentiment analysis to understand public opinion trends or measure engagement levels through counts like retweets and favorites.
Overall, this dataset is a resource for anyone interested in analyzing tweet characteristics, exploring how users interact with tweets across dimensions such as popularity or sentiment, or examining correlations between Klout scores and other factors that influence engagement, such as time posted.
Welcome to the Twitter Data: Tweets and User Interactions dataset! This dataset provides valuable insights into tweet characteristics and user engagement on Twitter. Here is a useful guide on how to make the most out of this dataset:
Understanding the Columns: There are two main columns in this dataset:
- Klout Score (Numeric): The Klout score indicates the influence of the user who posted the tweet. A higher Klout score suggests greater influence and reach.
- Text Content of Tweet (Text): This column contains the actual text content of each tweet.
Analyzing Tweet Characteristics: The text content column will help you understand various aspects of tweets, such as language, sentiment, trending topics, or specific keywords used by users. You can perform text analysis techniques like word frequency analysis or sentiment analysis to gain insights into tweet characteristics.
Examining User Engagement: The Klout score provides a measure of user influence on Twitter. By analyzing this column, you can identify highly influential users who generate higher engagement rates with their tweets. You can further explore interactions (likes, retweets, replies) between these influential users and other Twitter users mentioned in their tweets.
Identifying Trends and Patterns: With this dataset's rich information about tweet content and user engagement, you can identify popular trends or patterns among highly engaged tweets or influential users over different time periods.
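The word frequency analysis mentioned in the guide can be sketched as follows. This is a minimal illustration, assuming the tweet texts are already loaded as a list of strings; the tokenization rule and the sample sentences are assumptions, not part of the dataset.

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Count lowercase word tokens (letters, digits, apostrophes) across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts

# Hypothetical tweet texts for illustration.
sample = ["Great service today", "Service was slow today", "great GREAT flight"]
freq = word_frequencies(sample)
top_word, top_count = freq.most_common(1)[0]
```

A real analysis would typically also remove stop words and URLs before counting.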
Please note that it is essential to responsibly use this data for any analysis or research purposes while adhering to ethical considerations related to privacy rights and data usage policies set by both Kaggle platform rules as well as any relevant privacy regulations.
- Analyzing the relationship between Klout score and the content of tweets: This dataset can be used to investigate whether there is a correlation between a user's Klout score (a measure of their social media influence) and the characteristics of their tweets. By examining factors such as tweet length, sentiment, and engagement metrics, researchers can gain...

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Muhammad Asif
Released under Apache 2.0

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Galal Qassas
Released under MIT

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains Twitter support conversations collected from various company accounts. It includes customer inquiries and corresponding support responses. The data is useful for training AI chatbots, analyzing customer service trends, and developing sentiment analysis models.
This dataset contains customer support interactions on Twitter. It includes the following columns:
- tweet_id: A unique identifier for each tweet.
- author_id: The unique ID of the user who posted the tweet.
- inbound: A boolean value indicating whether the tweet is from a customer (True) or from the support team (False).
- created_at: The timestamp of when the tweet was posted (in UTC format).
- text: The content of the tweet.
- response_tweet_id: The unique ID of the response tweet, if applicable.
- in_response_to_tweet_id: The ID of the original tweet to which this tweet is responding.
How This Data Can Be Used?
- Training a chatbot: Helps in generating automated support responses.
- Sentiment analysis: Can analyze whether tweets are complaints, queries, or feedback.
- Conversation tracking: By linking response tweets with original messages.
Original author: MANORAMA. Source: https://www.kaggle.com/datasets/manovirat/aspect/data
Note: This dataset is shared for educational and research purposes only.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This data consists of customer inquiries collected from several customer care accounts. The variables are:
- "fullText": The full-text content of the tweet.
- "lang": The language in which the tweet is written.
- "viewsCount": The number of views or impressions the tweet has received.
- "bookmarkCount": The number of times the tweet has been bookmarked by users.
- "favoriteCount": The number of times the tweet has been favorited by users.
- "replyCount": The number of replies the tweet has received.
- "retweetCount": The number of times the tweet has been retweeted by users.
- "quoteCount": The number of times the tweet has been quoted by users.

CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains tweets related to major US airlines and is widely used for NLP and sentiment analysis tasks. Each record includes the tweet text, timestamp, airline name, and sentiment label (positive, negative, neutral). This uploaded version is prepared to support advanced text processing, machine learning, and anomaly detection experiments.
This dataset is used in a machine learning workflow focused on:
- sentiment analysis
- embedding generation (transformers)
- dimensionality reduction (PCA, UMAP)
- clustering and visualization
- unsupervised anomaly detection using Isolation Forest
It is especially suited for exploring changes in public sentiment, event detection, and contextual analysis in social media data.
Originally derived from the Twitter US Airline Sentiment dataset on Kaggle.
This uploaded version is intended for educational, analytical, and research purposes.
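If you're using this dataset in a notebook, update the file path accordingly:
```python
import pandas as pd

# Path as it appears when the dataset is attached to a Kaggle notebook;
# adjust it for your own environment.
df = pd.read_csv("/kaggle/input/twitter-airline-sentiment-dataset/Tweets.csv")
```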

CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains public tweets posted by users addressing major US airlines on Twitter. Each record includes the tweet text along with sentiment labels such as positive, negative, or neutral. It may also include additional fields like tweet ID, airline name, timestamp, and user-related metadata that help in analyzing the nature of customer feedback.
The dataset was created to study customer opinions and experiences shared on social media regarding airline services. It is widely used for sentiment analysis, natural language processing (NLP), and machine learning tasks to understand how customers express satisfaction or dissatisfaction and to build models that can automatically classify sentiment from text.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use the dataset, cite the papers: https://doi.org/10.1016/j.eswa.2022.117541 and https://doi.org/10.1371/journal.pone.0274213
The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.
The following columns are in the dataset:
➡ created_at: The timestamp of the tweet.
➡ id: The unique ID of the tweet.
➡ lng: The longitude where the tweet was written.
➡ lat: The latitude where the tweet was written.
➡ topic: Categorization of the tweet into one of ten topics, namely: seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.
➡ sentiment: A score on a continuous scale from -1 to 1. Values closer to 1 indicate positive sentiment, values closer to -1 indicate negative sentiment, and values close to 0 indicate neutral or no sentiment.
➡ stance: Whether the tweet supports the belief in man-made climate change (believer), rejects it (denier), or neither supports nor rejects it (neutral).
➡ gender: Whether the user who posted the tweet is male, female, or undefined.
➡ temperature_avg: The temperature deviation in Celsius, relative to the January 1951-December 1980 average, at the time and place the tweet was written.
➡ aggressiveness: Whether or not the tweet contains aggressive language.
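Because the sentiment column is a continuous score rather than a label, analyses that need categories have to bin it. A minimal sketch follows; the function name `sentiment_label` and the `neutral_band` threshold of 0.1 are assumptions chosen for illustration, since the dataset description does not define a cutoff for "close to 0".

```python
def sentiment_label(score, neutral_band=0.1):
    """Map a score in [-1, 1] to a coarse label.

    neutral_band is a hypothetical cutoff: scores within +/- neutral_band
    of zero are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"sentiment score out of range: {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"

labels = [sentiment_label(s) for s in (0.8, -0.5, 0.05, 0.0)]
```

The threshold should be tuned against the distribution of scores in the data rather than taken as given.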
Because Twitter forbids publishing the text of tweets, you must retrieve it through a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.

A collection of tweets scraped from Twitter since January 2020 using the search parameter "news". The resulting JSON file was then split into two .csv files: one contains the tweets, and the other contains the network analysis inputs.
The associated network analysis file is a document containing all the nodes and edges derived from the interactions in the tweets as follows:
Nodes, all distinct tweeters including mentions
Edges, defined as when one user mentions another user in a tweet or replies
Weight, number of times the edge interaction has taken place
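The node, edge, and weight definitions above can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the network file: edges are treated as directed (author to target), a mention and a reply to the same user within one tweet count as a single interaction, and the input tuples and the name `build_edges` are hypothetical.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def build_edges(tweets):
    """Count directed interactions.

    tweets: iterable of (author, text, reply_to) where reply_to is a
    username or None. Returns a Counter mapping (author, target) to weight.
    """
    weights = Counter()
    for author, text, reply_to in tweets:
        targets = set(MENTION.findall(text))
        if reply_to:
            targets.add(reply_to)
        for target in targets:
            if target != author:  # ignore self-mentions
                weights[(author, target)] += 1
    return weights

# Hypothetical sample tweets.
sample = [
    ("alice", "Breaking news from @bob", None),
    ("alice", "@bob any update?", "bob"),
    ("carol", "Reading the news", "alice"),
]
edges = build_edges(sample)
nodes = {user for edge in edges for user in edge}
```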
Here is some code to get started: https://github.com/datadoctor100/twitter_analysis

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Tweets scraped with all possible data points provided by Twitter for each tweet. For data extraction or scraping, contact me on Telegram: @akaseobhw
All datapoints present for each tweet.
Each entry in the dataset represents a tweet along with various attributes such as the tweet's ID, URL, text content, retweet count, reply count, like count, quote count, view count, creation date, language, and more. Additionally, there are details about the tweet's author, including their username, profile URL, follower count, following count, profile picture, cover picture, description, location, creation date, and more.
Here's a brief description of the key fields present in each tweet entry:
This dataset can be analyzed to gain insights into trends, sentiments, and user behavior on Twitter. You can use Python libraries like pandas to load this dataset and perform various analyses and visualizations.

CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Chatbots are AI-powered programs designed to replicate human conversation. They are capable of performing a wide range of tasks, including answering questions, offering directions, controlling smart home thermostats, and playing music, among other functions. ChatGPT is a popular AI-based chatbot that generates meaningful responses to queries, aiding people in learning. While some individuals support ChatGPT, others view it as a disruptive tool in the field of education. Discussions about this tool can be found across different social media platforms. Analyzing the sentiment of such social media data, which comprises people’s opinions, is crucial for assessing public sentiment regarding the success and shortcomings of such tools. This study performs a sentiment analysis and topic modeling on ChatGPT-based tweets. ChatGPT-based tweets are the author’s extracted tweets from Twitter using ChatGPT hashtags, where users share their reviews and opinions about ChatGPT, providing a reference to the thoughts expressed by users in their tweets. The Latent Dirichlet Allocation (LDA) approach is employed to identify the most frequently discussed topics in relation to ChatGPT tweets. For the sentiment analysis, a deep transformer-based Bidirectional Encoder Representations from Transformers (BERT) model with three dense layers of neural networks is proposed. Additionally, machine and deep learning models with fine-tuned parameters are utilized for a comparative analysis. Experimental results demonstrate the superior performance of the proposed BERT model, achieving an accuracy of 96.49%.

There are a total of 20 CSV files containing tweets related to COVID-19 from 20 March 2020 to 08 April 2020.
For each file, the following columns are included. Columns: coordinates, created_at, hashtags, media, urls, favorite count, id, in_reply_to_screen_name, in_reply_to_status_id, in_reply_to_user_id.
See the relevant sections of Twitter's Terms of Service [1] and Developer Agreement [2]. According to Twitter's Developer Policy §6 [3]: "If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs" and "any Content provided to third parties via non-automated file download remains subject to this Policy".
[1] https://twitter.com/tos?lang=en
[2] https://dev.twitter.com/overview/terms/agreement
[3] https://dev.twitter.com/overview/terms/policy#6.Update_Be_a_Good_Partner_to_Twitter

Twitter data: approximately 525,000 tweets (0.5M) with the keyword 'weather' for 3-21 Dec 2022, including retweets (RT). You can download this data and more at twtdata.com. Please contact mark@twtdata.com if you need more data.

If you use this dataset, please cite it accordingly. See the reference below.
Ogunleye, B. O. (2021). Statistical learning approaches to sentiment analysis in the Nigerian banking context (Doctoral dissertation, Sheffield Hallam University).