49 datasets found

Customer Support on Twitter
kaggle.com
zip
Updated Dec 3, 2017
Cite
Thought Vector (2017). Customer Support on Twitter [Dataset]. https://www.kaggle.com/thoughtvector/customer-support-on-twitter
Explore at:
Available download formats: zip (176772673 bytes)
Dataset updated
Dec 3, 2017
Dataset authored and provided by
Thought Vector
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact.

Figure: Example Analysis - Inbound Volume for the Top 20 Brands (https://i.imgur.com/nTv3Iuu.png)

Context

Natural language remains the densest encoding of human experience we have, and innovation in NLP has accelerated to power understanding of that data, but the datasets driving this innovation don't match the real language in use today. The Customer Support on Twitter dataset offers a large corpus of modern English (mostly) conversations between consumers and customer support agents on Twitter, and has three important advantages over other conversational text datasets:

Focused - Consumers contact customer support to have a specific problem solved, and the manifold of problems to be discussed is relatively small, especially compared to unconstrained conversational datasets like the reddit Corpus.

Natural - Consumers in this dataset come from a much broader segment than those in the Ubuntu Dialogue Corpus and have much more natural and recent use of typed text than the Cornell Movie Dialogs Corpus.

Succinct - Twitter's brevity causes more natural responses from support agents (rather than scripted), and to-the-point descriptions of problems and solutions. It is also convenient in allowing a relatively low message size limit for recurrent nets.

Inspiration

The size and breadth of this dataset inspires many interesting questions:

Can we predict company responses? Given the bounded set of subjects handled by each company, the answer seems like yes!

Do requests get stale? How quickly do the best companies respond, compared to the worst?

Can we learn high quality dense embeddings or representations of similarity for topical clustering?

How does tone affect the customer support conversation? Does saying sorry help?

Can we help companies identify new problems, or ones most affecting their customers?

Acknowledgements

Dataset built with PointScrape.

Content

The dataset is a CSV, where each row is a tweet. The different columns are described below. Every conversation included has at least one request from a consumer and at least one response from a company. Which user IDs are company user IDs can be calculated using the inbound field.

tweet_id

A unique, anonymized ID for the Tweet. Referenced by response_tweet_id and in_response_to_tweet_id.

author_id

A unique, anonymized user ID. @s in the dataset have been replaced with their associated anonymized user ID.

inbound

Whether the tweet is "inbound" to a company doing customer support on Twitter. This feature is useful when re-organizing data for training conversational models.

created_at

Date and time when the tweet was sent.

text

Tweet content. Sensitive information like phone numbers and email addresses is replaced with mask values like _email_.

response_tweet_id

IDs of tweets that are responses to this tweet, comma-separated.

in_response_to_tweet_id

ID of the tweet this tweet is in response to, if any.
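The column descriptions above can be turned into request/response pairs with a few lines of standard-library Python. The following is a minimal sketch, not the dataset author's method: the sample rows are invented, and the `created_at` format in `TIME_FORMAT` is an assumption that should be checked against the actual file. It also shows how company user IDs can be derived from the inbound field, and how reply delay could be measured for the "Do requests get stale?" question.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows that follow the column layout described above.
# The created_at format is an assumption; adjust TIME_FORMAT to match the file.
SAMPLE_CSV = """tweet_id,author_id,inbound,created_at,text,response_tweet_id,in_response_to_tweet_id
1,115712,True,Tue Oct 31 22:00:00 +0000 2017,@sprintcare my phone has no signal,2,
2,sprintcare,False,Tue Oct 31 22:10:00 +0000 2017,@115712 Sorry to hear that. Please DM us.,3,1
3,115712,True,Tue Oct 31 22:12:00 +0000 2017,@sprintcare sent you a DM,,2
"""
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def load_tweets(handle):
    """Index tweets by tweet_id, parsing the inbound flag and the timestamp."""
    tweets = {}
    for row in csv.DictReader(handle):
        row["inbound"] = row["inbound"].strip().lower() == "true"
        row["created_at"] = datetime.strptime(row["created_at"], TIME_FORMAT)
        tweets[row["tweet_id"]] = row
    return tweets


def request_response_pairs(tweets):
    """Yield (request, reply, delay_seconds) for company replies to inbound tweets."""
    for tweet in tweets.values():
        parent = tweets.get(tweet["in_response_to_tweet_id"])
        # Keep only non-inbound (company) tweets that answer an inbound tweet.
        if parent is None or tweet["inbound"] or not parent["inbound"]:
            continue
        delay = (tweet["created_at"] - parent["created_at"]).total_seconds()
        yield parent, tweet, delay


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
pairs = list(request_response_pairs(tweets))
# Authors of non-inbound tweets are the company accounts.
company_ids = {t["author_id"] for t in tweets.values() if not t["inbound"]}
for request, reply, delay in pairs:
    print(f"{request['text']!r} -> {reply['text']!r} ({delay:.0f}s)")
```

For the full file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle (opened with `newline=""` and an explicit encoding).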

Contributing

Know of other brands the dataset should include? Found something that needs to be fixed? Start a discussion, or email me directly at $FIRSTNAME@$LASTNAME.com!

Acknowledgements

A huge thank you to my friends who helped bootstrap the list of companies that do customer support on Twitter! There are many rocks that would have been left un-turned were it not for your suggestions!

Relevant Resources

NLTK - casual_tokenize for social media text tokenizing, vader sentiment analysis for social media text

SciKit Learn - BoW Count Vectorizer, Multinomial Naive Bayes Classifier

Topic Modeling via Phrase detection with gensim

facebook research - fastText text classifier

Licensing

For commercial applications and use of full dataset, please contact stuart@thoughtvector.io.
Customer Support on Twitter
kaggle.com
zip
Updated Oct 17, 2024
Cite
Amin Aslami (2024). Customer Support on Twitter [Dataset]. https://www.kaggle.com/datasets/aminaslam/customer-support-on-twitter
Explore at:
Available download formats: zip (78948 bytes)
Dataset updated
Oct 17, 2024
Authors
Amin Aslami
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Amin Aslami

Released under Apache 2.0

Contents
Customer Support on Twitter
berd-platform.de
csv
Updated Jul 31, 2025
Cite
Stuart Axelbrooke (2025). Customer Support on Twitter [Dataset]. http://doi.org/10.34740/kaggle/dsv/8841
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.34740/kaggle/dsv/8841
Dataset updated
Jul 31, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Stuart Axelbrooke
Time period covered
Mar 12, 2017
Description
The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact. The dataset includes replies from companies such as Apple, Amazon, Uber, Delta, Spotify, and others.
Customer Support Twitter Data
kaggle.com
zip
Updated Aug 29, 2025
Cite
Muhammad Asif (2025). Customer Support Twitter Data [Dataset]. https://www.kaggle.com/datasets/muhammadasif786/customer-support-twitter-data
Explore at:
Available download formats: zip (176765850 bytes)
Dataset updated
Aug 29, 2025
Authors
Muhammad Asif
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Muhammad Asif

Released under Apache 2.0

Contents
Twitter Tweets Sentiment Dataset
kaggle.com
zip
Updated Apr 8, 2022
Cite
M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
Explore at:
Available download formats: zip (1289519 bytes)
Dataset updated
Apr 8, 2022
Authors
M Yasser H
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Description:

Twitter is an online social media platform where people share their thoughts as tweets. It has been observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to distinguish negative tweets and block them. Can you build a strong classifier model to predict the same?

Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

Columns:

textID - unique ID for each piece of text

text - the text of the tweet

sentiment - the general sentiment of the tweet
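The advice above about removing the beginning and ending quotes from the text field can be sketched as follows. This is an illustrative example with invented rows, assuming only the three columns listed above. Python's `csv` module already removes the quoting characters that delimit a field, so `clean_text` (a hypothetical helper) only handles quotes that remain as a literal part of the value.

```python
import csv
import io

# Hypothetical rows using the three columns listed above; the real file may
# contain additional columns such as selected_text.
SAMPLE_CSV = '''textID,text,sentiment
a1,"""I love this phone""",positive
b2," my flight got cancelled, again",negative
'''


def clean_text(value):
    """Trim whitespace, then drop one pair of wrapping double quotes if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1].strip()
    return value


# csv.DictReader handles the CSV-level quoting; clean_text handles the rest.
rows = [
    {**row, "text": clean_text(row["text"])}
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]
for row in rows:
    print(row["textID"], repr(row["text"]), row["sentiment"])
```

Only one pair of wrapping quotes is removed, so quotation marks inside the tweet are left untouched.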

Acknowledgement:

The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Objective:

Understand the Dataset & cleanup (if required).

Build classification models to predict the twitter sentiments.

Compare the evaluation metrics of various classification algorithms.
Twitter customer support twitter llm finetune
kaggle.com
zip
Updated Sep 1, 2025
Cite
Muhammad Asif (2025). Twitter customer support twitter llm finetune [Dataset]. https://www.kaggle.com/datasets/muhammadasif786/twitter-customer-support-twitter-llm-finetune/suggestions
Explore at:
Available download formats: zip (176765850 bytes)
Dataset updated
Sep 1, 2025
Authors
Muhammad Asif
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Muhammad Asif

Released under Apache 2.0

Contents
Customer Support Tweets (945M rows)
kaggle.com
zip
Updated Oct 31, 2025
Cite
Galal Qassas (2025). Customer Support Tweets (945M rows) [Dataset]. https://www.kaggle.com/datasets/galalqassas/customer-support-tweets-945m-rows
Explore at:
Available download formats: zip (74154613 bytes)
Dataset updated
Oct 31, 2025
Authors
Galal Qassas
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset

This dataset was created by Galal Qassas

Released under MIT

Contents
Saudi Customer Care Tweets
kaggle.com
zip
Updated Mar 13, 2024
Cite
Abdullah Alsharif (2024). Saudi Customer Care Tweets [Dataset]. https://www.kaggle.com/datasets/alshreefabdullh/saudi-customer-care-tweets
Explore at:
Available download formats: zip (10030314 bytes)
Dataset updated
Mar 13, 2024
Authors
Abdullah Alsharif
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Area covered
Saudi Arabia
Description
This data was collected from several customer care accounts and consists of customer inquiries.

"fullText": This variable contains the full-text content of the tweet.

"lang": This variable indicates the language in which the tweet is written.

"viewsCount": This variable represents the count of views or impressions the tweet has received.

"bookmarkCount": This variable represents the count of times the tweet has been bookmarked by users.

"favoriteCount": This variable represents the count of times the tweet has been favorited by users.

"replyCount": This variable represents the count of replies the tweet has received.

"retweetCount": This variable represents the count of times the tweet has been retweeted by users.

"quoteCount": This variable represents the count of times the tweet has been quoted by users.
Support data for Chatbots
kaggle.com
zip
Updated Feb 26, 2025
Cite
Mohammad Faizan (2025). Support data for Chatbots [Dataset]. https://www.kaggle.com/datasets/mohammadfaizannaeem/3m-tweet-data-of-world-biggest-brands-on-twitter/data
Explore at:
Available download formats: zip (176765850 bytes)
Dataset updated
Feb 26, 2025
Authors
Mohammad Faizan
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
File Description

This dataset contains Twitter support conversations collected from various company accounts. It includes customer inquiries and corresponding support responses. The data is useful for training AI chatbots, analyzing customer service trends, and developing sentiment analysis models.

Column Description

This dataset contains customer support interactions on Twitter. It includes the following columns:

tweet_id: A unique identifier for each tweet.

author_id: The unique ID of the user who posted the tweet.

inbound: A boolean value indicating whether the tweet is from a customer (True) or from the support team (False).

created_at: The timestamp of when the tweet was posted (in UTC format).

text: The content of the tweet.

response_tweet_id: The unique ID of the response tweet, if applicable.

in_response_to_tweet_id: The ID of the original tweet to which this tweet is responding.

How Can This Data Be Used?

Training a chatbot: Helps in generating automated support responses.

Sentiment analysis: Can analyze whether tweets are complaints, queries, or feedback.

Conversation tracking: Links response tweets with original messages.

Original author: MANORAMA

Source: https://www.kaggle.com/datasets/manovirat/aspect/data

Note: This dataset is shared for educational and research purposes only.
Twitter Airline Sentiment Dataset
kaggle.com
zip
Updated Nov 14, 2025
Cite
Chandana Ramakrishna (2025). Twitter Airline Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/chandana890/twitter-airline-sentiment-dataset
Explore at:
Available download formats: zip (1134990 bytes)
Dataset updated
Nov 14, 2025
Authors
Chandana Ramakrishna
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Overview

This dataset contains tweets related to major US airlines and is widely used for NLP and sentiment analysis tasks. Each record includes the tweet text, timestamp, airline name, and sentiment label (positive, negative, neutral). This uploaded version is prepared to support advanced text processing, machine learning, and anomaly detection experiments.

What's Included

Tweets.csv – Full collection of airline-related tweets

Text content suitable for NLP tasks

Timestamp information (useful for time-based analysis)

Sentiment labels for classification and evaluation

Cleaned text field for direct use in ML pipelines

Purpose of This Dataset

This dataset is used in a machine learning workflow focused on:
- sentiment analysis
- embedding generation (transformers)
- dimensionality reduction (PCA, UMAP)
- clustering and visualization
- unsupervised anomaly detection using Isolation Forest

It is especially suited for exploring changes in public sentiment, event detection, and contextual analysis in social media data.

Key Use Cases

Building and testing NLP models

Semantic similarity and embedding-based analysis

Sentiment classification

Detecting anomalous posts or time periods

Visualizing tweet clusters using UMAP

Studying customer feedback patterns in the airline industry

Source

Originally derived from the Twitter US Airline Sentiment dataset on Kaggle.
This uploaded version is intended for educational, analytical, and research purposes.

Notes

If you're using this dataset in a notebook, ensure you update your file path accordingly:

```python
import pandas as pd

df = pd.read_csv("/kaggle/input/twitter-airline-sentiment-dataset/Tweets.csv")
```
customer care tweets KSA
kaggle.com
zip
Updated May 20, 2022
Cite
Mansour (2022). customer care tweets KSA [Dataset]. https://www.kaggle.com/datasets/mansourhussain/customer-care-tweets-ksa/data
Explore at:
Available download formats: zip (642212 bytes)
Dataset updated
May 20, 2022
Authors
Mansour
Area covered
Saudi Arabia
Description
- This data contains 10,000 tweets from a telecom company's customer care account on Twitter.

- It is intended for use in Arabic sentiment analysis.
The Climate Change Twitter Dataset
kaggle.com
data.mendeley.com
zip
Updated May 26, 2022
+ more versions
Cite
Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/deffro/the-climate-change-twitter-dataset
Explore at:
Available download formats: zip (428878019 bytes)
Dataset updated
May 26, 2022
Authors
Dimitrios Effrosynidis
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
If you use the dataset, cite the papers: https://doi.org/10.1016/j.eswa.2022.117541 and https://doi.org/10.1371/journal.pone.0274213

The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.

The following columns are in the dataset:

➡ created_at: The timestamp of the tweet.

➡ id: The unique id of the tweet.

➡ lng: The longitude where the tweet was written.

➡ lat: The latitude where the tweet was written.

➡ topic: Categorization of the tweet into one of ten topics, namely: seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.

➡ sentiment: A score on a continuous scale from -1 to 1. Values closer to 1 indicate positive sentiment, values closer to -1 indicate negative sentiment, and values close to 0 indicate no sentiment or a neutral tweet.

➡ stance: Whether the tweet supports the belief in man-made climate change (believer), does not believe in man-made climate change (denier), or neither supports nor rejects that belief (neutral).

➡ gender: Whether the user that made the tweet is male, female, or undefined.

➡ temperature_avg: The temperature deviation in Celsius, relative to the January 1951-December 1980 average, at the time and place the tweet was written.

➡ aggressiveness: Whether the tweet contains aggressive language or not.

Since Twitter forbids making public the text of the tweets, in order to retrieve it you need to do a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.
Sentiment Analysis Mental Health Tweets 2017-2023
kaggle.com
zip
Updated Apr 5, 2023
Cite
Zee M (2023). Sentiment Analysis Mental Health Tweets 2017-2023 [Dataset]. https://www.kaggle.com/datasets/zoegreenslade/twittermhcampaignsentmentanalysis
Explore at:
Available download formats: zip (751228815 bytes)
Dataset updated
Apr 5, 2023
Authors
Zee M
Description
These datasets contain tweets from four mental health campaigns on Twitter, between the years of 2017-2023.

The datasets have accompanying notebooks for each stage of the analysis:

1. Scraping tweets from Twitter. Outputs --> ('UMHD', 'OCD', 'EDAW', 'MHAW')

2. EDA, and merging the data together. Output --> ('MH_Campaigns_1723')

3. Cleaning the tweets. Output --> ('MH_Campaign_Tweets_Clean_1723')

4. Word Clouds, Visualisations and tweet preprocessing for Sentiment Analysis. Output --> ('MH_Campaign_Tweets_Tokenised_1723')

5. VADER sentiment Analysis. Output --> ('MH_Campaign_Tweets_Sentiment_Scored_1723')

They are broken down in this way so that you can practice the project from any stage - if you don't want to do the scraping but do want to do visuals, for example, you can begin at stage 4 with the relevant dataset.

To see a presentation of the main insights that I pulled out of this data, follow this link: bit.ly/KaggleTMHC

Also available on GitHub: https://github.com/zeehama/Sentiment-Analysis-on-4-Mental-Health-Campaigns-Twitter-
twitter-news
kaggle.com
zip
Updated Aug 17, 2022
Cite
Data Guy (2022). twitter-news [Dataset]. https://www.kaggle.com/datasets/deeguy/twitter-news
Explore at:
Available download formats: zip (1299040415 bytes)
Dataset updated
Aug 17, 2022
Authors
Data Guy
Description
A collection of tweets scraped from Twitter since January 2020 using the search parameter "news". The resulting json file was then separated into 2 separate .csv files. One contains the tweets, whereas the other contains the network analysis inputs.

The associated network analysis file is a document containing all the nodes and edges derived from the interactions in the tweets as follows:

Nodes: all distinct tweeters, including mentions

Edges: defined as when one user mentions another user in a tweet or replies

Weight: number of times the edge interaction has taken place
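The node, edge, and weight definitions above can be illustrated with a short standard-library sketch. It is not the code from the linked repository: the `(author, text)` input shape, the sample tweets, and the choice to treat edges as directed (from the author to the mentioned user) are all assumptions, and replies are only captured when they contain an @-mention.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

# Hypothetical (author, text) pairs standing in for rows of the tweets file.
SAMPLE_TWEETS = [
    ("alice", "@bob did you see the news?"),
    ("bob", "@alice yes, and so did @carol"),
    ("alice", "@bob thanks"),
]


def build_graph(tweets):
    """Return (nodes, weighted_edges) from (author, text) pairs."""
    edges = Counter()
    nodes = set()
    for author, text in tweets:
        nodes.add(author)
        for mentioned in MENTION.findall(text):
            nodes.add(mentioned)
            edges[(author, mentioned)] += 1  # weight = number of interactions
    return nodes, edges


nodes, edges = build_graph(SAMPLE_TWEETS)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

The `edges` counter maps each `(source, target)` pair to its weight, which is a convenient shape for writing an edge list to CSV.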

Here is some code to get started- https://github.com/datadoctor100/twitter_analysis
US Airlines Twitter (Over time)
kaggle.com
zip
Updated Nov 18, 2022
Cite
The Devastator (2022). US Airlines Twitter (Over time) [Dataset]. https://www.kaggle.com/datasets/thedevastator/sentiment-analysis-of-us-airline-twitter-data
Explore at:
Available download formats: zip (1130886 bytes)
Dataset updated
Nov 18, 2022
Authors
The Devastator
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
US Airlines Twitter (Over time)

Study the trend in customer satisfaction over time

About this dataset

The columns in the dataset include index, unit id, golden, unit state, trusted judgments, last judgment at, airline sentiment, airline sentiment confidence, negative reason, negative reason confidence, airline_sentiment_gold and retweet count. There is also text included for each tweet as well as tweet location and user timezone.

Using this dataset, you can get a feel for how customers of various airlines feel about their service. You can use the data to analyze trends over time or compare different airlines. Some research ideas include using airline sentiment to predict the stock market or using the negativereason data to help airlines improve their customer service

How to use the dataset

Looking at this dataset, you can get a feel for how customers of various airlines feel about their service. The data includes the airline, the tweet text, the date of the tweet, and various other information. You can use this to analyze trends over time or compare different airlines

Research Ideas

Using airline sentiment to predict the stock market - is there a correlation between how the public perceives an airline and how that airline's stock performs?

Using negativereason data to help airlines improve their customer service - which negative reasons are mentioned most often? Are there certain airlines that are consistently mentioned for specific reasons?

Use the tweet data to map out airline hot spots - where do people tend to tweet about certain airlines the most? Is there a geographic pattern to sentiment about specific airlines?

Acknowledgements

If you use this dataset in your research, please credit Social Media Data

License

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

You are free to:
- Share - copy and redistribute the material in any medium or format for non-commercial purposes only.
- Adapt - remix, transform, and build upon the material for non-commercial purposes only.

You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.

You may not:
- Use the material for commercial purposes.

Columns

File: Airline-Sentiment-2-w-AA.csv

| Column name | Description |
|:---------------------------|:-----------------------------------------------------------------------------|
| _golden | This column is the gold standard column. (Boolean) |
| _unit_state | This column is the state of the unit. (String) |
| _trusted_judgments | This column is the number of trusted judgments. (Numeric) |
| _last_judgment_at | This column is the timestamp of the last judgment. (String) |
| airline_sentiment | This column is the sentiment of the tweet. (String) |
| negativereason | This column is the negative reason for the sentiment. (String) |
| airline_sentiment_gold | This column is the gold standard sentiment of the tweet. (String) |
| name | This column is the name of the airline. (String) |
| negativereason_gold | This column is the gold standard negative reason for the sentiment. (String) |
| retweet_count | This column is the number of retweets. (Numeric) |
| text | This column is the text of the tweet. (String) |
| tweet_coord | This column is the coordinates of the tweet. (String) |
| tweet_created | This column is the timestamp of the tweet. (String) |
| tweet_location | This column is the location of the tweet. (String) |
| user_timezone | This column is the timezone of the user. (String) |
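As a starting point for the research idea about which negative reasons are mentioned most often, the sketch below counts `negativereason` values per airline using only the standard library. The sample rows are invented, and the name of the airline column is an assumption exposed as the `airline_col` parameter; check the header of the actual file before using it.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows; "airline" as the airline column name is an assumption.
SAMPLE_CSV = """airline,airline_sentiment,negativereason
United,negative,Late Flight
United,negative,Late Flight
United,positive,
Delta,negative,Lost Luggage
Delta,neutral,
"""


def negative_reasons_by_airline(handle, airline_col="airline"):
    """Count non-empty negativereason values per airline for negative tweets."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(handle):
        reason = (row.get("negativereason") or "").strip()
        if row.get("airline_sentiment") == "negative" and reason:
            counts[row[airline_col]][reason] += 1
    return counts


counts = negative_reasons_by_airline(io.StringIO(SAMPLE_CSV))
for airline, reasons in sorted(counts.items()):
    print(airline, reasons.most_common(1))
```

`Counter.most_common` returns the reasons in descending order of frequency, so comparing airlines is a matter of inspecting the first few entries for each.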
COVID-19 Twitter Dataset
kaggle.com
zip
Updated Jul 4, 2020
Cite
Jingli SHI (2020). COVID-19 Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/shijingli/covid19-twitter-dataset
Explore at:
Available download formats: zip (29449949 bytes)
Dataset updated
Jul 4, 2020
Authors
Jingli SHI
Description
Context

There are a total of 20 CSV files containing tweets related to COVID-19 from 20 March 2020 to 08 April 2020.

Content

For each file, the following columns are included. Columns: coordinates, created_at, hashtags, media, urls, favorite count, id, in_reply_to_screen_name, in_reply_to_status_id, in_reply_to_user_id.

Rights

See the relevant sections of Twitter's Terms of Service [1] and Developer Agreement [2].

According to Twitter's Developer Policy §6 [3]: "If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs" and "any Content provided to third parties via non-automated file download remains subject to this Policy".

[1] https://twitter.com/tos?lang=en
[2] https://dev.twitter.com/overview/terms/agreement
[3] https://dev.twitter.com/overview/terms/policy#6.Update_Be_a_Good_Partner_to_Twitter
Data from: Twitter Data
kaggle.com
zip
Updated Jun 15, 2025
Cite
Akarsh kumar (2025). Twitter Data [Dataset]. https://www.kaggle.com/datasets/akarsh8/twitter-data
Explore at:
Available download formats: zip (204 bytes)
Dataset updated
Jun 15, 2025
Authors
Akarsh kumar
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Introducing Twitter dataset to help you access valuable Twitter data with the help of a powerful Twttr API with endpoints. You can easily retrieve twitter tweet details; twitter user followers and twitter followings; post likes, comments; quoted tweets, and retweets. You can also search for top, latest, videos, photos, and people, and access user tweets, replies, media, likes, and info by username or ID.
Bank customer tweets (10000)
kaggle.com
zip
Updated Sep 25, 2022
Cite
Bayode Ogunleye (2022). Bank customer tweets (10000) [Dataset]. https://www.kaggle.com/datasets/batoog/bank-customer-tweets-10000
Explore at:
Available download formats: zip (563396 bytes)
Dataset updated
Sep 25, 2022
Authors
Bayode Ogunleye
Description
If you use this dataset, please reference it accordingly. See the reference below.

Ogunleye, B. O. (2021). Statistical learning approaches to sentiment analysis in the Nigerian banking context (Doctoral dissertation, Sheffield Hallam University).
Twitter-Bot Detection Dataset
kaggle.com
zip
Updated May 31, 2023
Cite
Aditya Goyal (2023). Twitter-Bot Detection Dataset [Dataset]. https://www.kaggle.com/datasets/goyaladi/twitter-bot-detection-dataset
Explore at:
Available download formats: zip (3083151 bytes)
Dataset updated
May 31, 2023
Authors
Aditya Goyal
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Welcome to the Bot Detect Dataset! This dataset offers a unique opportunity to delve into the world of Twitter bots. Explore user profiles, tweet content, retweet counts, and more. Uncover hidden patterns and gain insights into bot detection research. Join us on this exciting journey of understanding social media interactions and identifying bot accounts.
Twitter New Dataset 2024 March Data
kaggle.com
zip
Updated Mar 11, 2024
+ more versions
Cite
Ayush Kumar Singh (2024). Twitter New Dataset 2024 March Data [Dataset]. https://www.kaggle.com/datasets/fastcurious/twitter-new-dataset-2024-march-data
Explore at:
Available download formats: zip (2923762 bytes)
Dataset updated
Mar 11, 2024
Authors
Ayush Kumar Singh
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Tweets scraped with all possible datapoints provided by Twitter in each tweet. For data extraction or scraping, contact me on Telegram - @akaseobhw

All datapoints present for each tweet.

Each entry in the dataset represents a tweet along with various attributes such as the tweet's ID, URL, text content, retweet count, reply count, like count, quote count, view count, creation date, language, and more. Additionally, there are details about the tweet's author, including their username, profile URL, follower count, following count, profile picture, cover picture, description, location, creation date, and more.

Here's a brief description of the key fields present in each tweet entry:

type: Indicates the type of data, in this case, it's a tweet.

id: Unique identifier for the tweet.

url: URL of the tweet.

twitterUrl: Twitter URL of the tweet.

text: Text content of the tweet.

retweetCount: Number of retweets.

replyCount: Number of replies.

likeCount: Number of likes (favorites).

quoteCount: Number of times the tweet has been quoted.

viewCount: Number of views.

createdAt: Date and time when the tweet was created.

lang: Language of the tweet.

quoteId: ID of the quoted tweet, if this tweet is a quote.

bookmarkCount: Number of times the tweet has been bookmarked.

isReply: Indicates whether the tweet is a reply to another tweet.

author: Information about the author of the tweet.

userName: Username of the author.

url: URL of the author's profile.

followers: Number of followers of the author.

following: Number of accounts the author is following.

profilePicture: URL of the author's profile picture.

coverPicture: URL of the author's cover picture.

description: Description or bio of the author.

location: Location of the author.

createdAt: Date and time when the author's account was created.

entities: Entities present in the tweet, such as hashtags, symbols, URLs, and user mentions.

isRetweet: Indicates whether the tweet is a retweet.

isQuote: Indicates whether the tweet is a quote.

quote: Information about the quoted tweet, if this tweet is a quote.

media: Information about any media (such as images or videos) attached to the tweet.

This dataset can be analyzed to gain insights into trends, sentiments, and user behavior on Twitter. You can use Python libraries like pandas to load this dataset and perform various analyses and visualizations.
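
As a rough illustration of the pandas workflow mentioned above, the sketch below flattens a few tweet entries into a table and computes a simple aggregate. The three records are made up for the example, and the sketch assumes the entries are available as a list of JSON-like objects in which `author` is a nested object; the actual file layout of the dataset is not stated here and may differ.

```python
import pandas as pd

# Made-up records that follow the field names described above.
# Assumption: "author" is a nested object inside each tweet entry.
records = [
    {"type": "tweet", "id": "1", "text": "hello world", "lang": "en",
     "likeCount": 5, "retweetCount": 2, "isReply": False,
     "author": {"userName": "alice", "followers": 100}},
    {"type": "tweet", "id": "2", "text": "good morning", "lang": "en",
     "likeCount": 15, "retweetCount": 0, "isReply": True,
     "author": {"userName": "bob", "followers": 2500}},
    {"type": "tweet", "id": "3", "text": "hola", "lang": "es",
     "likeCount": 1, "retweetCount": 1, "isReply": False,
     "author": {"userName": "carol", "followers": 40}},
]

# json_normalize flattens nested objects into dotted column names,
# so the author's fields become "author.userName", "author.followers", ...
df = pd.json_normalize(records)

# Example analysis: average number of likes per tweet language.
likes_by_lang = df.groupby("lang")["likeCount"].mean()

print(df[["id", "author.userName", "likeCount"]])
print(likes_by_lang)
```

For a real export, the `records` list would be replaced by the parsed contents of the downloaded file, for example via `json.load`.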

Thought Vector (2017). Customer Support on Twitter [Dataset]. https://www.kaggle.com/thoughtvector/customer-support-on-twitter

Customer Support on Twitter

Over 3 million tweets and replies from the biggest brands on Twitter

Available download formats: zip (176772673 bytes)

Dataset updated: Dec 3, 2017

Dataset authored and provided by: Thought Vector

License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact.

[Figure: Example Analysis - Inbound Volume for the Top 20 Brands (https://i.imgur.com/nTv3Iuu.png)]

Context

Natural language remains the densest encoding of human experience we have, and innovation in NLP has accelerated to power understanding of that data, but the datasets driving this innovation don't match the real language in use today. The Customer Support on Twitter dataset offers a large corpus of modern English (mostly) conversations between consumers and customer support agents on Twitter, and has three important advantages over other conversational text datasets:

Focused - Consumers contact customer support to have a specific problem solved, and the manifold of problems to be discussed is relatively small, especially compared to unconstrained conversational datasets like the Reddit Corpus.
Natural - Consumers in this dataset come from a much broader segment than those in the Ubuntu Dialogue Corpus and have much more natural and recent use of typed text than the Cornell Movie Dialogs Corpus.
Succinct - Twitter's brevity causes more natural responses from support agents (rather than scripted), and to-the-point descriptions of problems and solutions. It is also convenient in allowing a relatively low message size limit for recurrent nets.

Inspiration

The size and breadth of this dataset inspire many interesting questions:

Can we predict company responses? Given the bounded set of subjects handled by each company, the answer seems like yes!
Do requests get stale? How quickly do the best companies respond, compared to the worst?
Can we learn high quality dense embeddings or representations of similarity for topical clustering?
How does tone affect the customer support conversation? Does saying sorry help?
Can we help companies identify new problems, or ones most affecting their customers?

Acknowledgements

Dataset built with PointScrape.

Content

The dataset is a CSV, where each row is a tweet. The different columns are described below. Every conversation included has at least one request from a consumer and at least one response from a company. Which user IDs are company user IDs can be calculated using the inbound field.

`tweet_id`

A unique, anonymized ID for the Tweet. Referenced by response_tweet_id and in_response_to_tweet_id.

`author_id`

A unique, anonymized user ID. @s in the dataset have been replaced with their associated anonymized user ID.

`inbound`

Whether the tweet is "inbound" to a company doing customer support on Twitter. This feature is useful when re-organizing data for training conversational models.

`created_at`

Date and time when the tweet was sent.

`text`

Tweet content. Sensitive information such as phone numbers and email addresses is replaced with mask values like _email_.

`response_tweet_id`

IDs of tweets that are responses to this tweet, comma-separated.

`in_response_to_tweet_id`

ID of the tweet this tweet is in response to, if any.
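
The columns above can be used to identify company accounts and to pair each customer request with the company reply, which is one way to reorganize the data for conversational models. The sketch below uses only the Python standard library. The sample rows, the `ExampleCo` account name, the timestamp format, and the `True`/`False` text encoding of `inbound` are assumptions made for illustration, not values confirmed by the dataset description.

```python
import csv
import io

# Made-up rows in the column layout described above. With the real
# dataset, the downloaded CSV file would be opened here instead.
SAMPLE = """tweet_id,author_id,inbound,created_at,text,response_tweet_id,in_response_to_tweet_id
1,115712,True,Tue Oct 31 22:10:47 +0000 2017,my phone will not charge,2,
2,ExampleCo,False,Tue Oct 31 22:11:45 +0000 2017,@115712 sorry to hear that! please DM us,,1
3,115713,True,Tue Oct 31 22:12:00 +0000 2017,where is my order,4,
4,ExampleCo,False,Tue Oct 31 22:15:10 +0000 2017,@115713 we are checking on it now,,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Index tweets by ID so a reply can look up the tweet it answers.
by_id = {row["tweet_id"]: row for row in rows}

# Assumption: authors of non-inbound tweets are the company accounts.
company_ids = {row["author_id"] for row in rows if row["inbound"] == "False"}

# Pair each company reply with the inbound customer tweet it responds to.
pairs = []
for row in rows:
    parent = by_id.get(row["in_response_to_tweet_id"])
    if row["inbound"] == "False" and parent is not None and parent["inbound"] == "True":
        pairs.append((parent["text"], row["text"]))

print(company_ids)
for request, response in pairs:
    print(request, "->", response)
```

Longer conversations could be rebuilt the same way by following `in_response_to_tweet_id` repeatedly until a tweet with an empty value is reached.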

Contributing

Know of other brands the dataset should include? Found something that needs to be fixed? Start a discussion, or email me directly at $FIRSTNAME@$LASTNAME.com!

Acknowledgements

A huge thank you to my friends who helped bootstrap the list of companies that do customer support on Twitter! Many rocks would have been left unturned were it not for your suggestions!

Relevant Resources

Licensing

For commercial applications and use of full dataset, please contact stuart@thoughtvector.io.
