49 datasets found
  1. Customer Support on Twitter

    • kaggle.com
    zip
    Updated Dec 3, 2017
    Cite
    Thought Vector (2017). Customer Support on Twitter [Dataset]. https://www.kaggle.com/thoughtvector/customer-support-on-twitter
    Available download formats: zip (176772673 bytes)
    Dataset updated
    Dec 3, 2017
    Dataset authored and provided by
    Thought Vector
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact.

    [Figure: Example Analysis - Inbound Volume for the Top 20 Brands (https://i.imgur.com/nTv3Iuu.png)]

    Context

    Natural language remains the densest encoding of human experience we have, and innovation in NLP has accelerated to power understanding of that data, but the datasets driving this innovation don't match the real language in use today. The Customer Support on Twitter dataset offers a large corpus of modern English (mostly) conversations between consumers and customer support agents on Twitter, and has three important advantages over other conversational text datasets:

    • Focused - Consumers contact customer support to have a specific problem solved, and the manifold of problems to be discussed is relatively small, especially compared to unconstrained conversational datasets like the reddit Corpus.
    • Natural - Consumers in this dataset come from a much broader segment than those in the Ubuntu Dialogue Corpus and have much more natural and recent use of typed text than the Cornell Movie Dialogs Corpus.
    • Succinct - Twitter's brevity causes more natural responses from support agents (rather than scripted), and to-the-point descriptions of problems and solutions. Also, it's convenient in allowing for a relatively low message size limit for recurrent nets.

    Inspiration

    The size and breadth of this dataset inspires many interesting questions:

    • Can we predict company responses? Given the bounded set of subjects handled by each company, the answer seems like yes!
    • Do requests get stale? How quickly do the best companies respond, compared to the worst?
    • Can we learn high quality dense embeddings or representations of similarity for topical clustering?
    • How does tone affect the customer support conversation? Does saying sorry help?
    • Can we help companies identify new problems, or ones most affecting their customers?

    Acknowledgements

    Dataset built with PointScrape.

    Content

    The dataset is a CSV, where each row is a tweet. The different columns are described below. Every conversation included has at least one request from a consumer and at least one response from a company. Which user IDs are company user IDs can be calculated using the inbound field.

    tweet_id

    A unique, anonymized ID for the Tweet. Referenced by response_tweet_id and in_response_to_tweet_id.

    author_id

    A unique, anonymized user ID. @s in the dataset have been replaced with their associated anonymized user ID.

    inbound

    Whether the tweet is "inbound" to a company doing customer support on Twitter. This feature is useful when re-organizing data for training conversational models.

    created_at

    Date and time when the tweet was sent.

    text

    Tweet content. Sensitive information like phone numbers and email addresses is replaced with mask values like _email_.

    response_tweet_id

    IDs of tweets that are responses to this tweet, comma-separated.

    in_response_to_tweet_id

    ID of the tweet this tweet is in response to, if any.
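
    As a minimal sketch of how the fields above fit together, the following pairs each inbound tweet with the company replies listed in its response_tweet_id field. The sample rows, the author IDs, and the timestamp format are invented for illustration, and the assumption that inbound is serialized as the strings "True"/"False" should be checked against the actual CSV.

```python
import csv
import io

# Hypothetical sample rows using the column names described above.
SAMPLE = """tweet_id,author_id,inbound,created_at,text,response_tweet_id,in_response_to_tweet_id
1,115712,True,2017-10-31 22:10:47,my phone will not update,"2,3",
2,AcmeSupport,False,2017-10-31 22:15:02,Sorry to hear that! Please DM us.,,1
3,AcmeSupport,False,2017-10-31 22:16:40,Which model do you have?,4,1
4,115712,True,2017-10-31 22:20:11,It is the X200,,3
"""


def pair_requests_with_replies(rows):
    """Return (request_text, reply_text) for each inbound tweet and each company reply to it."""
    by_id = {row["tweet_id"]: row for row in rows}
    pairs = []
    for row in rows:
        # Assumption: booleans are stored as the strings "True" / "False".
        if row["inbound"] != "True":
            continue
        # response_tweet_id is comma-separated and may be empty.
        for reply_id in filter(None, row["response_tweet_id"].split(",")):
            reply = by_id.get(reply_id)
            # Keep only replies present in the data and written by a company.
            if reply is not None and reply["inbound"] == "False":
                pairs.append((row["text"], reply["text"]))
    return pairs


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
pairs = pair_requests_with_replies(rows)
for request, reply in pairs:
    print(f"{request!r} -> {reply!r}")
```

    Pairs built this way are one possible input format for a response-prediction model; multi-turn context would need the full thread rather than single request/reply pairs.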

    Contributing

    Know of other brands the dataset should include? Found something that needs to be fixed? Start a discussion, or email me directly at $FIRSTNAME@$LASTNAME.com!

    Acknowledgements

    A huge thank you to my friends who helped bootstrap the list of companies that do customer support on Twitter! There are many rocks that would have been left un-turned were it not for your suggestions!

    Relevant Resources

    Licensing

    For commercial applications and use of full dataset, please contact stuart@thoughtvector.io.

  2. Customer Support on Twitter

    • kaggle.com
    zip
    Updated Oct 17, 2024
    Cite
    Amin Aslami (2024). Customer Support on Twitter [Dataset]. https://www.kaggle.com/datasets/aminaslam/customer-support-on-twitter
    Available download formats: zip (78948 bytes)
    Dataset updated
    Oct 17, 2024
    Authors
    Amin Aslami
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Amin Aslami

    Released under Apache 2.0

    Contents

  3. Customer Support on Twitter

    • berd-platform.de
    csv
    Updated Jul 31, 2025
    Cite
    Stuart Axelbrooke (2025). Customer Support on Twitter [Dataset]. http://doi.org/10.34740/kaggle/dsv/8841
    Available download formats: csv
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Stuart Axelbrooke
    Time period covered
    Mar 12, 2017
    Description

    The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact. The dataset includes replies of companies like Apple, Amazon, Uber, Delta, Spotify and others.

  4. Customer Support Twitter Data

    • kaggle.com
    zip
    Updated Aug 29, 2025
    Cite
    Muhammad Asif (2025). Customer Support Twitter Data [Dataset]. https://www.kaggle.com/datasets/muhammadasif786/customer-support-twitter-data
    Available download formats: zip (176765850 bytes)
    Dataset updated
    Aug 29, 2025
    Authors
    Muhammad Asif
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Muhammad Asif

    Released under Apache 2.0

    Contents

  5. Twitter Tweets Sentiment Dataset

    • kaggle.com
    zip
    Updated Apr 8, 2022
    Cite
    M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
    Available download formats: zip (1289519 bytes)
    Dataset updated
    Apr 8, 2022
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description:

    Twitter is an online social media platform where people share their thoughts as tweets. It is observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to distinguish the negative tweets and block them. Can you build a strong classifier model to predict the same?

    Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

    Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

    You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

    Columns:

    1. textID - unique ID for each piece of text
    2. text - the text of the tweet
    3. sentiment - the general sentiment of the tweet
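
    The quote-handling advice above can be sketched as follows. This is an illustration under assumptions, not the uploader's own loader: the sample rows are invented, and only the three columns listed above are used. Python's csv module already removes the enclosing quotes of a quoted field, so the extra strip only matters for literal quote characters that survive parsing.

```python
import csv
import io

# Invented sample rows with the three columns listed above.
SAMPLE = '''textID,text,sentiment
a1,"I love this, really!",positive
b2,"""so tired today""",negative
c3,plain text,neutral
'''


def load_tweets(handle):
    """Parse rows and drop leading/trailing double quotes left in the text field."""
    tweets = []
    for row in csv.DictReader(handle):
        # csv removes the enclosing quotes of a quoted field on its own;
        # strip('"') removes literal quotes that were part of the value itself.
        row["text"] = row["text"].strip('"').strip()
        tweets.append(row)
    return tweets


tweets = load_tweets(io.StringIO(SAMPLE))
for tweet in tweets:
    print(tweet["textID"], tweet["sentiment"], tweet["text"])
```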

    Acknowledgement:

    The dataset was downloaded from Kaggle Competitions:
    https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict the twitter sentiments.
    • Compare the evaluation metrics of various classification algorithms.

  6. Twitter customer support twitter llm finetune

    • kaggle.com
    zip
    Updated Sep 1, 2025
    Cite
    Muhammad Asif (2025). Twitter customer support twitter llm finetune [Dataset]. https://www.kaggle.com/datasets/muhammadasif786/twitter-customer-support-twitter-llm-finetune/suggestions
    Available download formats: zip (176765850 bytes)
    Dataset updated
    Sep 1, 2025
    Authors
    Muhammad Asif
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Muhammad Asif

    Released under Apache 2.0

    Contents

  7. Customer Support Tweets (945M rows)

    • kaggle.com
    zip
    Updated Oct 31, 2025
    Cite
    Galal Qassas (2025). Customer Support Tweets (945M rows) [Dataset]. https://www.kaggle.com/datasets/galalqassas/customer-support-tweets-945m-rows
    Available download formats: zip (74154613 bytes)
    Dataset updated
    Oct 31, 2025
    Authors
    Galal Qassas
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Galal Qassas

    Released under MIT

    Contents

  8. Saudi Customer Care Tweets

    • kaggle.com
    zip
    Updated Mar 13, 2024
    Cite
    Abdullah Alsharif (2024). Saudi Customer Care Tweets [Dataset]. https://www.kaggle.com/datasets/alshreefabdullh/saudi-customer-care-tweets
    Available download formats: zip (10030314 bytes)
    Dataset updated
    Mar 13, 2024
    Authors
    Abdullah Alsharif
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Saudi Arabia
    Description

    This data was collected from several customer care accounts as inquiries of the customers.

    • "fullText": This variable contains the full-text content of the tweet.
    • "lang": This variable indicates the language in which the tweet is written.
    • "viewsCount": This variable represents the count of views or impressions the tweet has received.
    • "bookmarkCount": This variable represents the count of times the tweet has been bookmarked by users.
    • "favoriteCount": This variable represents the count of times the tweet has been favorited by users.
    • "replyCount": This variable represents the count of replies the tweet has received.
    • "retweetCount": This variable represents the count of times the tweet has been retweeted by users.
    • "quoteCount": This variable represents the count of times the tweet has been quoted by users.

  9. Support data for Chatbots

    • kaggle.com
    zip
    Updated Feb 26, 2025
    Cite
    Mohammad Faizan (2025). Support data for Chatbots [Dataset]. https://www.kaggle.com/datasets/mohammadfaizannaeem/3m-tweet-data-of-world-biggest-brands-on-twitter/data
    Available download formats: zip (176765850 bytes)
    Dataset updated
    Feb 26, 2025
    Authors
    Mohammad Faizan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    File Description

    This dataset contains Twitter support conversations collected from various company accounts. It includes customer inquiries and corresponding support responses. The data is useful for training AI chatbots, analyzing customer service trends, and developing sentiment analysis models.

    Column Description

    This dataset contains customer support interactions on Twitter. It includes the following columns:

    • tweet_id: A unique identifier for each tweet.
    • author_id: The unique ID of the user who posted the tweet.
    • inbound: A boolean value indicating whether the tweet is from a customer (True) or from the support team (False).
    • created_at: The timestamp of when the tweet was posted (in UTC format).
    • text: The content of the tweet.
    • response_tweet_id: The unique ID of the response tweet, if applicable.
    • in_response_to_tweet_id: The ID of the original tweet to which this tweet is responding.

    How This Data Can Be Used

    • Training a chatbot: Helps in generating automated support responses.
    • Sentiment analysis: Can analyze whether tweets are complaints, queries, or feedback.
    • Conversation tracking: By linking response tweets with original messages.
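
    Conversation tracking, as described above, can be sketched by following in_response_to_tweet_id links back to the first tweet of a thread. The rows below are invented, and storing them in a dictionary keyed by tweet_id is an illustrative choice rather than anything the dataset prescribes.

```python
# Invented rows keyed by tweet_id; only the fields needed for threading are shown.
TWEETS = {
    "10": {"text": "My order never arrived", "inbound": True, "in_response_to_tweet_id": None},
    "11": {"text": "Sorry about that! What is the order number?", "inbound": False, "in_response_to_tweet_id": "10"},
    "12": {"text": "It is A-1042", "inbound": True, "in_response_to_tweet_id": "11"},
}


def conversation_up_to(tweet_id, tweets):
    """Follow in_response_to_tweet_id links back to the first tweet, oldest first."""
    thread = []
    seen = set()
    current = tweet_id
    while current is not None and current in tweets and current not in seen:
        seen.add(current)  # guard against accidental cycles in the data
        thread.append(current)
        current = tweets[current]["in_response_to_tweet_id"]
    return list(reversed(thread))


thread = conversation_up_to("12", TWEETS)
for tid in thread:
    speaker = "customer" if TWEETS[tid]["inbound"] else "support"
    print(f"[{speaker}] {TWEETS[tid]['text']}")
```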

    Original author: MANORAMA
    Source: https://www.kaggle.com/datasets/manovirat/aspect/data

    Note: This dataset is shared for educational and research purposes only.

  10. Twitter Airline Sentiment Dataset

    • kaggle.com
    zip
    Updated Nov 14, 2025
    Cite
    Chandana Ramakrishna (2025). Twitter Airline Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/chandana890/twitter-airline-sentiment-dataset
    Available download formats: zip (1134990 bytes)
    Dataset updated
    Nov 14, 2025
    Authors
    Chandana Ramakrishna
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    This dataset contains tweets related to major US airlines and is widely used for NLP and sentiment analysis tasks. Each record includes the tweet text, timestamp, airline name, and sentiment label (positive, negative, neutral). This uploaded version is prepared to support advanced text processing, machine learning, and anomaly detection experiments.

    What's Included

    • Tweets.csv – Full collection of airline-related tweets
    • Text content suitable for NLP tasks
    • Timestamp information (useful for time-based analysis)
    • Sentiment labels for classification and evaluation
    • Cleaned text field for direct use in ML pipelines

    Purpose of This Dataset

    This dataset is used in a machine learning workflow focused on:
    - sentiment analysis
    - embedding generation (transformers)
    - dimensionality reduction (PCA, UMAP)
    - clustering and visualization
    - unsupervised anomaly detection using Isolation Forest

    It is especially suited for exploring changes in public sentiment, event detection, and contextual analysis in social media data.

    Key Use Cases

    • Building and testing NLP models
    • Semantic similarity and embedding-based analysis
    • Sentiment classification
    • Detecting anomalous posts or time periods
    • Visualizing tweet clusters using UMAP
    • Studying customer feedback patterns in the airline industry

    Source

    Originally derived from the Twitter US Airline Sentiment dataset on Kaggle.
    This uploaded version is intended for educational, analytical, and research purposes.

    Notes

    If you're using this dataset in a notebook, ensure you update your file path accordingly:

```python
import pandas as pd

# Path as mounted in a Kaggle notebook; adjust it for other environments.
df = pd.read_csv("/kaggle/input/twitter-airline-sentiment-dataset/Tweets.csv")
```

  11. customer care tweets KSA

    • kaggle.com
    zip
    Updated May 20, 2022
    Cite
    Mansour (2022). customer care tweets KSA [Dataset]. https://www.kaggle.com/datasets/mansourhussain/customer-care-tweets-ksa/data
    Available download formats: zip (642212 bytes)
    Dataset updated
    May 20, 2022
    Authors
    Mansour
    Area covered
    Saudi Arabia
    Description

    - This data contains 10000 tweets for a telecom company's customer care account on Twitter.

    - This data is intended for use in Arabic sentiment analysis.

  12. The Climate Change Twitter Dataset

    • kaggle.com
    • data.mendeley.com
    zip
    Updated May 26, 2022
    Cite
    Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/deffro/the-climate-change-twitter-dataset
    Available download formats: zip (428878019 bytes)
    Dataset updated
    May 26, 2022
    Authors
    Dimitrios Effrosynidis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use the dataset, cite the papers: https://doi.org/10.1016/j.eswa.2022.117541 and https://doi.org/10.1371/journal.pone.0274213

    The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.

    The following columns are in the dataset:

    ➡ created_at: The timestamp of the tweet.
    ➡ id: The unique id of the tweet.
    ➡ lng: The longitude the tweet was written.
    ➡ lat: The latitude the tweet was written.
    ➡ topic: Categorization of the tweet in one of ten topics namely, seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.
    ➡ sentiment: A score on a continuous scale. This scale ranges from -1 to 1 with values closer to 1 being translated to positive sentiment, values closer to -1 representing a negative sentiment while values close to 0 depicting no sentiment or being neutral.
    ➡ stance: That is if the tweet supports the belief of man-made climate change (believer), if the tweet does not believe in man-made climate change (denier), and if the tweet neither supports nor refuses the belief of man-made climate change (neutral).
    ➡ gender: Whether the user that made the tweet is male, female, or undefined.
    ➡ temperature_avg: The temperature deviation in Celsius and relative to the January 1951-December 1980 average at the time and place the tweet was written.
    ➡ aggressiveness: That is if the tweet contains aggressive language or not.
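
    To illustrate how the sentiment and stance columns might be combined, here is a small sketch that averages the sentiment score per stance. The records are invented and only the column names and the stance labels (believer, denier, neutral) come from the description above; it is not the authors' analysis.

```python
from collections import defaultdict
from statistics import mean

# Invented records using the column names listed above.
RECORDS = [
    {"id": 1, "stance": "believer", "sentiment": 0.5},
    {"id": 2, "stance": "believer", "sentiment": -0.25},
    {"id": 3, "stance": "denier", "sentiment": -0.75},
    {"id": 4, "stance": "neutral", "sentiment": 0.0},
]


def mean_sentiment_by_stance(records):
    """Group sentiment scores (range -1 to 1) by stance and average each group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["stance"]].append(record["sentiment"])
    return {stance: mean(scores) for stance, scores in grouped.items()}


summary = mean_sentiment_by_stance(RECORDS)
for stance, score in summary.items():
    print(f"{stance}: {score:+.3f}")
```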

    Since Twitter forbids making public the text of the tweets, in order to retrieve it you need to do a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.

  13. Sentiment Analysis Mental Health Tweets 2017-2023

    • kaggle.com
    zip
    Updated Apr 5, 2023
    Cite
    Zee M (2023). Sentiment Analysis Mental Health Tweets 2017-2023 [Dataset]. https://www.kaggle.com/datasets/zoegreenslade/twittermhcampaignsentmentanalysis
    Available download formats: zip (751228815 bytes)
    Dataset updated
    Apr 5, 2023
    Authors
    Zee M
    Description

    These datasets contain tweets from four mental health campaigns on Twitter, between the years of 2017-2023.

    The datasets have accompanying notebooks for each stage of the analysis:

    • 1. Scraping tweets from Twitter. Outputs --> ('UMHD', 'OCD', 'EDAW', 'MHAW')
    • 2. EDA, and merging the data together. Output --> ('MH_Campaigns_1723')
    • 3. Cleaning the tweets. Output --> ('MH_Campaign_Tweets_Clean_1723')
    • 4. Word Clouds, Visualisations and tweet preprocessing for Sentiment Analysis. Output --> ('MH_Campaign_Tweets_Tokenised_1723')
    • 5. VADER sentiment Analysis. Output --> ('MH_Campaign_Tweets_Sentiment_Scored_1723')

    They are broken down in this way so that you can practice the project from any stage - if you don't want to do the scraping but do want to do visuals, for example, you can begin at stage 4 with the relevant dataset.

    To see a presentation of the main insights that I pulled out of this data, follow this link: bit.ly/KaggleTMHC

    Also available on GitHub: https://github.com/zeehama/Sentiment-Analysis-on-4-Mental-Health-Campaigns-Twitter-

  14. twitter-news

    • kaggle.com
    zip
    Updated Aug 17, 2022
    Cite
    Data Guy (2022). twitter-news [Dataset]. https://www.kaggle.com/datasets/deeguy/twitter-news
    Available download formats: zip (1299040415 bytes)
    Dataset updated
    Aug 17, 2022
    Authors
    Data Guy
    Description

    A collection of tweets scraped from Twitter since January 2020 using the search parameter "news". The resulting json file was then separated into 2 separate .csv files. One contains the tweets, whereas the other contains the network analysis inputs.

    The associated network analysis file is a document containing all the nodes and edges derived from the interactions in the tweets as follows:

    Nodes, all distinct tweeters including mentions
    Edges, defined as when one user mentions another user in a tweet or replies
    Weight, number of times the edge interaction has taken place
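
    The node, edge, and weight definitions above can be sketched as follows. This is a minimal illustration assuming mentions appear as @username in the tweet text; the sample tweets and the (author, text) layout are invented, replies are not modeled, and the actual CSV columns may differ.

```python
import re
from collections import Counter

# Invented tweets as (author, text) pairs; real column names in the CSV may differ.
TWEETS = [
    ("alice", "Breaking news from @bob and @carol"),
    ("alice", "@bob did you see this?"),
    ("bob", "Thanks @alice"),
]

MENTION = re.compile(r"@(\w+)")


def build_network(tweets):
    """Return (nodes, edges) where edges maps (source, target) to its weight."""
    nodes = set()
    edges = Counter()
    for author, text in tweets:
        nodes.add(author)
        for mentioned in MENTION.findall(text):
            nodes.add(mentioned)               # mentioned users are nodes too
            edges[(author, mentioned)] += 1    # weight = number of interactions
    return nodes, edges


nodes, edges = build_network(TWEETS)
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target} (weight {weight})")
```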
    

    Here is some code to get started: https://github.com/datadoctor100/twitter_analysis

  15. US Airlines Twitter (Over time)

    • kaggle.com
    zip
    Updated Nov 18, 2022
    Cite
    The Devastator (2022). US Airlines Twitter (Over time) [Dataset]. https://www.kaggle.com/datasets/thedevastator/sentiment-analysis-of-us-airline-twitter-data
    Available download formats: zip (1130886 bytes)
    Dataset updated
    Nov 18, 2022
    Authors
    The Devastator
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    US Airlines Twitter (Over time)

    Study the trend of customer satisfaction over time

    About this dataset

    The columns in the dataset include index, unit id, golden, unit state, trusted judgments, last judgment at, airline sentiment, airline sentiment confidence, negative reason, negative reason confidence, airline_sentiment_gold and retweet count. There is also text included for each tweet as well as tweet location and user timezone.

    Using this dataset, you can get a feel for how customers of various airlines feel about their service. You can use the data to analyze trends over time or compare different airlines. Some research ideas include using airline sentiment to predict the stock market or using the negativereason data to help airlines improve their customer service

    How to use the dataset

    Looking at this dataset, you can get a feel for how customers of various airlines feel about their service. The data includes the airline, the tweet text, the date of the tweet, and various other information. You can use this to analyze trends over time or compare different airlines

    Research Ideas

    • Using airline sentiment to predict the stock market - is there a correlation between how the public perceives an airline and how that airline's stock performs?
    • Using negativereason data to help airlines improve their customer service - which negative reasons are mentioned most often? Are there certain airlines that are consistently mentioned for specific reasons?
    • Use the tweet data to map out airline hot spots - where do people tend to tweet about certain airlines the most? Is there a geographic pattern to sentiment about specific airlines?

    Acknowledgements

    If you use this dataset in your research, please credit Social Media Data

    License

    License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

    You are free to:
    • Share: copy and redistribute the material in any medium or format for non-commercial purposes only.
    • Adapt: remix, transform, and build upon the material for non-commercial purposes only.

    You must:
    • Give appropriate credit, provide a link to the license, and indicate if changes were made.
    • ShareAlike: distribute your contributions under the same license as the original.

    You may not:
    • Use the material for commercial purposes.

    Columns

    File: Airline-Sentiment-2-w-AA.csv

    | Column name | Description |
    |:---------------------------|:-----------------------------------------------------------------------------|
    | _golden | This column is the gold standard column. (Boolean) |
    | _unit_state | This column is the state of the unit. (String) |
    | _trusted_judgments | This column is the number of trusted judgments. (Numeric) |
    | _last_judgment_at | This column is the timestamp of the last judgment. (String) |
    | airline_sentiment | This column is the sentiment of the tweet. (String) |
    | negativereason | This column is the negative reason for the sentiment. (String) |
    | airline_sentiment_gold | This column is the gold standard sentiment of the tweet. (String) |
    | name | This column is the name of the airline. (String) |
    | negativereason_gold | This column is the gold standard negative reason for the sentiment. (String) |
    | retweet_count | This column is the number of retweets. (Numeric) |
    | text | This column is the text of the tweet. (String) |
    | tweet_coord | This column is the coordinates of the tweet. (String) |
    | tweet_created | This column is the timestamp of the tweet. (String) |
    | tweet_location | This column is the location of the tweet. (String) |
    | user_timezone | This column is the timezone of the user. (String) |
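
    One of the research ideas above asks which negative reasons are mentioned most often. A rough sketch of that count, using the airline_sentiment and negativereason columns, could look like this. The rows and the reason labels are invented, and the assumption that negative tweets carry the sentiment value "negative" should be verified against the file.

```python
from collections import Counter

# Invented rows; the reason labels are placeholders, not values taken from the file.
ROWS = [
    {"airline_sentiment": "negative", "negativereason": "Late Flight"},
    {"airline_sentiment": "negative", "negativereason": "Lost Luggage"},
    {"airline_sentiment": "positive", "negativereason": ""},
    {"airline_sentiment": "negative", "negativereason": "Late Flight"},
]


def top_negative_reasons(rows, n=3):
    """Count negativereason values among negative tweets, most frequent first."""
    counts = Counter(
        row["negativereason"]
        for row in rows
        # Assumption: negative tweets are labeled "negative"; skip empty reasons.
        if row["airline_sentiment"] == "negative" and row["negativereason"]
    )
    return counts.most_common(n)


top = top_negative_reasons(ROWS)
for reason, count in top:
    print(f"{reason}: {count}")
```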

  16. COVID-19 Twitter Dataset

    • kaggle.com
    zip
    Updated Jul 4, 2020
    Cite
    Jingli SHI (2020). COVID-19 Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/shijingli/covid19-twitter-dataset
    Available download formats: zip (29449949 bytes)
    Dataset updated
    Jul 4, 2020
    Authors
    Jingli SHI
    Description

    Context

    There are a total of 20 CSV files containing tweets related to COVID-19 from 20 March 2020 to 08 April 2020.

    Content

    For each file, the following columns are included. Columns: coordinates, created_at, hashtags, media, urls, favorite count, id, in_reply_to_screen_name, in_reply_to_status_id, in_reply_to_user_id.

    Rights

    The relevant sections of Twitter's Terms of Service [1] and Developer Agreement [2] apply. According to Twitter's Developer Policy §6 [3]: "If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs" and "any Content provided to third parties via non-automated file download remains subject to this Policy".

    [1] https://twitter.com/tos?lang=en
    [2] https://dev.twitter.com/overview/terms/agreement
    [3] https://dev.twitter.com/overview/terms/policy#6.Update_Be_a_Good_Partner_to_Twitter

  17. Data from: Twitter Data

    • kaggle.com
    zip
    Updated Jun 15, 2025
    Cite
    Akarsh kumar (2025). Twitter Data [Dataset]. https://www.kaggle.com/datasets/akarsh8/twitter-data
    Available download formats: zip (204 bytes)
    Dataset updated
    Jun 15, 2025
    Authors
    Akarsh kumar
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Introducing Twitter dataset to help you access valuable Twitter data with the help of a powerful Twttr API with endpoints. You can easily retrieve twitter tweet details; twitter user followers and twitter followings; post likes, comments; quoted tweets, and retweets. You can also search for top, latest, videos, photos, and people, and access user tweets, replies, media, likes, and info by username or ID.

  18. Bank customer tweets (10000)

    • kaggle.com
    zip
    Updated Sep 25, 2022
    Cite
    Bayode Ogunleye (2022). Bank customer tweets (10000) [Dataset]. https://www.kaggle.com/datasets/batoog/bank-customer-tweets-10000
    Available download formats: zip (563396 bytes)
    Dataset updated
    Sep 25, 2022
    Authors
    Bayode Ogunleye
    Description

    If you use this dataset, please reference it accordingly. See the reference below.

    Ogunleye, B. O. (2021). Statistical learning approaches to sentiment analysis in the Nigerian banking context (Doctoral dissertation, Sheffield Hallam University).

  19. Twitter-Bot Detection Dataset

    • kaggle.com
    zip
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aditya Goyal (2023). Twitter-Bot Detection Dataset [Dataset]. https://www.kaggle.com/datasets/goyaladi/twitter-bot-detection-dataset
    Available download formats: zip (3083151 bytes)
    Dataset updated
    May 31, 2023
    Authors
    Aditya Goyal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Welcome to the Bot Detect Dataset! This dataset offers a unique opportunity to delve into the world of Twitter bots. Explore user profiles, tweet content, retweet counts, and more. Uncover hidden patterns and gain insights into bot detection research. Join us on this exciting journey of understanding social media interactions and identifying bot accounts.

  20. Twitter New Dataset 2024 March Data

    • kaggle.com
    zip
    Updated Mar 11, 2024
    + more versions
    Cite
    Ayush Kumar Singh (2024). Twitter New Dataset 2024 March Data [Dataset]. https://www.kaggle.com/datasets/fastcurious/twitter-new-dataset-2024-march-data
    Explore at:
    zip(2923762 bytes)Available download formats
    Dataset updated
    Mar 11, 2024
    Authors
    Ayush Kumar Singh
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Tweets scraped with all possible data points provided by Twitter in each tweet. For data extraction or scraping, contact me on Telegram - @akaseobhw

    All datapoints present for each tweet.

    Each entry in the dataset represents a tweet along with various attributes such as the tweet's ID, URL, text content, retweet count, reply count, like count, quote count, view count, creation date, language, and more. Additionally, there are details about the tweet's author, including their username, profile URL, follower count, following count, profile picture, cover picture, description, location, creation date, and more.

    Here's a brief description of the key fields present in each tweet entry:

    • type: Indicates the type of data, in this case, it's a tweet.
    • id: Unique identifier for the tweet.
    • url: URL of the tweet.
    • twitterUrl: Twitter URL of the tweet.
    • text: Text content of the tweet.
    • retweetCount: Number of retweets.
    • replyCount: Number of replies.
    • likeCount: Number of likes (favorites).
    • quoteCount: Number of times the tweet has been quoted.
    • viewCount: Number of views.
    • createdAt: Date and time when the tweet was created.
    • lang: Language of the tweet.
    • quoteId: ID of the quoted tweet, if this tweet is a quote.
    • bookmarkCount: Number of times the tweet has been bookmarked.
    • isReply: Indicates whether the tweet is a reply to another tweet.
    • author: Information about the author of the tweet.
      • userName: Username of the author.
      • url: URL of the author's profile.
      • followers: Number of followers of the author.
      • following: Number of accounts the author is following.
      • profilePicture: URL of the author's profile picture.
      • coverPicture: URL of the author's cover picture.
      • description: Description or bio of the author.
      • location: Location of the author.
      • createdAt: Date and time when the author's account was created.
    • entities: Entities present in the tweet, such as hashtags, symbols, URLs, and user mentions.
    • isRetweet: Indicates whether the tweet is a retweet.
    • isQuote: Indicates whether the tweet is a quote.
    • quote: Information about the quoted tweet, if this tweet is a quote.
    • media: Information about any media (such as images or videos) attached to the tweet.

    This dataset can be analyzed to gain insights into trends, sentiments, and user behavior on Twitter. You can use Python libraries like pandas to load this dataset and perform various analyses and visualizations.
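    As a minimal sketch of the pandas workflow mentioned above, the snippet below flattens the nested author object into columns and computes one simple aggregate. The listing does not state the file format, so the records here are invented in-memory examples shaped like the field list, and only a few of the fields are included.

```python
import pandas as pd

# Hypothetical records shaped like the field list above; all values are invented.
records = [
    {"type": "tweet", "id": "1", "text": "hello world", "likeCount": 10,
     "lang": "en", "isReply": False,
     "author": {"userName": "alice", "followers": 120, "following": 80}},
    {"type": "tweet", "id": "2", "text": "good morning", "likeCount": 20,
     "lang": "en", "isReply": True,
     "author": {"userName": "bob", "followers": 45, "following": 300}},
    {"type": "tweet", "id": "3", "text": "hola", "likeCount": 5,
     "lang": "es", "isReply": False,
     "author": {"userName": "carla", "followers": 900, "following": 10}},
]

# json_normalize expands the nested "author" dict into columns such as
# "author.userName" and "author.followers".
tweets = pd.json_normalize(records)

# Average like count per tweet language.
likes_by_lang = tweets.groupby("lang")["likeCount"].mean()
print(likes_by_lang)
```

    If the real file is a JSON array of such entries, loading it with the standard json module and passing the resulting list to pd.json_normalize should give the same flat layout.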

Cite
Thought Vector (2017). Customer Support on Twitter [Dataset]. https://www.kaggle.com/thoughtvector/customer-support-on-twitter

Customer Support on Twitter

Over 3 million tweets and replies from the biggest brands on Twitter

Explore at:
zip(176772673 bytes)Available download formats
Dataset updated
Dec 3, 2017
Dataset authored and provided by
Thought Vector
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact.

Figure: Example Analysis - Inbound Volume for the Top 20 Brands (https://i.imgur.com/nTv3Iuu.png)

Context

Natural language remains the densest encoding of human experience we have, and innovation in NLP has accelerated to power understanding of that data, but the datasets driving this innovation don't match the real language in use today. The Customer Support on Twitter dataset offers a large corpus of modern, mostly English conversations between consumers and customer support agents on Twitter, and has three important advantages over other conversational text datasets:

  • Focused - Consumers contact customer support to have a specific problem solved, and the manifold of problems to be discussed is relatively small, especially compared to unconstrained conversational datasets like the Reddit Corpus.
  • Natural - Consumers in this dataset come from a much broader segment than those in the Ubuntu Dialogue Corpus and have much more natural and recent use of typed text than the Cornell Movie Dialogs Corpus.
  • Succinct - Twitter's brevity encourages more natural (rather than scripted) responses from support agents, and to-the-point descriptions of problems and solutions. It also conveniently allows a relatively low message size limit for recurrent nets.

Inspiration

The size and breadth of this dataset inspire many interesting questions:

  • Can we predict company responses? Given the bounded set of subjects handled by each company, the answer seems like yes!
  • Do requests get stale? How quickly do the best companies respond, compared to the worst?
  • Can we learn high-quality dense embeddings or representations of similarity for topical clustering?
  • How does tone affect the customer support conversation? Does saying sorry help?
  • Can we help companies identify new problems, or ones most affecting their customers?

Acknowledgements

Dataset built with PointScrape.

Content

The dataset is a CSV, where each row is a tweet. The different columns are described below. Every conversation included has at least one request from a consumer and at least one response from a company. Which user IDs are company user IDs can be calculated using the inbound field.

tweet_id

A unique, anonymized ID for the Tweet. Referenced by response_tweet_id and in_response_to_tweet_id.

author_id

A unique, anonymized user ID. @s in the dataset have been replaced with their associated anonymized user ID.

inbound

Whether the tweet is "inbound" to a company doing customer support on Twitter. This feature is useful when re-organizing data for training conversational models.

created_at

Date and time when the tweet was sent.

text

Tweet content. Sensitive information like phone numbers and email addresses is replaced with mask values like _email_.

response_tweet_id

IDs of tweets that are responses to this tweet, comma-separated.

in_response_to_tweet_id

ID of the tweet this tweet is in response to, if any.
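
The columns above are enough to pair each consumer request with its company reply. The sketch below is one possible way to do that in pandas, not the author's own method: it treats authors of non-inbound tweets as company accounts, joins replies to requests through in_response_to_tweet_id, and measures the reply delay. The rows, the "BrandCare" author ID, and the ISO timestamp format are invented for illustration; the real file may store created_at differently.

```python
import pandas as pd

# Hypothetical rows using the documented column names; all values are invented.
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "author_id": ["115712", "BrandCare", "115713", "BrandCare"],
    "inbound": [True, False, True, False],
    "created_at": ["2017-10-31 10:00:00", "2017-10-31 10:05:00",
                   "2017-10-31 11:00:00", "2017-10-31 11:30:00"],
    "text": ["my order is late", "sorry, checking now",
             "app keeps crashing", "please try updating"],
    "response_tweet_id": ["2", None, "4", None],
    "in_response_to_tweet_id": [None, 1, None, 3],
})
tweets["created_at"] = pd.to_datetime(tweets["created_at"])

# Assumption: an author who wrote a non-inbound tweet is a company account.
company_ids = set(tweets.loc[~tweets["inbound"], "author_id"])

# Split into consumer requests and company replies that answer some tweet.
requests = tweets[tweets["inbound"]]
replies = tweets[~tweets["inbound"]].dropna(subset=["in_response_to_tweet_id"]).copy()
replies["in_response_to_tweet_id"] = replies["in_response_to_tweet_id"].astype("int64")

# Join each request to the reply that points back at it.
pairs = requests.merge(
    replies,
    left_on="tweet_id",
    right_on="in_response_to_tweet_id",
    suffixes=("_request", "_reply"),
)

# Reply delay in seconds, then the median delay per company.
delay = pairs["created_at_reply"] - pairs["created_at_request"]
pairs["response_delay_s"] = delay.dt.total_seconds()
median_delay_s = pairs.groupby("author_id_reply")["response_delay_s"].median()
print(median_delay_s)
```

The resulting request/reply pairs are the kind of re-organized data that the inbound field is described as useful for when training conversational models.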

Contributing

Know of other brands the dataset should include? Found something that needs to be fixed? Start a discussion, or email me directly at $FIRSTNAME@$LASTNAME.com!

Acknowledgements

A huge thank you to my friends who helped bootstrap the list of companies that do customer support on Twitter! There are many rocks that would have been left un-turned were it not for your suggestions!

Relevant Resources

Licensing

For commercial applications and use of full dataset, please contact stuart@thoughtvector.io.
