40 datasets found
  1. Customer Support on Twitter

    • kaggle.com
    zip
    Updated Oct 17, 2024
    Cite
    Amin Aslami (2024). Customer Support on Twitter [Dataset]. https://www.kaggle.com/datasets/aminaslam/customer-support-on-twitter
    Explore at:
    Available download formats: zip (78948 bytes)
    Dataset updated
    Oct 17, 2024
    Authors
    Amin Aslami
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Amin Aslami

    Released under Apache 2.0

    Contents

  2. Twitter Tweets Sentiment Dataset

    • kaggle.com
    zip
    Updated Apr 8, 2022
    Cite
    M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
    Explore at:
    Available download formats: zip (1289519 bytes)
    Dataset updated
    Apr 8, 2022
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Description:

    Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to post hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier that identifies negative tweets so that they can be blocked. Can you build such a classifier?

    Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

    Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
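
    As a minimal sketch of the quote handling mentioned above, the standard-library csv module already removes the enclosing quotes from a quoted field while parsing. The sample rows and the helper name read_tweets are hypothetical and only mirror the column names listed in this entry; the real file may need different handling.

```python
import csv
import io

# Hypothetical sample rows in the layout described in this entry
# (textID, text, sentiment); the values are made up for illustration.
SAMPLE = (
    "textID,text,sentiment\n"
    'abc123,"I love this, really",positive\n'
    'def456,"  so sad today ",negative\n'
)


def read_tweets(handle):
    """Return one dict per row; csv.DictReader drops the enclosing quotes."""
    rows = []
    for row in csv.DictReader(handle):
        # Remove stray whitespace and any quote characters left at the edges.
        row["text"] = row["text"].strip().strip('"')
        rows.append(row)
    return rows


rows = read_tweets(io.StringIO(SAMPLE))
for row in rows:
    print(row["textID"], repr(row["text"]), row["sentiment"])
```

    Note that the comma inside the first quoted field stays part of the text rather than splitting the row, which is why a CSV parser is preferable to a plain split on commas.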

    You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

    Columns:

    1. textID - unique ID for each piece of text
    2. text - the text of the tweet
    3. sentiment - the general sentiment of the tweet

    Acknowledgement:

    The dataset was downloaded from the following Kaggle competition:
    https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict the twitter sentiments.
    • Compare the evaluation metrics of various classification algorithms.
  3. Customer Support on Twitter

    • berd-platform.de
    csv
    Updated Jul 31, 2025
    Cite
    Stuart Axelbrooke (2025). Customer Support on Twitter [Dataset]. http://doi.org/10.34740/kaggle/dsv/8841
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Stuart Axelbrooke
    Time period covered
    Mar 12, 2017
    Description

    The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies to aid innovation in natural language understanding and conversational models, and for study of modern customer support practices and impact. The dataset includes replies of companies like Apple, Amazon, Uber, Delta, Spotify and others.

  4. Support data for Chatbots

    • kaggle.com
    zip
    Updated Feb 26, 2025
    Cite
    Mohammad Faizan (2025). Support data for Chatbots [Dataset]. https://www.kaggle.com/datasets/mohammadfaizannaeem/3m-tweet-data-of-world-biggest-brands-on-twitter/data
    Explore at:
    Available download formats: zip (176765850 bytes)
    Dataset updated
    Feb 26, 2025
    Authors
    Mohammad Faizan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    File Description

    This dataset contains Twitter support conversations collected from various company accounts. It includes customer inquiries and corresponding support responses. The data is useful for training AI chatbots, analyzing customer service trends, and developing sentiment analysis models.

    Column Description

    This dataset contains customer support interactions on Twitter. It includes the following columns:

    • tweet_id: A unique identifier for each tweet.
    • author_id: The unique ID of the user who posted the tweet.
    • inbound: A boolean value indicating whether the tweet is from a customer (True) or from the support team (False).
    • created_at: The timestamp of when the tweet was posted (in UTC).
    • text: The content of the tweet.
    • response_tweet_id: The unique ID of the response tweet, if applicable.
    • in_response_to_tweet_id: The ID of the original tweet to which this tweet is responding.

    How Can This Data Be Used?

    • Training a chatbot: Helps in generating automated support responses.
    • Sentiment analysis: Can analyze whether tweets are complaints, queries, or feedback.
    • Conversation tracking: Links response tweets with the original messages.
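
    The conversation-tracking idea can be sketched as follows: each support reply is matched to the customer tweet it answers through in_response_to_tweet_id. The sample rows, the timestamp format, and the helper name pair_conversations are assumptions made for illustration, as is the encoding of the inbound flag as the strings "True" and "False".

```python
import csv
import io

# Hypothetical sample using the column names listed in this entry;
# the IDs, timestamps and texts are invented.
SAMPLE = (
    "tweet_id,author_id,inbound,created_at,text,"
    "response_tweet_id,in_response_to_tweet_id\n"
    "1,115712,True,2017-10-31 22:10:47,My order never arrived,2,\n"
    "2,ExampleSupport,False,2017-10-31 22:11:45,"
    "Sorry to hear that. Please DM us.,,1\n"
)


def pair_conversations(handle):
    """Return (customer_text, support_text) pairs linked by tweet ID."""
    rows = list(csv.DictReader(handle))
    by_id = {row["tweet_id"]: row for row in rows}
    pairs = []
    for row in rows:
        parent = by_id.get(row["in_response_to_tweet_id"])
        # Keep support replies (inbound is False) to customer tweets.
        if parent and row["inbound"] == "False" and parent["inbound"] == "True":
            pairs.append((parent["text"], row["text"]))
    return pairs


pairs = pair_conversations(io.StringIO(SAMPLE))
print(pairs)
```

    Pairs of this kind are one plausible starting point for the chatbot-training use case, since each pair holds a customer message and the reply it received.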

    Original author: MANORAMA. Source: https://www.kaggle.com/datasets/manovirat/aspect/data

    Note: This dataset is shared for educational and research purposes only.

  5. Apple Tweet Dataset

    • kaggle.com
    zip
    Updated Mar 26, 2022
    Cite
    sup_tenshi (2022). Apple Tweet Dataset [Dataset]. https://www.kaggle.com/datasets/suptenshi/apple-tweet-dataset
    Explore at:
    Available download formats: zip (375032 bytes)
    Dataset updated
    Mar 26, 2022
    Authors
    sup_tenshi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains tweets about Apple products on Twitter and can be used for sentiment analysis. It has three columns:

    1. tweet_text
    2. emotion_in_tweet_is_directed_at
    3. is_there_an_emotion_directed_at_a_brand_or_product

  6. Twitter Customer Reviews of Popular Smart Phone

    • kaggle.com
    zip
    Updated Jun 8, 2024
    Cite
    Shibbir Ahmed Arif (2024). Twitter Customer Reviews of Popular Smart Phone [Dataset]. https://www.kaggle.com/datasets/shibbir282/twitter-customer-reviews-of-popular-smart-phone
    Explore at:
    Available download formats: zip (1236373 bytes)
    Dataset updated
    Jun 8, 2024
    Authors
    Shibbir Ahmed Arif
    Description

    Context

    This dataset is part of our research work titled "Opinion Mining of Customer Reviews Using Supervised Learning Algorithms". If you use this dataset, please cite our work. You can find the article at https://ieeexplore.ieee.org/document/9733435

    Content

    Nowadays, many people express their opinions on various topics on social networking sites. Twitter has become a popular site where people express their opinions concisely, which makes it a rich source for opinion mining. The goal of this research was to train a model that can automatically and accurately categorize the opinions in customer tweet reviews about popular cell phone brands. We used the Python TextBlob library to obtain the polarity values of all tweet reviews in the dataset. We also used Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Decision Tree and Random Forest algorithms, each with Bag of Words and TF-IDF vectorizers separately, to train the model. We investigated the opinions using five classes: Strongly Positive, Positive, Neutral, Negative and Strongly Negative.

    When referencing this dataset please cite the below paper

    BibTeX:

    @inproceedings{arif2021opinion,
      title={Opinion Mining of Customer Reviews Using Supervised Learning Algorithms},
      author={Arif, Shibbir Ahmed and Hossain, Taslima Binte},
      booktitle={2021 5th International Conference on Electrical Information and Communication Technology (EICT)},
      pages={1--6},
      year={2021},
      organization={IEEE}
    }

  7. COVID-19 Twitter Dataset

    • kaggle.com
    zip
    Updated Jul 4, 2020
    Cite
    Jingli SHI (2020). COVID-19 Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/shijingli/covid19-twitter-dataset
    Explore at:
    Available download formats: zip (29449949 bytes)
    Dataset updated
    Jul 4, 2020
    Authors
    Jingli SHI
    Description

    Context

    There are a total of 20 CSV files containing tweets related to COVID-19 from 20 March 2020 to 08 April 2020.

    Content

    For each file, the following columns are included. Columns: coordinates, created_at, hashtags, media, urls, favorite count, id, in_reply_to_screen_name, in_reply_to_status_id, in_reply_to_user_id.


    Rights

    The relevant sections of Twitter's Terms of Service [1] and Developer Agreement [2] apply. According to Twitter's Developer Policy §6 [3]: "If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs" and "any Content provided to third parties via non-automated file download remains subject to this Policy".

    [1] https://twitter.com/tos?lang=en
    [2] https://dev.twitter.com/overview/terms/agreement
    [3] https://dev.twitter.com/overview/terms/policy#6.Update_Be_a_Good_Partner_to_Twitter

  8. Twitter Dataset February 2024

    • kaggle.com
    Updated Mar 4, 2024
    Cite
    Ayush Kumar Singh (2024). Twitter Dataset February 2024 [Dataset]. https://www.kaggle.com/datasets/fastcurious/twitter-dataset-february-2024
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ayush Kumar Singh
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Scraped tweets with all the data points that Twitter provides for each tweet. For data extraction or scraping, contact me on Telegram: @akaseobhw

    All datapoints present for each tweet.

    Each entry in the dataset represents a tweet along with various attributes such as the tweet's ID, URL, text content, retweet count, reply count, like count, quote count, view count, creation date, language, and more. Additionally, there are details about the tweet's author, including their username, profile URL, follower count, following count, profile picture, cover picture, description, location, creation date, and more.

    Here's a brief description of the key fields present in each tweet entry:

    • type: Indicates the type of data, in this case, it's a tweet.
    • id: Unique identifier for the tweet.
    • url: URL of the tweet.
    • twitterUrl: Twitter URL of the tweet.
    • text: Text content of the tweet.
    • retweetCount: Number of retweets.
    • replyCount: Number of replies.
    • likeCount: Number of likes (favorites).
    • quoteCount: Number of times the tweet has been quoted.
    • viewCount: Number of views.
    • createdAt: Date and time when the tweet was created.
    • lang: Language of the tweet.
    • quoteId: ID of the quoted tweet, if this tweet is a quote.
    • bookmarkCount: Number of times the tweet has been bookmarked.
    • isReply: Indicates whether the tweet is a reply to another tweet.
    • author: Information about the author of the tweet.
      • userName: Username of the author.
      • url: URL of the author's profile.
      • followers: Number of followers of the author.
      • following: Number of accounts the author is following.
      • profilePicture: URL of the author's profile picture.
      • coverPicture: URL of the author's cover picture.
      • description: Description or bio of the author.
      • location: Location of the author.
      • createdAt: Date and time when the author's account was created.
    • entities: Entities present in the tweet, such as hashtags, symbols, URLs, and user mentions.
    • isRetweet: Indicates whether the tweet is a retweet.
    • isQuote: Indicates whether the tweet is a quote.
    • quote: Information about the quoted tweet, if this tweet is a quote.
    • media: Information about any media (such as images or videos) attached to the tweet.

    This dataset can be analyzed to gain insights into trends, sentiments, and user behavior on Twitter. You can use Python libraries like pandas to load this dataset and perform various analyses and visualizations.
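
    Because the author details are nested inside each tweet entry, a common first step is to flatten the fields of interest. The sketch below uses only the standard-library json module and assumes, without confirmation from the source, that the data is a JSON array of tweet objects; the record values and the helper name flatten are hypothetical.

```python
import json

# Hypothetical record using field names from the list above;
# every value here is invented for illustration.
RAW = """
[{"type": "tweet", "id": "1", "text": "hello world", "likeCount": 3,
  "retweetCount": 1, "isReply": false,
  "author": {"userName": "example_user", "followers": 120, "following": 80}}]
"""


def flatten(tweet):
    """Copy selected top-level and nested author fields into one flat dict."""
    author = tweet.get("author", {})
    return {
        "id": tweet.get("id"),
        "text": tweet.get("text"),
        "likeCount": tweet.get("likeCount", 0),
        "author_userName": author.get("userName"),
        "author_followers": author.get("followers", 0),
    }


flat = [flatten(tweet) for tweet in json.loads(RAW)]
print(flat[0])
```

    Flat dictionaries like these can be written to CSV or passed to a tabular library for the analyses mentioned above.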

  9. Brand Sentiment Analysis Dataset (Twitter)

    • kaggle.com
    zip
    Updated Jan 7, 2024
    Cite
    Tushar Paul (2024). Brand Sentiment Analysis Dataset (Twitter) [Dataset]. https://www.kaggle.com/datasets/tusharpaul2001/brand-sentiment-analysis-dataset
    Explore at:
    Available download formats: zip (375745 bytes)
    Dataset updated
    Jan 7, 2024
    Authors
    Tushar Paul
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset description

    Users assessed tweets related to various brands and products, providing evaluations on whether the sentiment conveyed was positive, negative, or neutral. Additionally, if the tweet conveyed any sentiment, contributors identified the specific brand or product targeted by that emotion.

    Train Dataset: 8589 rows x 3 columns

    Test Dataset: 504 rows x 1 column

  10. Bank customer tweets (10000)

    • kaggle.com
    Updated Jan 10, 2023
    Cite
    Bayode Ogunleye (2023). Bank customer tweets (10000) [Dataset]. https://www.kaggle.com/datasets/batoog/bank-customer-tweets-10000
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 10, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bayode Ogunleye
    Description

    If you use this dataset, please cite the reference below.

    Ogunleye, B. O. (2021). Statistical learning approaches to sentiment analysis in the Nigerian banking context (Doctoral dissertation, Sheffield Hallam University).

  11. Drug-related Tweets Dataset

    • kaggle.com
    zip
    Updated Sep 17, 2025
    Cite
    Techno Care (2025). Drug-related Tweets Dataset [Dataset]. https://www.kaggle.com/datasets/technocare/drug-related-tweets-dataset
    Explore at:
    Available download formats: zip (9522011 bytes)
    Dataset updated
    Sep 17, 2025
    Authors
    Techno Care
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains drug-related text entries structured to resemble tweets. It was generated from the drugsComTest_raw.csv dataset, which originally included patient reviews of medications.

    • Source: Extracted from patient-submitted reviews on Drugs.com.
    • Format: CSV file with two columns:
      • drugName: the name of the drug mentioned.
      • tweet: the review text reformatted to simulate a tweet-like message.

    Purpose:

    • To support Natural Language Processing (NLP) tasks such as sentiment analysis, drug-effect classification, and social media mining.
    • To act as a proxy dataset for training or testing models on drug-related discussions where actual Twitter data collection is restricted or unavailable.

    Limitations:

    • Not real Twitter data, but synthetic tweets generated from formal drug reviews.
    • May differ in tone and structure from actual tweets.

  12. Twitter Financial News

    • kaggle.com
    zip
    Updated Jan 21, 2023
    Cite
    Sulphatet (2023). Twitter Financial News [Dataset]. https://www.kaggle.com/datasets/sulphatet/twitter-financial-news
    Explore at:
    Available download formats: zip (1127820 bytes)
    Dataset updated
    Jan 21, 2023
    Authors
    Sulphatet
    Description

    About the Data

    The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their topic.

    Featured Notebooks:

    Your notebook here? 🚨 1. https://www.kaggle.com/code/ahmadalijamali/twitter-financial-news-nlp-analysis-and-prediction by @ Ahmadali Jamali

    Ideas:

    1. The data is a multi-class text classification problem with imbalanced classes.
    2. The links within the texts could be extracted.
    3. EDA

    The dataset holds 21,107 documents annotated with 20 labels:

      "LABEL_0": "Analyst Update",
    
      "LABEL_1": "Fed | Central Banks",
    
      "LABEL_2": "Company | Product News",
    
      "LABEL_3": "Treasuries | Corporate Debt",
    
      "LABEL_4": "Dividend",
    
      "LABEL_5": "Earnings",
    
      "LABEL_6": "Energy | Oil",
    
      "LABEL_7": "Financials",
    
      "LABEL_8": "Currencies",
    
      "LABEL_9": "General News | Opinion",
    
      "LABEL_10": "Gold | Metals | Materials",
    
      "LABEL_11": "IPO",
    
      "LABEL_12": "Legal | Regulation",
    
      "LABEL_13": "M&A | Investments",
    
      "LABEL_14": "Macro",
    
      "LABEL_15": "Markets",
    
      "LABEL_16": "Politics",
    
      "LABEL_17": "Personnel Change",
    
      "LABEL_18": "Stock Commentary",
    
      "LABEL_19": "Stock Movement"
    

    The data was collected using the Twitter API. The current dataset supports the multi-class classification task.

    The training data has 16,990 instances, and the validation data has 4,118 instances.
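
    To work with the 20 labels programmatically, the list above can be turned into a lookup table, and the class imbalance noted under Ideas can be inspected by counting labels. This is a sketch only: the helper names topic_name and class_shares and the example label column are hypothetical, and it assumes labels are stored as integers 0 to 19.

```python
from collections import Counter

# Topic names copied from the label list in this entry;
# index i corresponds to "LABEL_i".
TOPICS = [
    "Analyst Update", "Fed | Central Banks", "Company | Product News",
    "Treasuries | Corporate Debt", "Dividend", "Earnings", "Energy | Oil",
    "Financials", "Currencies", "General News | Opinion",
    "Gold | Metals | Materials", "IPO", "Legal | Regulation",
    "M&A | Investments", "Macro", "Markets", "Politics", "Personnel Change",
    "Stock Commentary", "Stock Movement",
]


def topic_name(label_id):
    """Translate an integer class id (0-19) into its topic name."""
    return TOPICS[label_id]


def class_shares(label_ids):
    """Return the fraction of examples per topic, to inspect imbalance."""
    counts = Counter(label_ids)
    total = sum(counts.values())
    return {topic_name(i): n / total for i, n in counts.items()}


# Hypothetical label column, not real data from the dataset.
shares = class_shares([2, 2, 2, 19, 5])
print(shares)
```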

  13. Job Vacancy Tweets

    • kaggle.com
    zip
    Updated Apr 10, 2023
    Cite
    Prasad Patil (2023). Job Vacancy Tweets [Dataset]. https://www.kaggle.com/datasets/prasad22/job-vacancy-tweets
    Explore at:
    Available download formats: zip (5514498 bytes)
    Dataset updated
    Apr 10, 2023
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains 50,000 tweets related to job vacancies and hiring, extracted using the keywords 'Job Vacancy', 'We are Hiring', and 'We're Hiring'. The tweets were collected between January 1, 2019, and April 10, 2023, using the Python snscrape library and are provided in CSV format.

    The purpose behind this dataset

    • To explore text pre-processing and test NLP skills
    • Draw interesting insights on Job Market from Job Postings.
    • Analyse company/role requirements if possible

    The dataset includes the following information for each tweet:

    • ID: The unique identifier for the tweet.
    • Timestamp: The date and time when the tweet was posted.
    • User: The Twitter handle of the user who posted the tweet.
    • Text: The content of the tweet.
    • Hashtag: The hashtags included in the tweet, if any.
    • Retweets: The number of times the tweet has been retweeted as of the time it was scraped.
    • Likes: The number of likes the tweet has received as of the time it was scraped.
    • Replies: The number of replies to the tweet as of the time it was scraped.
    • Source: The source application or device used to post the tweet.
    • Location: The location listed on the user's Twitter profile, if any.
    • Verified_Account: A Boolean value indicating whether the user's Twitter account has been verified.
    • Followers: The number of followers the user has as of the time the tweet was scraped.
    • Following: The number of accounts the user is following as of the time the tweet was scraped.

  14. Covid-19 Twitter Dataset

    • kaggle.com
    zip
    Updated Mar 13, 2023
    Cite
    Arunava Kr. Chakraborty (2023). Covid-19 Twitter Dataset [Dataset]. https://www.kaggle.com/arunavakrchakraborty/covid19-twitter-dataset
    Explore at:
    Available download formats: zip (51063255 bytes)
    Dataset updated
    Mar 13, 2023
    Authors
    Arunava Kr. Chakraborty
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Data Collection

    I streamed live tweets from Twitter after the WHO declared Covid-19 a pandemic. Because the pandemic affected the entire world, I collected worldwide Covid-19-related English tweets at a rate of almost 10k per day in three phases: April-June 2020, August-October 2020, and April-June 2021. The first-phase dataset contains about 235k tweets collected from 19th April to 20th June 2020. After one month I resumed collecting tweets, as the pandemic was then spreading with fatal intensity, and gathered almost 320k tweets from August 20 to October 20, 2020, for the second-phase dataset. Finally, after six months, I collected almost 489k tweets from 26th April to 27th June 2021 for the third-phase dataset.

    Content

    The datasets I developed contain important information about most of the tweets and their attributes. The main attributes of these datasets are:

    • Tweet ID
    • Creation Date & Time
    • Source Link
    • Original Tweet
    • Favorite Count
    • Retweet Count
    • Original Author
    • Hashtags
    • User Mentions
    • Place

    In total, I collected 235,240, 320,316, and 489,269 tweets for the first-, second-, and third-phase datasets respectively, containing hashtagged keywords such as #covid-19, #coronavirus, #covid, #covaccine, #lockdown, #homequarantine, #quarantinecenter, #socialdistancing, #stayhome, #staysafe, etc. Here I present an overview of the collected dataset.

    Data Pre-Processing

    I pre-processed the collected data with a user-defined pre-processing function based on NLTK (Natural Language Toolkit, a Python library for NLP). First, it converts all tweets to lowercase. It then removes extra white space, numbers, special characters, non-ASCII characters, URLs, punctuation, and stopwords from the tweets. Next, it converts every occurrence of ‘covid’ to ‘covid19’, since all numbers have already been removed. Finally, it uses stemming to reduce inflected words to their word stems.
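
    A rough sketch of the regex-only part of such a cleaning function is shown below. It is not the author's NLTK-based function: stopword removal and stemming are left out because they need NLTK resources, the step order is an assumption (URLs are removed first so that their fragments do not survive as words), and the name clean_tweet is hypothetical.

```python
import re


def clean_tweet(text):
    """Apply the regex-only cleaning steps; no stopwords or stemming."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs first
    text = re.sub(r"[^a-z\s]", " ", text)      # digits, punctuation, non-ASCII
    text = re.sub(r"\s+", " ", text).strip()   # collapse extra white space
    # Digits are gone by now, so "covid-19" has become "covid".
    return re.sub(r"\bcovid\b", "covid19", text)


cleaned = clean_tweet("COVID-19 cases rising! https://t.co/x  Stay   home 2020")
print(cleaned)
```

    The final substitution runs last on purpose: if it ran before the digit removal, the "19" it adds would be stripped again.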

    Sentiment Analysis

    I calculated the sentiment polarity of each cleaned and pre-processed tweet using the NLTK-based Sentiment Analyzer, obtaining the positive, negative, and neutral scores used to compute the compound sentiment score for each tweet. Based on the compound score, I classified the tweets into three classes: Positive, Negative, and Neutral. I then assigned a sentiment polarity rating to each tweet using the following algorithm:

    Algorithm: Sentiment Classification of Tweets (compound, sentiment)

    for each tweet in the dataset:
        if tweet[compound] < 0:
            tweet[sentiment] = 0.0    # assigned 0.0 for Negative Tweets
        elif tweet[compound] > 0:
            tweet[sentiment] = 1.0    # assigned 1.0 for Positive Tweets
        else:
            tweet[sentiment] = 0.5    # assigned 0.5 for Neutral Tweets
    end
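
    The classification rule above translates directly into a small Python function. This is an illustrative sketch: the function name sentiment_rating and the example compound scores are hypothetical, and computing the compound score itself is outside its scope.

```python
def sentiment_rating(compound):
    """Map a compound sentiment score to the 0.0 / 0.5 / 1.0 rating."""
    if compound < 0:
        return 0.0  # negative tweet
    if compound > 0:
        return 1.0  # positive tweet
    return 0.5      # neutral tweet (compound is exactly 0)


# Hypothetical compound scores, not taken from the dataset.
tweets = [{"compound": -0.42}, {"compound": 0.0}, {"compound": 0.73}]
for tweet in tweets:
    tweet["sentiment"] = sentiment_rating(tweet["compound"])

ratings = [tweet["sentiment"] for tweet in tweets]
print(ratings)
```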

    Acknowledgements

    I wouldn't be here without the help of my project guide, Dr. Anup Kumar Kolya, Assistant Professor, Dept. of Computer Science and Engineering, RCCIIT, whose kind and valuable suggestions and excellent guidance gave me the best opportunity to prepare these datasets.

    These datasets are part of the following publications:

    • Chakraborty, A. K., Das, D., & Kolya, A. K. (2023). Sentiment Analysis on Large-Scale Covid-19 Tweets using Hybrid Convolutional LSTM Based on Naïve Bayes Sentiment Modeling. ECTI Transactions on Computer and Information Technology (ECTI-CIT), 17(3), 343–357. https://doi.org/10.37936/ecti-cit.2023173.252549
    • Chakraborty, A. K., & Das, S. (2023). A comparative study of a novel approach with baseline attributes leading to sentiment analysis of Covid-19 tweets. In Elsevier eBooks (pp. 179–208). https://doi.org/10.1016/b978-0-32-390535-0.00013-6
    • Chakraborty, A. K., Das, S., & Kolya, A. K. (2021). Sentiment analysis of COVID-19 tweets using Evolutionary Classification-Based LSTM model. In Advances in intelligent systems and computing (pp. 75–86). https://doi.org/10.1007/978-981-16-1543-6_7
  15. Sentiment with 1.6 million tweets with locations

    • kaggle.com
    zip
    Updated Mar 12, 2023
    Cite
    vivek chary (2023). Sentiment with 1.6 million tweets with locations [Dataset]. https://www.kaggle.com/datasets/vivekchary/sentiment-with-16-million-tweets-with-locations
    Explore at:
    Available download formats: zip (86959692 bytes)
    Dataset updated
    Mar 12, 2023
    Authors
    vivek chary
    Description

    The "Sentiment with 1.6 million tweets with locations" dataset is a collection of tweets with their respective geographical location information and sentiment labels. The dataset includes 1.6 million tweets from various locations around the world, spanning a period of several years. The sentiment labels for each tweet are binary, indicating whether the sentiment expressed in the tweet is positive or negative.

    This dataset can be used for sentiment analysis and natural language processing tasks, such as training machine learning models to classify the sentiment of text data. Researchers and developers can use this dataset to analyze trends in sentiment across different locations and time periods, as well as to develop new algorithms and models for sentiment analysis.

    Please note that this dataset is intended for research purposes only and should not be used for any commercial or legal applications. The dataset may also contain offensive or inappropriate language, and users should exercise caution when working with this data

    Context

    • The dataset was compiled and made publicly available by Vivek Chary, a data scientist and machine learning engineer.
    • The tweets were collected using the Twitter API, and the dataset was last updated in 2017.
    • The dataset includes tweets in various languages, although the majority are in English.
    • Sentiment analysis is a common application of natural language processing, and has a wide range of potential use cases, such as in market research, social media monitoring, and customer service.
    • Sentiment analysis can be challenging due to the complexity and ambiguity of language, as well as the variability of individual expression and context.

    • Large datasets like this one are important for developing accurate and robust sentiment analysis models, as they provide a diverse and representative sample of real-world text data.

    Content

    It contains the following 7 fields:

    1. Sentiment Target: The polarity of the tweet, indicated by a numeric value of 0 (negative), 2 (neutral), or 4 (positive).

    2. Tweet ID: The unique identifier of the tweet.

    3. Date: The date and time the tweet was posted in Coordinated Universal Time (UTC) format.

    4. Query Flag: The keyword or phrase used to filter the tweets. If no query was used, the value is NO_QUERY.

    5. User: The username of the Twitter account that posted the tweet.

    6. Text: The actual text content of the tweet.

    7. Location: The location of the tweet

  16. Indian State Elections 2022 Twitter Dataset

    • kaggle.com
    zip
    Updated Jul 19, 2022
    Cite
    Arinjay Pathak (2022). Indian State Elections 2022 Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/arinjaypathak/indian-state-elections-2022-twitter-dataset
    Explore at:
    Available download formats: zip (21786708 bytes)
    Dataset updated
    Jul 19, 2022
    Authors
    Arinjay Pathak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    Each CSV file contains tweets with the same hashtags as the title of the file, from 1 November 2021 to 9 March 2022, i.e. up to one day before the actual results came out. This dataset can be used for the following tasks:

    1. Political opinion mining: training a model to tell whether a tweet is inclined in support of or against a particular politician or party.
    2. Result prediction: preparing a model to predict the election result based on tweets (for the actual result, refer to the results that came out on March 10).
    3. EDA on the data.

    Note: The dataset contains tweets in multiple languages.

  17. Twitter Friends

    • kaggle.com
    zip
    Updated Sep 2, 2016
    Cite
    Hubert Wassner (2016). Twitter Friends [Dataset]. https://www.kaggle.com/hwassner/TwitterFriends
    Explore at:
    Available download formats: zip (183520459 bytes)
    Dataset updated
    Sep 2, 2016
    Authors
    Hubert Wassner
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Twitter Friends and hashtags

    Context

    This dataset is an extract of a wider database that collects Twitter users' friends (the other accounts a user follows). The overall goal is to study users' interests through whom they follow and how that connects to the hashtags they have used.

    Content

    It is a list of Twitter users' information. In the JSON format, each Twitter user is stored as one object in a list of more than 40,000 objects. Each object holds:

    • avatar : URL to the profile picture

    • followerCount : the number of followers of this user

    • friendsCount : the number of accounts this user follows.

    • friendName : stores the @name (without the '@') of the user (beware this name can be changed by the user)

    • id : user ID, this number can not change (you can retrieve screen name with this service : https://tweeterid.com/)

    • friends : the list of IDs the user follows (data stored is IDs of users followed by this user)

    • lang : the language declared by the user (in this dataset there is only "en" (english))

    • lastSeen : the timestamp of when this user posted their last tweet.

    • tags : the hashtags (with or without #) used by the user. These are the "trending topics" the user tweeted about.

    • tweetID : Id of the last tweet posted by this user.

    You also have the CSV format which uses the same naming convention.

    These users were selected because they tweeted on Twitter trending topics. I selected users that have at least 100 followers and follow at least 100 other accounts (in order to filter out spam and non-informative or empty accounts).
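
    A short sketch of reading records in the JSON layout described above and re-applying the selection rule, assuming the file is a single top-level list of objects. The two sample records, their values, and the helper names `normalize_tag` and `select_users` are invented for illustration.

```python
import json

# Two made-up records using the field names listed above (all values invented).
sample_json = """
[
  {"id": 1, "friendName": "alice", "followerCount": 250, "friendsCount": 3,
   "friends": [10, 11, 12], "lang": "en", "tags": ["#nlp", "python"]},
  {"id": 2, "friendName": "bob", "followerCount": 40, "friendsCount": 2,
   "friends": [10, 13], "lang": "en", "tags": ["python"]}
]
"""


def normalize_tag(tag):
    """Tags may appear with or without '#', so strip it and lowercase."""
    return tag.lstrip("#").lower()


def select_users(users, min_followers=100, min_friends=100):
    """Keep users with enough followers and enough followed accounts."""
    return [
        u for u in users
        if u["followerCount"] >= min_followers and len(u["friends"]) >= min_friends
    ]


users = json.loads(sample_json)
# Thresholds are lowered here only because the sample friend lists are tiny.
kept = select_users(users, min_followers=100, min_friends=3)
print([u["friendName"] for u in kept])
print(sorted({normalize_tag(t) for u in users for t in u["tags"]}))
```

    Normalizing tags first avoids counting "#python" and "python" as different hashtags.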

    Acknowledgements

    This dataset was built by Hubert Wassner (me) using the Twitter public API. More data can be obtained on request (hubert.wassner AT gmail.com); at this time I have collected over 5 million profiles in different languages. Some more information can be found here (in French only): http://wassner.blogspot.fr/2016/06/recuperer-des-profils-twitter-par.html

    Past Research

    No public research has been done on this dataset so far. I built a private application, described here (in French): http://wassner.blogspot.fr/2016/09/twitter-profiling.html, which uses the full dataset (millions of full profiles).

    Inspiration

    One can analyse a lot with this dataset:

    • stats about followers & followings
    • manifold learning or unsupervised learning from friend list
    • hashtag prediction from friend list
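
    One possible starting point for learning from friend lists is a set-overlap similarity between users. The sketch below uses Jaccard similarity, which is a choice made here and not a method described by the dataset author; the friend lists and the helper names are invented.

```python
def jaccard(friends_a, friends_b):
    """Jaccard similarity: shared followed accounts over all followed accounts."""
    a, b = set(friends_a), set(friends_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def most_similar(user_id, friends_by_user):
    """Return the other user whose friend list overlaps most with user_id's."""
    others = (u for u in friends_by_user if u != user_id)
    return max(others, key=lambda u: jaccard(friends_by_user[user_id], friends_by_user[u]))


# Invented friend lists keyed by user ID.
friends_by_user = {1: [10, 11, 12], 2: [10, 13], 3: [20, 21]}

print(jaccard(friends_by_user[1], friends_by_user[2]))  # 0.25
print(most_similar(1, friends_by_user))  # 2
```

    A pairwise similarity like this can feed clustering or a nearest-neighbour hashtag predictor, though with tens of thousands of users the all-pairs comparison would need to be restricted or approximated.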

    Contact

    Feel free to ask any question (or help request) via Twitter : @hwassner

    Enjoy! ;)

  18. Weather Tweets Dataset

    • kaggle.com
    zip
    Updated Dec 12, 2022
    Cite
    twtdata.com (2022). Weather Tweets Dataset [Dataset]. https://www.kaggle.com/datasets/twtdata/weather-tweets-dataset/data
    Explore at:
    zip(110097465 bytes)Available download formats
    Dataset updated
    Dec 12, 2022
    Authors
    twtdata.com
    Description

    Twitter data: approximately 525,000 tweets (0.5M) containing the keyword 'weather' for 3-21 Dec 2022, including retweets (RT). More data is available at twtdata.com; please contact mark@twtdata.com if you need more data.

  19. Israel-Palestine Conflict Tweets Dataset

    • kaggle.com
    zip
    Updated Jan 1, 2024
    Cite
    MehyarMlaweh (2024). Israel-Palestine Conflict Tweets Dataset [Dataset]. https://www.kaggle.com/datasets/mehyarmlaweh/israel-palestine-conflict-tweets-dataset
    Explore at:
    zip(2016138 bytes)Available download formats
    Dataset updated
    Jan 1, 2024
    Authors
    MehyarMlaweh
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Israel
    Description

    This dataset contains tweets related to the Israel-Palestine conflict from October 17, 2023, to December 17, 2023. It includes information on tweet IDs, links, text, date, likes, and comments, categorized into different ranges of like counts.

    Dataset Details

    • Date Range: October 17, 2023 - December 17, 2023
    • Total Tweets: 15,478
    • Unique Tweets: 14,854

    Data Description

    The dataset consists of the following columns:

    Column     Description
    id         Unique identifier for the tweet
    link       URL link to the tweet
    text       Text content of the tweet
    date       Date and time when the tweet was posted
    likes      Number of likes the tweet received
    comments   Number of comments the tweet received
    Label      Like count range category
    Count      Number of tweets in the like count range category
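
    The Label and Count columns describe tweets grouped by like-count range. A sketch of how such ranges could be derived from the likes column is shown below; the bin edges and range names are hypothetical, since the dataset's actual ranges are not stated here, and the sample rows are invented.

```python
import pandas as pd

# Invented sample rows with the id and likes columns described above.
tweets = pd.DataFrame({"id": [1, 2, 3, 4], "likes": [0, 15, 150, 2500]})

# Hypothetical bin edges and names; the dataset's real ranges may differ.
edges = [-1, 9, 99, 999, float("inf")]
names = ["0-9", "10-99", "100-999", "1000+"]

# Assign each tweet to a like-count range (each bin includes its upper edge).
tweets["Label"] = pd.cut(tweets["likes"], bins=edges, labels=names)

# Number of tweets per range, in range order.
counts = tweets["Label"].value_counts().sort_index()
print(counts)
```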

    How to Process the Data

    To process the dataset, you can use the following Python code. This code reads the CSV file, cleans the tweets, tokenizes and lemmatizes the text, and filters out non-English tweets.

    Required Libraries

    Make sure you have the following libraries installed:

    pip install pandas nltk langdetect
    

    Data Processing Code

    Here’s the code to process the tweets:

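    import pandas as pd
    import re
    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from langdetect import detect, LangDetectException
    # One-time setup: the NLTK tokenizer, stopword, and WordNet data must be
    # downloaded before first use, e.g. nltk.download('punkt'),
    # nltk.download('stopwords'), nltk.download('wordnet')
    # (recent NLTK releases may ask for 'punkt_tab' instead of 'punkt').
    # Define the TweetProcessor class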
    class TweetProcessor:
      def __init__(self, file_path):
        """
        Initialize the object with the path to the CSV file.
        """
        self.df = pd.read_csv(file_path)
        # Convert 'text' column to string type
        self.df['text'] = self.df['text'].astype(str)
      def clean_tweet(self, tweet):
        """
        Clean a tweet by removing links, special characters, and extra spaces.
        """
        # Remove links (both http and https)
        tweet = re.sub(r'https?://\S+', '', tweet)
        # Replace non-word characters (punctuation, symbols) with spaces
        tweet = re.sub(r'\W', ' ', tweet)
        # Replace multiple spaces with a single space
        tweet = re.sub(r'\s+', ' ', tweet)
        # Remove leading and trailing spaces
        tweet = tweet.strip()
        return tweet
      def tokenize_and_lemmatize(self, tweet):
        """
        Tokenize and lemmatize a tweet by converting to lowercase, removing stopwords, and lemmatizing.
        """
        # Tokenize the text
        tokens = word_tokenize(tweet)
        # Remove punctuation and numbers, and convert to lowercase
        tokens = [word.lower() for word in tokens if word.isalpha()]
        # Remove stopwords
        stop_words = set(stopwords.words('english'))
        tokens = [word for word in tokens if word not in stop_words]
        # Lemmatize the tokens
        lemmatizer = WordNetLemmatizer()
        tokens = [lemmatizer.lemmatize(word) for word in tokens]
        # Join tokens back into a single string
        return ' '.join(tokens)
      def process_tweets(self):
        """
        Apply cleaning and lemmatization functions to the tweets in the DataFrame.
        """
        def lang(x):
          try:
            return detect(x) == 'en'
          except LangDetectException:
            return False
        # Filter tweets for English language; copy so new columns go to an independent frame
        self.df = self.df[self.df['text'].apply(lang)].copy()
        # Apply cleaning function
        self.df['cleaned_text'] = self.df['text'].apply(self.clean_tweet)
        # Apply tokenization and lemmatization function
        self.df['tokenized_and_lemmatized'] = self.df['cleaned_text'].apply(self.tokenize_and_lemmatize)
    

    Usage

    This dataset can be used for various research purposes, including sentiment analysis, trend analysis, and event impact studies related to the Israel-Palestine conflict. For questions or feedback, please contact:

    • Name: Mehyar Mlaweh
    • Email: mehyarmlaweh0@gmail.com
  20. Unleashing Social Sentiments: A Twitter Analysis

    • kaggle.com
    zip
    Updated Feb 27, 2023
    Cite
    Joy Shil (2023). Unleashing Social Sentiments: A Twitter Analysis [Dataset]. https://www.kaggle.com/datasets/joyshil0599/unleashing-social-sentiments-a-twitter-analysis
    Explore at:
    zip(404155 bytes)Available download formats
    Dataset updated
    Feb 27, 2023
    Authors
    Joy Shil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    "Unleashing Social Sentiments: A Twitter Analysis" appears to be a study or analysis that uses a Twitter dataset to explore the sentiment and opinions of Twitter users towards a particular topic or set of topics. Without more information about the study, it is difficult to provide a detailed analysis. However, based on the title and the use of a Twitter dataset, it is likely that the study involves the use of sentiment analysis techniques to analyze the opinions and sentiment expressed in the dataset.

    The use of Twitter data for sentiment analysis has become increasingly popular in recent years due to the massive volume of data available and the ease with which opinions and sentiment can be expressed on the platform. By analyzing Twitter data, researchers can gain insights into public opinion and sentiment on a wide range of topics, from politics to consumer products to social issues.

    To conduct a Twitter analysis, researchers typically collect a dataset of tweets related to a particular topic or set of topics. This dataset may include features such as the Twitter username, the tweet content, the time and date of the tweet, and any associated metadata such as hashtags or mentions. The dataset can then be processed using NLP or sentiment analysis techniques to classify the sentiment expressed in each tweet as positive, negative, or neutral.

    The dataset contains tweets from the Twitter API that were scraped for seven hashtags:

    #Messi: This hashtag refers to the Argentine soccer superstar Lionel Messi, and is commonly used by fans and followers to discuss his performances, accomplishments, and news related to his career.

    #FIFAWorldCup: This hashtag is used during the FIFA World Cup, a quadrennial international soccer tournament. Tweets with this hashtag may discuss news, scores, or analysis related to the tournament.

    #DeleteFacebook: This hashtag is used by people who advocate for deleting or boycotting Facebook, often in response to controversies related to data privacy, political advertising, or other issues related to the social media giant.

    #MeToo: This hashtag is used in the context of the Me Too movement, a social movement against sexual harassment and assault, particularly in the workplace. Tweets with this hashtag may share personal stories, express support for the movement, or discuss related news and events.

    #BlackLivesMatter: This hashtag is used in the context of the Black Lives Matter movement, a movement against police brutality and systemic racism towards Black people. Tweets with this hashtag may express support for the movement, share news and updates, or discuss related issues.

    #NeverAgain: This hashtag is used in the context of the Never Again movement, which advocates for gun control and other measures to prevent school shootings and other acts of gun violence.

    #BarCamp: This hashtag refers to BarCamp, an international network of unconferences - participant-driven conferences that are open and free to attend. Tweets with this hashtag may discuss upcoming BarCamp events, share insights or learnings from past events, or express support for the BarCamp community.

    The sentiment score was generated using a pre-trained sentiment analysis model, and represents the overall sentiment of the tweet (positive, negative, or neutral).

    The data can be used to gain insights into how people are discussing and reacting to these topics on Twitter, and how the sentiment towards these hashtags may have evolved over time. Researchers and analysts can use this dataset for sentiment analysis, natural language processing, and machine learning applications.

    Some potential analyses that can be performed on the data include sentiment trend analysis over time, geographical distribution of sentiments, and topic modeling to identify themes and topics that emerge from the tweets.
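
    As an illustration of sentiment trend analysis over time, the sketch below computes the share of each sentiment label per hashtag per day. The column names `date`, `hashtag`, and `sentiment` and the sample rows are assumptions made for the example; the actual file may name or structure these fields differently.

```python
import pandas as pd

# Invented sample rows; column names are assumptions, not the dataset's schema.
tweets = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "hashtag": ["#Messi", "#Messi", "#Messi", "#MeToo"],
    "sentiment": ["positive", "negative", "positive", "neutral"],
})
tweets["date"] = pd.to_datetime(tweets["date"])

# Rows: (hashtag, date). Columns: sentiment labels. Each row sums to 1.
trend = pd.crosstab(
    index=[tweets["hashtag"], tweets["date"]],
    columns=tweets["sentiment"],
    normalize="index",
)
print(trend)
```

    Plotting one column of `trend` for a single hashtag would give that hashtag's daily share of, say, positive tweets.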

    Overall, the dataset provides a rich resource for researchers and analysts interested in studying social and political issues on social media.
