100+ datasets found
  1. Twitter Tweets Sentiment Dataset

    • kaggle.com
    zip
    Updated Apr 8, 2022
    Cite
    M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
    Available download formats: zip (1289519 bytes)
    Dataset updated
    Apr 8, 2022
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description:

    Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to identify negative tweets and block them. Can you build a strong classifier model to predict the same?

    Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

    Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

    You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

    Columns:

    1. textID - unique ID for each piece of text
    2. text - the text of the tweet
    3. sentiment - the general sentiment of the tweet
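
    The quote-stripping advice above can be sketched with Python's standard csv module. This is a minimal sketch, not the dataset author's code: the sample rows are invented, and the helper names strip_outer_quotes and load_rows are hypothetical. It assumes the file has a header row with the column names listed above.

```python
import csv
import io

# Invented sample rows; the real file is assumed to use these column names.
SAMPLE_CSV = '''textID,text,sentiment
a1,"""I love this!""",positive
b2,'so tired today',negative
'''

def strip_outer_quotes(value: str) -> str:
    """Drop one pair of matching quotes that wraps the whole value, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value

def load_rows(handle):
    """Read rows as dicts and clean only the text field (hypothetical helper)."""
    reader = csv.DictReader(handle)
    return [{**row, "text": strip_outer_quotes(row["text"])} for row in reader]

rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["textID"], repr(row["text"]), row["sentiment"])
```

    Only the outermost pair of quotes is removed, so quotes and whitespace inside the tweet are left untouched.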

    Acknowledgement:

    The dataset was downloaded from Kaggle Competitions:
    https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict the Twitter sentiments.
    • Compare the evaluation metrics of various classification algorithms.
  2. Twitter dataset

    • figshare.com
    csv
    Updated Feb 11, 2025
    Cite
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
    Available download formats: csv
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.

  3. Twitter Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Sep 8, 2024
    Cite
    Bright Data (2024). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Sep 8, 2024
    Dataset authored and provided by
    Bright Data
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.

  4. Twitter New Dataset 2024 March Data

    • kaggle.com
    zip
    Updated Mar 11, 2024
    + more versions
    Cite
    Ayush Kumar Singh (2024). Twitter New Dataset 2024 March Data [Dataset]. https://www.kaggle.com/datasets/fastcurious/twitter-new-dataset-2024-march-data
    Available download formats: zip (2923762 bytes)
    Dataset updated
    Mar 11, 2024
    Authors
    Ayush Kumar Singh
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Tweets scraped with all possible datapoints provided by Twitter in each tweet. For data extraction or scraping, contact me on Telegram: @akaseobhw

    All datapoints present for each tweet.

    Each entry in the dataset represents a tweet along with various attributes such as the tweet's ID, URL, text content, retweet count, reply count, like count, quote count, view count, creation date, language, and more. Additionally, there are details about the tweet's author, including their username, profile URL, follower count, following count, profile picture, cover picture, description, location, creation date, and more.

    Here's a brief description of the key fields present in each tweet entry:

    • type: Indicates the type of data, in this case, it's a tweet.
    • id: Unique identifier for the tweet.
    • url: URL of the tweet.
    • twitterUrl: Twitter URL of the tweet.
    • text: Text content of the tweet.
    • retweetCount: Number of retweets.
    • replyCount: Number of replies.
    • likeCount: Number of likes (favorites).
    • quoteCount: Number of times the tweet has been quoted.
    • viewCount: Number of views.
    • createdAt: Date and time when the tweet was created.
    • lang: Language of the tweet.
    • quoteId: ID of the quoted tweet, if this tweet is a quote.
    • bookmarkCount: Number of times the tweet has been bookmarked.
    • isReply: Indicates whether the tweet is a reply to another tweet.
    • author: Information about the author of the tweet.
      • userName: Username of the author.
      • url: URL of the author's profile.
      • followers: Number of followers of the author.
      • following: Number of accounts the author is following.
      • profilePicture: URL of the author's profile picture.
      • coverPicture: URL of the author's cover picture.
      • description: Description or bio of the author.
      • location: Location of the author.
      • createdAt: Date and time when the author's account was created.
    • entities: Entities present in the tweet, such as hashtags, symbols, URLs, and user mentions.
    • isRetweet: Indicates whether the tweet is a retweet.
    • isQuote: Indicates whether the tweet is a quote.
    • quote: Information about the quoted tweet, if this tweet is a quote.
    • media: Information about any media (such as images or videos) attached to the tweet.

    This dataset can be analyzed to gain insights into trends, sentiments, and user behavior on Twitter. You can use Python libraries like pandas to load this dataset and perform various analyses and visualizations.
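
    As a rough illustration of working with the nested structure described above, the sketch below flattens one tweet record with the standard json module. The sample record and its values are invented, flatten_tweet is a hypothetical helper, and the "engagement" total is an illustrative metric that is not a field in the dataset.

```python
import json

# Invented record using the field names listed above; values are made up.
SAMPLE_JSON = """
[
  {
    "type": "tweet",
    "id": "1",
    "text": "hello world",
    "retweetCount": 2,
    "replyCount": 1,
    "likeCount": 5,
    "quoteCount": 0,
    "lang": "en",
    "isReply": false,
    "author": {"userName": "alice", "followers": 100, "following": 10}
  }
]
"""

COUNT_FIELDS = ("retweetCount", "replyCount", "likeCount", "quoteCount")

def flatten_tweet(tweet: dict) -> dict:
    """Copy top-level scalar fields and lift author fields to 'author_<name>' keys."""
    flat = {k: v for k, v in tweet.items() if not isinstance(v, (dict, list))}
    for key, value in (tweet.get("author") or {}).items():
        flat[f"author_{key}"] = value
    # Illustrative metric (an assumption): sum of the four interaction counts.
    flat["engagement"] = sum(tweet.get(name) or 0 for name in COUNT_FIELDS)
    return flat

tweets = [flatten_tweet(t) for t in json.loads(SAMPLE_JSON)]
print(tweets[0]["author_userName"], tweets[0]["engagement"])
```

    A list of flat dictionaries like this can be passed directly to pandas.DataFrame if tabular analysis is preferred.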

  5. Tweets Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Nov 13, 2024
    Cite
    Bright Data (2024). Tweets Dataset [Dataset]. https://brightdata.com/products/datasets/twitter/tweets
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Nov 13, 2024
    Dataset authored and provided by
    Bright Data
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Tweets dataset for a range of applications to enhance business strategies and market insights. Analyzing this dataset offers a comprehensive view of social media dynamics, empowering organizations to optimize their communication and marketing strategies. Access the full dataset or select specific data points tailored to your needs. Popular use cases include sentiment analysis to gauge public opinion and brand perception, competitor analysis by examining engagement and sentiment around rival brands, and crisis management through real-time tracking of tweet sentiment and influential voices during critical events.

  6. Twitter dataset

    • figshare.com
    txt
    Updated Dec 20, 2024
    Cite
    mehdi khalil (2024). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28069163.v1
    Available download formats: txt
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    mehdi khalil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Truth Seeker Dataset is designed to support research in the detection and classification of misinformation on social media platforms, particularly focusing on Twitter. This dataset is part of a broader initiative to enhance the understanding of how machine learning (ML) and natural language processing (NLP) can be leveraged to identify fake news and misleading content in real time.

    Dataset Composition

    The Truth Seeker Dataset comprises a substantial collection of social media posts that have been meticulously labeled as either real or fake. It was constructed using advanced ML algorithms and NLP techniques to analyze the language patterns in social media communications. The dataset includes:

    • Raw Social Media Posts: A diverse range of tweets that reflect various topics and sentiments.
    • Labeling: Each post is annotated with binary labels indicating its authenticity (real or fake).
    • Feature Sets: Two distinct subsets of the dataset have been created using different NLP vectorization methods, Word2Vec and TF-IDF. This allows researchers to explore how different feature representations impact model performance.

    Research Applications

    The primary aim of the Truth Seeker Dataset is to facilitate the development and validation of models that can accurately classify social media content. Key applications include:

    • Fake News Detection: Utilizing various ML algorithms, including Random Forest and AdaBoost, which have demonstrated high F1 scores in preliminary evaluations.
    • Model Comparison: Researchers can compare the effectiveness of different ML approaches on the same dataset, enabling a clearer understanding of which methods yield the best results in detecting misinformation.
    • Algorithm Development: The dataset serves as a benchmark for developing new algorithms aimed at improving accuracy in fake news detection.

  7. Tweets and User Engagement

    • kaggle.com
    zip
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). Tweets and User Engagement [Dataset]. https://www.kaggle.com/datasets/thedevastator/tweets-and-user-engagement
    Available download formats: zip (9121838 bytes)
    Dataset updated
    Dec 6, 2023
    Authors
    The Devastator
    Description

    Tweets and User Engagement

    Twitter Data: Tweet Characteristics and Engagement Metrics

    By Krystal Jensen [source]

    About this dataset

    The dataset Twitter Data: Tweets and User Interactions provides comprehensive information about tweets and user interactions on the popular social media platform Twitter. The dataset includes various attributes that shed light on the characteristics and engagement metrics of tweets, allowing for in-depth analysis of user behavior and content performance.

    One of the key variables in this dataset is the Klout score, which represents the influence and reputation of the Twitter users who posted the tweets. This numeric metric helps assess the impact a user has on their audience and provides insights into their social media presence.

    Another essential attribute is the text content of each tweet. By examining this textual data, analysts can uncover valuable information about trending topics, opinions, sentiments, conversations, or news shared by users. It serves as a primary source for understanding what people share publicly on Twitter.

    The dataset Twitter+data+in+sheets.csv serves as a reliable resource for conducting research or performing analytics that require detailed information about Twitter activity. It covers aspects such as tweet characteristics (including length and language), engagement metrics (such as retweets and favorites), sentiment analysis (revealing positive or negative emotions expressed), as well as individual user details.

    By utilizing this extensive dataset, researchers can gain valuable insights into patterns of online communication within Twitter's vast network. They can identify influential individuals with high Klout scores who have substantial reach among their followers or communities. Additionally, they can analyze various aspects related to tweet content such as sentiment analysis to understand public opinion trends or measure engagement levels through counts like retweets and favorites.

    Overall, this dataset is a resource for anyone interested in analyzing tweet characteristics, exploring how users interact with tweets across dimensions such as popularity or sentiment, or examining correlations between Klout scores and other factors that influence engagement, such as time posted.

    How to use the dataset

    Welcome to the Twitter Data: Tweets and User Interactions dataset! This dataset provides valuable insights into tweet characteristics and user engagement on Twitter. Here is a useful guide on how to make the most out of this dataset:

    • Understanding the Columns: There are two main columns in this dataset:

      • Klout Score (Numeric): The Klout score indicates the influence of the user who posted the tweet. A higher Klout score suggests greater influence and reach.
      • Text Content of Tweet (Text): This column contains the actual text content of each tweet.
    • Analyzing Tweet Characteristics: The text content column will help you understand various aspects of tweets, such as language, sentiment, trending topics, or specific keywords used by users. You can perform text analysis techniques like word frequency analysis or sentiment analysis to gain insights into tweet characteristics.

    • Examining User Engagement: The Klout score provides a measure of user influence on Twitter. By analyzing this column, you can identify highly influential users who generate higher engagement rates with their tweets. You can further explore interactions (likes, retweets, replies) between these influential users and other Twitter users mentioned in their tweets.

    • Identifying Trends and Patterns: With this dataset's rich information about tweet content and user engagement, you can identify popular trends or patterns among highly engaged tweets or influential users over different time periods.
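
    One simple way to explore the two columns described in this guide is to check whether Klout score moves with tweet length. The sketch below is an illustration under assumptions: the rows are invented, and pearson is a hand-written helper rather than anything shipped with the dataset.

```python
import math

# Invented rows: (Klout score, tweet text). Not taken from the dataset.
ROWS = [
    (10, "ok"),
    (20, "nice"),
    (30, "superb"),
]

def pearson(xs, ys):
    """Pearson correlation coefficient; undefined if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

klout = [score for score, _ in ROWS]
lengths = [len(text) for _, text in ROWS]
r = pearson(klout, lengths)
print(f"correlation between Klout score and tweet length: {r:.3f}")
```

    Tweet length is only one possible feature; a sentiment score or keyword count could be substituted for lengths in the same way.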

    Please note that it is essential to responsibly use this data for any analysis or research purposes while adhering to ethical considerations related to privacy rights and data usage policies set by both Kaggle platform rules as well as any relevant privacy regulations.

    Research Ideas

    • Analyzing the relationship between Klout score and the content of tweets: This dataset can be used to investigate whether there is a correlation between a user's Klout score (a measure of their social media influence) and the characteristics of their tweets. By examining factors such as tweet length, sentiment, and engagement metrics, researchers can gain...
  8. The Climate Change Twitter Dataset

    • kaggle.com
    • data.mendeley.com
    zip
    Updated May 26, 2022
    + more versions
    Cite
    Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/deffro/the-climate-change-twitter-dataset
    Available download formats: zip (428878019 bytes)
    Dataset updated
    May 26, 2022
    Authors
    Dimitrios Effrosynidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use the dataset, cite the papers: https://doi.org/10.1016/j.eswa.2022.117541 and https://doi.org/10.1371/journal.pone.0274213

    The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.

    The following columns are in the dataset:

    • created_at: The timestamp of the tweet.
    • id: The unique id of the tweet.
    • lng: The longitude the tweet was written.
    • lat: The latitude the tweet was written.
    • topic: Categorization of the tweet in one of ten topics namely, seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.
    • sentiment: A score on a continuous scale. This scale ranges from -1 to 1 with values closer to 1 being translated to positive sentiment, values closer to -1 representing a negative sentiment while values close to 0 depicting no sentiment or being neutral.
    • stance: That is if the tweet supports the belief of man-made climate change (believer), if the tweet does not believe in man-made climate change (denier), and if the tweet neither supports nor refuses the belief of man-made climate change (neutral).
    • gender: Whether the user that made the tweet is male, female, or undefined.
    • temperature_avg: The temperature deviation in Celsius and relative to the January 1951-December 1980 average at the time and place the tweet was written.
    • aggressiveness: That is if the tweet contains aggressive language or not.
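
    The continuous sentiment score described above can be bucketed into coarse labels for quick summaries. This is a sketch under assumptions: the dataset description does not define cut-offs, so the neutral_band value of 0.05 and the function name sentiment_label are hypothetical choices.

```python
def sentiment_label(score: float, neutral_band: float = 0.05) -> str:
    """Map a score in [-1, 1] to a coarse label.

    neutral_band is an assumed threshold; the dataset only says that values
    close to 0 are neutral, without giving a cut-off.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"

for value in (0.8, -0.3, 0.0):
    print(value, sentiment_label(value))
```

    Widening or narrowing neutral_band changes how many tweets fall into the neutral bucket, so it is worth reporting the value used alongside any counts.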

    Because Twitter forbids publishing the text of tweets, the dataset contains only tweet IDs; retrieving the text requires a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.

  9. Coronavirus (COVID-19) Tweets Dataset

    • ieee-dataport.org
    • search.datacite.org
    • +1 more
    Updated May 7, 2025
    + more versions
    Cite
    Rabindra Lamsal (2025). Coronavirus (COVID-19) Tweets Dataset [Dataset]. https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset
    Dataset updated
    May 7, 2025
    Authors
    Rabindra Lamsal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    2020

  10. COVID-19 Twitter Dataset

    • borealisdata.ca
    • search.dataone.org
    Updated Nov 10, 2020
    Cite
    Anatoliy Gruzd; Philip Mai (2020). COVID-19 Twitter Dataset [Dataset]. http://doi.org/10.5683/SP2/PXF2CU
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Borealis
    Authors
    Anatoliy Gruzd; Philip Mai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The current dataset contains 237M Tweet IDs for Twitter posts that mentioned "COVID" as a keyword or as part of a hashtag (e.g., COVID-19, COVID19) between March and July of 2020.

    Sampling Method: hourly requests sent to Twitter Search API using Social Feed Manager, an open source software that harvests social media data and related content from Twitter and other platforms.

    NOTE:

    1) In accordance with Twitter API Terms, only Tweet IDs are provided as part of this dataset.
    2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs like Hydrator (https://github.com/DocNow/hydrator) or Python library Twarc (https://github.com/DocNow/twarc).
    3) This dataset, like most datasets collected via the Twitter Search API, is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might not be included in the dataset either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because tweets used hashtags/keywords other than COVID (e.g., Coronavirus or #nCoV).
    4) To broaden this sample, consider comparing/merging this dataset with other COVID-19 related public datasets such as:
      • https://github.com/thepanacealab/covid19_twitter
      • https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
      • https://github.com/echen102/COVID-19-TweetIDs
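
    Merging this ID list with another public list, as suggested in the note, comes down to a set union. The sketch below assumes each source is a plain-text file with one Tweet ID per line; the in-memory samples and the helper name read_ids are hypothetical stand-ins for real files.

```python
import io

def read_ids(handle) -> set:
    """Collect non-empty lines as tweet IDs.

    IDs are kept as strings so that large values are never rounded by tools
    that would otherwise parse them as floating-point numbers.
    """
    return {line.strip() for line in handle if line.strip()}

# Invented stand-ins for two ID files, one ID per line.
ids_a = read_ids(io.StringIO("1\n2\n3\n"))
ids_b = read_ids(io.StringIO("3\n4\n\n"))

merged = ids_a | ids_b   # every ID that appears in either source
overlap = ids_a & ids_b  # IDs that both sources collected

print(len(merged), "unique IDs,", len(overlap), "shared")
```

    The merged set can then be written back out, one ID per line, as input for a rehydration tool such as Hydrator or Twarc.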

  11. Twitter Dataset

    • kaggle.com
    zip
    Updated Jul 18, 2023
    Cite
    Sudish Basnet (2023). Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/sudishbasnet/twitter-dataset
    Available download formats: zip (5384506 bytes)
    Dataset updated
    Jul 18, 2023
    Authors
    Sudish Basnet
    Description

    This raw dataset consists of more than 30,000 records from the year 2022 with 12 attributes: Unnamed: 0, ID, User Name, User Location, User Description, User Verified, Date, Tweet, Length, Likes, Re-Tweets, and Source.

  12. tweet_eval

    • huggingface.co
    Updated Oct 23, 2020
    + more versions
    Cite
    Cardiff NLP (2020). tweet_eval [Dataset]. https://huggingface.co/datasets/cardiffnlp/tweet_eval
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 23, 2020
    Dataset authored and provided by
    Cardiff NLP
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for tweet_eval

      Dataset Summary
    

    TweetEval consists of seven heterogeneous tasks in Twitter, all framed as multi-class tweet classification. The tasks are irony, hate, offensive, stance, emoji, emotion, and sentiment. All tasks have been unified into the same benchmark, with each dataset presented in the same format and with fixed training, validation and test splits.

      Supported Tasks and Leaderboards
    

    text_classification: The dataset can be… See the full description on the dataset page: https://huggingface.co/datasets/cardiffnlp/tweet_eval.

  13. Twitter Posts Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Sep 8, 2024
    Cite
    Bright Data (2024). Twitter Posts Dataset [Dataset]. https://brightdata.com/products/datasets/twitter/posts
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Sep 8, 2024
    Dataset authored and provided by
    Bright Data
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Use our Twitter Posts dataset to analyze detailed information about tweets, including post content, author username, hashtags, mentions, likes, retweets, replies, and posting date. Popular use cases include sentiment analysis, tracking public opinion, and evaluating the performance of social media campaigns.

    • Over 31M records available
    • Price starts at $250/100K records
    • Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet.
    • 100% ethical and compliant data collection

    Included datapoints:

    • Post ID
    • Username & Name
    • Description/Content
    • Posting Date
    • Photos & Videos
    • Quoted Posts
    • Tagged Users
    • Replies Count
    • Reposts Count
    • Likes Count
    • Views Count
    • External URLs
    • Hashtags
    • Followers Count
    • Biography
    • Profile Image
    • Verification Status
    • Bookmarks
    • And much more

  14. Just Another Day on Twitter: A Complete 24 Hours of Twitter Data

    • search.gesis.org
    • datacatalogue.cessda.eu
    Updated Oct 16, 2022
    Cite
    Pfeffer, Jürgen (2022). Just Another Day on Twitter: A Complete 24 Hours of Twitter Data [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-2516
    Dataset updated
    Oct 16, 2022
    Dataset provided by
    GESIS search
    GESIS, Köln
    Authors
    Pfeffer, Jürgen
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Description

    At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.

  15. #IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    Cite
    Azmi Nawwar (2024). #IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4362504
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    UIN Syarif Hidayatullah Jakarta
    Authors
    Azmi Nawwar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the result of scraping Twitter with the twint application, targeting the hashtag #IndonesiaHumanRightsSOS. The scrape covers tweets posted from 10:59 AM on December 18, 2020 to 11:18 PM on December 19, 2020.

    The dataset contains 106,903 rows of data with related information: User ID, Username, Twitter Name, Tweets, etc.

    Also attached are examples of analyzed data in the form of a word cloud, a username cloud, the 100 most used words, and the most active usernames.

  16. twitter-financial-news-topic

    • huggingface.co
    Updated Dec 4, 2022
    Cite
    not a (2022). twitter-financial-news-topic [Dataset]. https://huggingface.co/datasets/zeroshot/twitter-financial-news-topic
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 4, 2022
    Authors
    not a
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their topic.

    The dataset holds 21,107 documents annotated with 20 labels:

    topics = { "LABEL_0": "Analyst Update", "LABEL_1": "Fed | Central Banks", "LABEL_2": "Company | Product News", "LABEL_3": "Treasuries | Corporate Debt", "LABEL_4": "Dividend"… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-topic.
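
    A label map like the one excerpted above is typically used to turn raw class labels into readable topic names. The sketch below includes only the five entries visible in the truncated description; the remaining labels are on the dataset page and are deliberately not guessed here. The helper name topic_name is hypothetical.

```python
# Only the five labels shown in the truncated description; the rest are omitted.
TOPICS = {
    "LABEL_0": "Analyst Update",
    "LABEL_1": "Fed | Central Banks",
    "LABEL_2": "Company | Product News",
    "LABEL_3": "Treasuries | Corporate Debt",
    "LABEL_4": "Dividend",
}

def topic_name(label: str) -> str:
    """Return the readable topic, or the raw label if it is not in the map."""
    return TOPICS.get(label, label)

for label in ("LABEL_1", "LABEL_19"):
    print(label, "->", topic_name(label))
```

    Falling back to the raw label keeps unknown classes visible instead of silently dropping them.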

  17. twitter-dataset-tesla

    • huggingface.co
    Updated Jul 11, 2022
    Cite
    fastai X Hugging Face Group 2022 (2022). twitter-dataset-tesla [Dataset]. https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 11, 2022
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    fastai X Hugging Face Group 2022
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for Twitter Dataset: Tesla

      Dataset Summary
    

    This dataset contains all the Tweets regarding #Tesla or #tesla up to 12/07/2022 (dd-mm-yyyy). It can be used for sentiment analysis research, for other NLP tasks, or just for fun. It contains 10,000 recent Tweets with the user ID, the hashtags used in the Tweets, and other important features.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla.

  18. Twitter Dataset for Mental Disorders Detection

    • ieee-dataport.org
    Updated Nov 15, 2024
    Cite
    Miryam Villa (2024). Twitter Dataset for Mental Disorders Detection [Dataset]. https://ieee-dataport.org/documents/twitter-dataset-mental-disorders-detection
    Explore at:
    Dataset updated
    Nov 15, 2024
    Authors
    Miryam Villa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OCD

  19. Twitter Sentiment Analysis Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Feb 23, 2026
    Cite
    Bright Data (2026). Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Feb 23, 2026
    Dataset authored and provided by
    Bright Data: https://brightdata.com/
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

    Key Features:

      • Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
      • Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
      • Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
      • Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
      • Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
      • Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.

    Use Cases:

      • Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
      • Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
      • Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
      • AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
      • Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.

    Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
    
  20. Dataset for twitter Sentiment Analysis using Roberta and Vader

    • data.mendeley.com
    Updated May 14, 2023
    + more versions
    Cite
    Jannatul Ferdoshi Jannatul Ferdoshi (2023). Dataset for twitter Sentiment Analysis using Roberta and Vader [Dataset]. http://doi.org/10.17632/2sjt22sb55.1
    Explore at:
    Dataset updated
    May 14, 2023
    Authors
    Jannatul Ferdoshi Jannatul Ferdoshi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our dataset comprises 1000 tweets, which were taken from Twitter using the Python programming language. The dataset was stored in a CSV file and generated using various modules. The random module was used to generate random IDs and text, while the faker module was used to generate random user names and dates. Additionally, the textblob module was used to assign a random sentiment to each tweet.

    This systematic approach ensures that the dataset is well-balanced and represents different types of tweets, user behavior, and sentiment. It is essential to have a balanced dataset to ensure that the analysis and visualization of the dataset are accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.

    In addition to generating the tweets, we have also prepared a visual representation of the data sets. This visualization provides an overview of the key features of the dataset, such as the frequency distribution of the different sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. This visualization will aid in the initial exploration of the dataset and enable us to identify any patterns or trends that may be present.
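The generation and frequency-count steps described for this dataset could look roughly like the sketch below. It is not the authors' script: it uses only the standard library, so the faker and textblob steps mentioned in the description are left out, and the word list, the three sentiment categories, and the column names `id`, `text`, `sentiment` are all assumptions made for illustration.

```python
import csv
import io
import random
from collections import Counter

# Assumed sentiment categories and a hypothetical vocabulary for random text.
SENTIMENTS = ["positive", "negative", "neutral"]
WORDS = ["market", "coffee", "rain", "music", "traffic", "weekend"]


def make_rows(n, seed=0):
    """Generate n synthetic rows with a random ID, random text, and a random sentiment."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    rows = []
    for _ in range(n):
        rows.append({
            "id": rng.randrange(10**9),
            "text": " ".join(rng.choices(WORDS, k=5)),
            "sentiment": rng.choice(SENTIMENTS),
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "text", "sentiment"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = make_rows(1000)
# Frequency distribution of the sentiment categories.
counts = Counter(row["sentiment"] for row in rows)

if __name__ == "__main__":
    print(counts)
    print(to_csv(rows[:3]))
```

Because the sentiments are drawn uniformly at random, the counts come out roughly balanced, which is the property the description emphasizes.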

Cite
M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset

Twitter Tweets Sentiment Dataset

Twitter Tweets Sentiment Analysis for Natural Language Processing

Explore at:
43 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (1289519 bytes)
Dataset updated
Apr 8, 2022
Authors
M Yasser H
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Description:

Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to distinguish negative tweets and block them. Can you build a strong classifier model to predict the same?

Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

Columns:

  1. textID - unique ID for each piece of text
  2. text - the text of the tweet
  3. sentiment - the general sentiment of the tweet
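A minimal sketch of the quote-stripping advice above, using Python's standard `csv` module. The sample rows and the helper names `strip_outer_quotes` and `load_rows` are hypothetical; the real file may differ. The `csv` reader already drops the quoting that delimits a field, so the helper only removes a literal pair of quotes left inside the parsed value, and it leaves other characters (including leading spaces) untouched so that character spans stay intact.

```python
import csv
import io

# Hypothetical sample in the column layout listed above (textID, text, sentiment).
SAMPLE = '''textID,text,sentiment
a1,"""Great day at the beach, loved it""",positive
b2, so tired of this rain,negative
'''


def strip_outer_quotes(value):
    """Remove one matching pair of leading/trailing double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def load_rows(handle):
    """Read rows as dicts and clean the text field."""
    reader = csv.DictReader(handle)
    return [{**row, "text": strip_outer_quotes(row["text"])} for row in reader]


rows = load_rows(io.StringIO(SAMPLE))

if __name__ == "__main__":
    for row in rows:
        print(repr(row["text"]), row["sentiment"])
```

For a real file, `load_rows(open(path, newline="", encoding="utf-8"))` would be the analogous call, with `path` pointing at the downloaded CSV.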

Acknowledgement:

The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Objective:

  • Understand the dataset and clean it up (if required).
  • Build classification models to predict the sentiment of tweets.
  • Compare the evaluation metrics of various classification algorithms.
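As one possible way to approach the last objective, the sketch below compares two sets of predictions by accuracy and macro-averaged F1 using only the standard library. The labels, the toy predictions, and the names `model_a` and `model_b` are made up for illustration; the page does not say which metrics or models should be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth and predictions from two hypothetical classifiers.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
predictions = {
    "model_a": ["positive", "negative", "neutral", "positive", "neutral", "neutral"],
    "model_b": ["positive", "positive", "neutral", "negative", "negative", "positive"],
}

if __name__ == "__main__":
    for name, y_pred in predictions.items():
        print(name, round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

Macro F1 is shown alongside accuracy because it weights every sentiment class equally, which matters when the classes are imbalanced.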