86 datasets found
  1. Data from: Social Media Engagement Dataset

    • kaggle.com
    Updated May 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Subash Shanmugam (2025). Social Media Engagement Dataset [Dataset]. https://www.kaggle.com/datasets/subashmaster0411/social-media-engagement-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 6, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Subash Shanmugam
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This machine-generated dataset simulates social media engagement data across various metrics, including likes, shares, comments, impressions, sentiment scores, toxicity, and engagement growth. It is designed for analysis and visualization of trends, buzz frequency, public sentiment, and user behavior on digital platforms.

    The dataset can be used to:

    Identify spikes or drops in engagement

    Analyze changes in sentiment over time

    Build dashboards for digital trend tracking

    Test algorithms for sentiment analysis or trend prediction

  2. Twitter Tweets Sentiment Dataset

    • kaggle.com
    Updated Apr 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 8, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    https://raw.githubusercontent.com/Masterx-AI/Project_Twitter_Sentiment_Analysis_/main/twitt.jpg" alt="">

    Description:

    Twitter is an online Social Media Platform where people share their their though as tweets. It is observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem and we shall help it by creating a strong NLP based-classifier model to distinguish the negative tweets & block such tweets. Can you build a strong classifier model to predict the same?

    Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

    Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

    You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

    Columns:

    1. textID - unique ID for each piece of text
    2. text - the text of the tweet
    3. sentiment - the general sentiment of the tweet

    Acknowledgement:

    The dataset is download from Kaggle Competetions:
    https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict the twitter sentiments.
    • Compare the evaluation metrics of vaious classification algorithms.
  3. Social Media Posts in Arabic Dialect

    • kaggle.com
    Updated Jul 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UM6P Open Data (2024). Social Media Posts in Arabic Dialect [Dataset]. https://www.kaggle.com/datasets/um6popendata/sentiment-analysis-for-sm-posts-in-arabic-dialect
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Kaggle
    Authors
    UM6P Open Data
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset: Sentiment Analysis for Social Media Posts in Arabic Dialect

    Overview

    This dataset contains a labeled collection of approximately 50,000 social media posts in various Arabic dialects. Each post has been manually annotated with sentiment labels, providing a rich resource for natural language processing and sentiment analysis research.

    Dataset Owner

    UM6P College of Computing

    Content

    • Posts: The dataset includes raw text data of social media posts written in different Arabic dialects.
    • Sentiment Labels: Each post is labeled with one of the following sentiment categories:
      • Positive
      • Negative
      • Neutral

    Features

    • Post ID: A unique identifier for each social media post.
    • Text: The content of the social media post in Arabic.
    • Sentiment: The sentiment label assigned to the post (Positive, Negative, Neutral).

    Format

    The dataset is provided in a CSV format with the following columns: - Post_ID: Integer - Text: String - Sentiment: String (Positive, Negative, Neutral)

    Usage

    This dataset is ideal for tasks such as: - Training sentiment analysis models - Studying sentiment trends in Arabic social media - Exploring the linguistic characteristics of Arabic dialects - Benchmarking sentiment analysis tools

    Example Data

    Post_IDTextSentiment
    1"هذا المنتج رائع جدًا وأحببته كثيرًا"Positive
    2"لم يعجبني هذا الفيلم، كان مملًا جدًا"Negative
    3"الطقس اليوم عادي، لا يوجد شيء مميز"Neutral

    Licensing

    Please refer to the dataset license included in the dataset files for information on usage rights and restrictions.

    Citation

    An open access NLP dataset for Arabic dialects: data collection, labeling, and model construction, Elmehdi Boujou, Hamza Chataoui, Abdellah El Mekki, Saad Benjelloun, Ikram Chairi and Ismail Berrada MENACIS 2020 conference, In press.

  4. Social Media Sentiment Analysis

    • kaggle.com
    Updated Dec 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bisma Sajjad (2024). Social Media Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/bismasajjad/social-media-sentiment-analysis/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Bisma Sajjad
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A dataset of social media posts (tweets, Facebook posts, etc.) along with sentiment scores (positive, negative, neutral). The data covers a variety of topics such as politics, entertainment, and health. Columns: Post ID, Date, Platform, Topic (e.g., Politics, Entertainment), Sentiment Score (1 = Positive, -1 = Negative, 0 = Neutral), Text Content.

  5. h

    myanmar-social-media-sentiment-analysis-dataset

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chuu Htet Naing, myanmar-social-media-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/chuuhtetnaing/myanmar-social-media-sentiment-analysis-dataset
    Explore at:
    Authors
    Chuu Htet Naing
    Area covered
    မြန်မာ
    Description

    Myanmar Social Media Sentiment Analysis Dataset

    A Myanmar language dataset for sentiment analysis of social media content, translated from an English source dataset.

      Dataset Description
    

    This dataset contains social media text with sentiment annotations translated into Myanmar language. It is derived from the original Social Media Sentiments Analysis Dataset on Kaggle, with texts professionally translated to Myanmar language while preserving the sentiment labels.… See the full description on the dataset page: https://huggingface.co/datasets/chuuhtetnaing/myanmar-social-media-sentiment-analysis-dataset.

  6. SMILE Twitter Emotion Dataset

    • kaggle.com
    • figshare.com
    Updated Jul 13, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ashwani Rathee (2020). SMILE Twitter Emotion Dataset [Dataset]. https://www.kaggle.com/datasets/ashkhagan/smile-twitter-emotion-dataset/versions/1
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 13, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Ashwani Rathee
    License

    Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    SMILE Twitter Emotion dataset

    This dataset is collected and annotated for the SMILE project http://www.culturesmile.org. This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for the purpose of classifying emotions, expressed on Twitter towards arts and cultural experiences in museums.

    It contains 3,085 tweets, with 5 emotions namely anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset.

    License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.

  7. SENTIMENT ANALYSIS OF SOCIAL MEDIA PLATFORMS

    • kaggle.com
    Updated Sep 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jigyashu Singh Lodhi (2023). SENTIMENT ANALYSIS OF SOCIAL MEDIA PLATFORMS [Dataset]. http://doi.org/10.34740/kaggle/dsv/6473513
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 14, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Jigyashu Singh Lodhi
    Description

    Dataset

    This dataset was created by Jigyashu Singh Lodhi

    Released under Other (specified in description)

    Contents

  8. f

    Twitter dataset

    • figshare.com
    csv
    Updated Feb 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
    Explore at:
    csvAvailable download formats
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    figshare
    Authors
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.

  9. E

    A Sentiment Analysis Dataset for Code-Mixed Malayalam-English

    • live.european-language-grid.eu
    • zenodo.org
    • +1more
    tsv
    Updated Dec 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). A Sentiment Analysis Dataset for Code-Mixed Malayalam-English [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7634
    Explore at:
    tsvAvailable download formats
    Dataset updated
    Dec 13, 2021
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There is an increasing demand for sentiment analysis of text from social media which are mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the complexity of mixing at different levels of the text. However, very few resources are available for code-mixed data to create models specific for this data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still performs better. Only a few datasets for popular languages such as English-Spanish, English-Hindi, and English-Chinese are available. There are no resources available for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators. This gold standard corpus obtained a Krippendorff’s alpha above 0.8 for the dataset. We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.

  10. Tweet Sentiment's Impact on Stock Returns

    • kaggle.com
    Updated Jan 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Tweet Sentiment's Impact on Stock Returns [Dataset]. https://www.kaggle.com/datasets/thedevastator/tweet-sentiment-s-impact-on-stock-returns
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 16, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Tweet Sentiment's Impact on Stock Returns

    862,231 Labeled Instances

    By [source]

    About this dataset

    This dataset contains 862,231 labeled tweets and associated stock returns, providing a comprehensive look into the impact of social media on company-level stock market performance. For each tweet, researchers have extracted data such as the date of the tweet and its associated stock symbol, along with metrics such as last price and various returns (1-day return, 2-day return, 3-day return, 7-day return). Also recorded are volatility scores for both 10 day intervals and 30 day intervals. Finally, sentiment scores from both Long Short - Term Memory (LSTM) and TextBlob models have been included to quantify the overall tone in which these messages were delivered. With this dataset you will be able to explore how tweets can affect a company's share prices both short term and long term by leveraging all of these data points for analysis!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    In order to use this dataset, users can utilize descriptive statistics such as histograms or regression techniques to establish relationships between tweet content & sentiment with corresponding stock return data points such as 1-day & 7-day returns measurements.

    The primary fields used for analysis include Tweet Text (TWEET), Stock symbol (STOCK), Date (DATE), Closing Price at the time of Tweet (LAST_PRICE) a range of Volatility measures 10 day Volatility(VOLATILITY_10D)and 30 day Volatility(VOLATILITY_30D ) for each Stock which capture changes in market fluctuation during different periods around when Twitter reactions occur. Additionally Sentiment Polarity analysis undertaken via two Machine learning algorithms LSTM Polarity(LSTM_POLARITY)and Textblob polarity provide insight into whether people are expressing positive or negative sentiments about each company at given times which again could influence thereby potentially influence Stock Prices over shorter term periods like 1-Day Returns(1_DAY_RETURN),2-Day Returns(2_DAY_RETURN)or longer term horizon like 7 Day Returns*7DAY RETURNS*.Finally MENTION field indicates if names/acronyms associated with Companies were specifically mentioned in each Tweet or not which gives extra insight into whether company specific contexts were present within individual Tweets aka “Company Relevancy”

    Research Ideas

    • Analyzing the degree to which tweets can influence stock prices. By analyzing relationships between variables such as tweet sentiment and stock returns, correlations can be identified that could be used to inform investment decisions.
    • Exploring natural language processing (NLP) models for predicting future market trends based on textual data such as tweets. Through testing and evaluating different text-based models using this dataset, better predictive models may emerge that can give investors advance warning of upcoming market shifts due to news or other events.
    • Investigating the impact of different types of tweets (positive/negative, factual/opinionated) on stock prices over specific time frames. By studying correlations between the sentiment or nature of a tweet and its effect on stocks, insights may be gained into what sort of news or events have a greater impact on markets in general

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: reduced_dataset-release.csv | Column name | Description | |:----------------------|:-------------------------------------------------------------------------------------------------------| | TWEET | Text of the tweet. (String) | | STOCK | Company's stock mentioned in the tweet. (String) | | DATE | Date the tweet was posted. (Date) | | LAST_PRICE | Company's last price at the time of tweeting. (Float) ...

  11. m

    The Climate Change Twitter Dataset

    • data.mendeley.com
    • kaggle.com
    Updated May 19, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. http://doi.org/10.17632/mw8yd7z9wc.2
    Explore at:
    Dataset updated
    May 19, 2022
    Authors
    Dimitrios Effrosynidis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541

    The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.

    The following columns are in the dataset:

    ➡ created_at: The timestamp of the tweet. ➡ id: The unique id of the tweet. ➡ lng: The longitude the tweet was written. ➡ lat: The latitude the tweet was written. ➡ topic: Categorization of the tweet in one of ten topics namely, seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined. ➡ sentiment: A score on a continuous scale. This scale ranges from -1 to 1 with values closer to 1 being translated to positive sentiment, values closer to -1 representing a negative sentiment while values close to 0 depicting no sentiment or being neutral. ➡ stance: That is if the tweet supports the belief of man-made climate change (believer), if the tweet does not believe in man-made climate change (denier), and if the tweet neither supports nor refuses the belief of man-made climate change (neutral). ➡ gender: Whether the user that made the tweet is male, female, or undefined. ➡ temperature_avg: The temperature deviation in Celsius and relative to the January 1951-December 1980 average at the time and place the tweet was written. ➡ aggressiveness: That is if the tweet contains aggressive language or not.

    Since Twitter forbids making public the text of the tweets, in order to retrieve it you need to do a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.

  12. f

    S1 File -

    • plos.figshare.com
    csv
    Updated Feb 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ahmed Akib Jawad Karim; Kazi Hafiz Md. Asad; Md. Golam Rabiul Alam (2025). S1 File - [Dataset]. http://doi.org/10.1371/journal.pone.0315829.s001
    Explore at:
    csvAvailable download formats
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Akib Jawad Karim; Kazi Hafiz Md. Asad; Md. Golam Rabiul Alam
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This work focuses on the efficiency of the knowledge distillation approach in generating a lightweight yet powerful BERT-based model for natural language processing (NLP) applications. After the model creation, we applied the resulting model, LastBERT, to a real-world task—classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text data. Referring to LastBERT, a customized student BERT model, we significantly lowered model parameters from 110 million BERT base to 29 million-resulting in a model approximately 73.64% smaller. On the General Language Understanding Evaluation (GLUE) benchmark, comprising paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. The model was also used on a real-world ADHD dataset with an accuracy of 85%, F1 score of 85%, precision of 85%, and recall of 85%. When compared to DistilBERT (66 million parameters) and ClinicalBERT (110 million parameters), LastBERT demonstrated comparable performance, with DistilBERT slightly outperforming it at 87%, and ClinicalBERT achieving 86% across the same metrics. These findings highlight the LastBERT model’s capacity to classify degrees of ADHD severity properly, so it offers a useful tool for mental health professionals to assess and comprehend material produced by users on social networking platforms. The study emphasizes the possibilities of knowledge distillation to produce effective models fit for use in resource-limited conditions, hence advancing NLP and mental health diagnosis. Furthermore underlined by the considerable decrease in model size without appreciable performance loss is the lower computational resources needed for training and deployment, hence facilitating greater applicability. Especially using readily available computational tools like Google Colab and Kaggle Notebooks. This study shows the accessibility and usefulness of advanced NLP methods in pragmatic world applications.

  13. Vietnamese Social Media Emotion Corpus

    • kaggle.com
    Updated Dec 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Minh Thanh (2022). Vietnamese Social Media Emotion Corpus [Dataset]. https://www.kaggle.com/datasets/hmthanh/vsmec
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 29, 2022
    Dataset provided by
    Kaggle
    Authors
    Minh Thanh
    Area covered
    Vietnam
    Description

    Emotion recognition is a higher approach or special case of sentiment analysis. In this task, the result is not produced in terms of either polarity: positive or negative or in the form of rating (from 1 to 5) but of a more detailed level of sentiment analysis in which the result are depicted in more expressions like sadness, enjoyment, anger, disgust, fear and surprise. Emotion recognition plays a critical role in measuring brand value of a product by recognizing specific emotions of customers’ comments. In this study, we have achieved two targets. First and foremost, we built a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese which is a low-resource language in Natural Language Processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on our UIT-VSMEC. As a result, Convolutional Neural Network (CNN) model achieved the highest performance with 57.61% of F1-score.

    Paper: Vong Ho, Duong Nguyen, Danh Nguyen, Linh Pham, Kiet Nguyen and Ngan Nguyen, Emotion Recognition for Vietnamese Social Media Text, 2019 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), October 11-13, 2019, Ha Noi, Vietnam. Link.

    https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects

  14. A

    ‘Flat Earth on Twitter’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Flat Earth on Twitter’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-flat-earth-on-twitter-6f4f/f73edd7f/?iid=004-389&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Earth
    Description

    Analysis of ‘Flat Earth on Twitter’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/gpreda/flat-earth-on-twitter on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    One of the most successful conspiracy theory is the Flat Earth theory. It has millions of followers, from Inspired Scientists that created the fake science behind the theory, Flat Brain influencers that evangelize this new religion, True Believers that are just there to press the like button. The Flat Earth theory is everywhere around Social Media. Here we collect tweets about this conspiracy theory.

    Collection

    Tweets using #FlatEarth hashtag are collected.

    Collected using tweepy.

    The data is not filtered.

    Inspiration

    Use the texts in this dataset to:

    • train to do text data analysis;
    • perform sentiment analysis;
    • perform topic modelling on the text corpus.

    --- Original source retains full ownership of the source dataset ---

  15. f

    Navigating News Narratives: A Media Bias Analysis Dataset

    • figshare.com
    txt
    Updated Dec 8, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shaina Raza (2023). Navigating News Narratives: A Media Bias Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24422122.v4
    Explore at:
    txtAvailable download formats
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    figshare
    Authors
    Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate against the spread of false or misleading information and restore public trust in the media.Data description: This is a dataset for news media bias covering different dimensions of the biases: political, hate speech, political, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).The data structure is tabulated as follows:Text: The main content.Dimension: Descriptive category of the text.Biased_Words: A compilation of words regarded as biased.Aspect: Specific sub-topic within the main content.Label: Indicates the presence (True) or absence (False) of bias. The label is ternary - highly biased, slightly biased and neutralToxicity: Indicates the presence (True) or absence (False) of bias.Identity_mention: Mention of any identity based on words match.Annotation SchemeThe labels and annotations in the dataset are generated through a system of Active Learning, cycling through:Manual LabelingSemi-Supervised LearningHuman VerificationThe scheme comprises:Bias Label: Specifies the degree of bias (e.g., no bias, mild, or strong).Words/Phrases Level Biases: Pinpoints specific biased terms or phrases.Subjective Bias (Aspect): Highlights biases pertinent to content dimensions.Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctively different.List of datasets used : We curated different news categories like Climate crisis news summaries , occupational, spiritual/faith/ general using RSS to capture different dimensions of the news media biases. The annotation is performed using active learning to label the sentence (either neural/ slightly biased/ highly biased) and to pick biased words from the news.We also utilize publicly available data from the following links. Our Attribution to others.MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC--A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "Semeval-2019 task 4: Hyperpartisan news detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detectionToxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge.Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification.Age Bias : Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing age-related bias in sentiment analysis." In Proceedings of the 2018 chi conference on human factors in computing systems, pp. 1-14. 2018. Age Bias Training and Testing Data - Age Bias and Sentiment Analysis Dataverse (harvard.edu)Multi-dimensional news Ukraine: Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A multidimensional dataset based on crowdsourcing for analyzing and detecting news bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtVSocial biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social bias frames: Reasoning about social and power implications of language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/Goal of this dataset :We want to offer open and free access to dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly to use and uploading and accessing data should be straightforward, to facilitate usage.If you use this dataset, please cite us.Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0

  16. i

    Analytics

    • ieee-dataport.org
    Updated Jun 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yuriy Syerov (2025). Analytics [Dataset]. https://ieee-dataport.org/documents/social-media-big-dataset-research-analytics-prediction-and-understanding-global-climate
    Explore at:
    Dataset updated
    Jun 17, 2025
    Authors
    Yuriy Syerov
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    trends

  17. Twitter Sentiment Analysis - 1M data

    • kaggle.com
    Updated Mar 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amirhossein Ahmadnejad (2023). Twitter Sentiment Analysis - 1M data [Dataset]. https://www.kaggle.com/datasets/amirhoseinahmadnejad/twitter-sentiment-analysis-1m-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 30, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Amirhossein Ahmadnejad
    Description

    this dataset is a combination of over 6 different datasets found on Kaggle. the labels are 0 and 1 which means negative and positive tweets. in the cleared dataset I delete mentions. you can do any preprocessing you want on the dataset. I will appreciate any notebooks submitted on this dataset to help others with sentiment analysis tasks. I will submit mine as well.

  18. sentiment analysis

    • kaggle.com
    Updated Sep 10, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sambita (2020). sentiment analysis [Dataset]. https://www.kaggle.com/datasets/samch08/sentiment-analysis
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 10, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sambita
    Description

    Dataset

    This dataset was created by Sambita

    Contents

  19. f

    Weighted average comparison of LastBERT, DistilBERT, and ClinicalBERT on...

    • plos.figshare.com
    xls
    Updated Feb 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ahmed Akib Jawad Karim; Kazi Hafiz Md. Asad; Md. Golam Rabiul Alam (2025). Weighted average comparison of LastBERT, DistilBERT, and ClinicalBERT on ADHD dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0315829.t005
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Akib Jawad Karim; Kazi Hafiz Md. Asad; Md. Golam Rabiul Alam
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Weighted average comparison of LastBERT, DistilBERT, and ClinicalBERT on ADHD dataset.

  20. Raw twitter data for sentiment analysis

    • kaggle.com
    Updated Jul 23, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Swagata Datta (2020). Raw twitter data for sentiment analysis [Dataset]. https://www.kaggle.com/swagatadatta/twitter-raw-tweets-data/notebooks
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Swagata Datta
    Description

    Dataset

    This dataset was created by Swagata Datta

    Contents

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Subash Shanmugam (2025). Social Media Engagement Dataset [Dataset]. https://www.kaggle.com/datasets/subashmaster0411/social-media-engagement-dataset
Organization logo

Data from: Social Media Engagement Dataset

Synthetic social media activity data for trend and sentiment analysis"

Related Article
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 6, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Subash Shanmugam
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

This machine-generated dataset simulates social media engagement data across various metrics, including likes, shares, comments, impressions, sentiment scores, toxicity, and engagement growth. It is designed for analysis and visualization of trends, buzz frequency, public sentiment, and user behavior on digital platforms.

The dataset can be used to:

Identify spikes or drops in engagement

Analyze changes in sentiment over time

Build dashboards for digital trend tracking

Test algorithms for sentiment analysis or trend prediction

Search
Clear search
Close search
Google apps
Main menu