65 datasets found
  1. Social Media Analyzing.ova

    • academictorrents.com
    bittorrent
    Updated Mar 11, 2017
    Cite
    Chanveer Singh (2017). Social Media Analyzing.ova [Dataset]. https://academictorrents.com/details/5c7d429c9991bf87fea35feef68889eada4a3425
    Available download formats: bittorrent (15408308736)
    Dataset updated
    Mar 11, 2017
    Dataset authored and provided by
    Chanveer Singh
    License

    https://academictorrents.com/nolicensespecified

    Description

    This is a project on Social Media Sentiment Analysis using Hortonworks Sandbox, following the procedure provided at website. The default username and password are root and clickstream, respectively. Any BI tool can be used, but I recommend Tableau, which can be downloaded from website. Any user can contact me at cmdude16@gmail.com for further guidance.

  2. Vietnamese Social Media Emotion Corpus

    • kaggle.com
    Updated Dec 29, 2022
    Cite
    Minh Thanh (2022). Vietnamese Social Media Emotion Corpus [Dataset]. https://www.kaggle.com/datasets/hmthanh/vsmec
    Dataset updated
    Dec 29, 2022
    Dataset provided by
    Kaggle
    Authors
    Minh Thanh
    Area covered
    Vietnam
    Description

    Emotion recognition is a finer-grained special case of sentiment analysis. In this task, the result is not expressed as a polarity (positive or negative) or as a rating (from 1 to 5), but at a more detailed level, using labels such as sadness, enjoyment, anger, disgust, fear and surprise. Emotion recognition plays a critical role in measuring the brand value of a product by recognizing specific emotions in customers' comments. In this study, we achieved two targets. First and foremost, we built a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences and six emotion labels, contributing to emotion recognition research in Vietnamese, which is a low-resource language in Natural Language Processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on our UIT-VSMEC. As a result, the Convolutional Neural Network (CNN) model achieved the highest performance, with an F1-score of 57.61%.
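    The F1-score mentioned above can be illustrated with a small sketch. The description does not say which averaging scheme produced the 57.61% figure, so the macro averaging used here is an assumption, and the helper name macro_f1 is hypothetical. The six label names are taken from the description.

```python
# Minimal macro-F1 sketch over the six emotion labels named in the description.
# Assumption: macro averaging; the source does not state the averaging scheme.
LABELS = ["sadness", "enjoyment", "anger", "disgust", "fear", "surprise"]


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-label F1 scores."""
    per_label = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            per_label.append(2 * precision * recall / (precision + recall))
        else:
            # A label with no true positives contributes an F1 of zero.
            per_label.append(0.0)
    return sum(per_label) / len(per_label)
```

    A perfect prediction over all six labels gives 1.0; any label that is never predicted correctly pulls the macro average down, regardless of how rare it is.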

    Paper: Vong Ho, Duong Nguyen, Danh Nguyen, Linh Pham, Kiet Nguyen and Ngan Nguyen, Emotion Recognition for Vietnamese Social Media Text, 2019 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), October 11-13, 2019, Ha Noi, Vietnam.

    https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects

  3. SMILE Twitter Emotion dataset

    • figshare.com
    txt
    Updated Apr 21, 2016
    Cite
    Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen (2016). SMILE Twitter Emotion dataset [Dataset]. http://doi.org/10.6084/m9.figshare.3187909.v2
    Available download formats: txt
    Dataset updated
    Apr 21, 2016
    Dataset provided by
    figshare
    Authors
    Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was collected and annotated for the SMILE project (http://www.culturesmile.org). This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for the purpose of classifying emotions expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets annotated with 5 emotions: anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset. License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.

  4. TRACES Sentiment Analysis Twitter Dataset

    • zenodo.org
    Updated Oct 10, 2023
    Cite
    Irina Temnikova; Silvia Gargova (2023). TRACES Sentiment Analysis Twitter Dataset [Dataset]. http://doi.org/10.5281/zenodo.7357386
    Dataset updated
    Oct 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Irina Temnikova; Silvia Gargova
    Description

    This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 1810 unique tweet IDs, written in Bulgarian, with annotations (positive, negative, neutral). The tweets are on the topics of lies, manipulation, and Covid-19 and are a subset of the following datasets:

    https://zenodo.org/record/7296865

    https://zenodo.org/record/7296736

    https://zenodo.org/record/7296877

    The tweets were collected via the Twitter API under academic access between 1 Jan 2020 and 28 June 2022 and thus cannot be used for commercial purposes.

  5. The dUCk Tweets

    • kaggle.com
    Updated Aug 2, 2020
    Cite
    Carlson (2020). The dUCk Tweets [Dataset]. https://www.kaggle.com/carlsonhoo/the-duck-tweets/code
    Dataset updated
    Aug 2, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Carlson
    Description

    Objective

    The datasets were downloaded from Twitter using getOldTweets3 in order to analyze public sentiment toward the brand. The tweets span January 2019 to the end of June 2020. They were downloaded using 2 keywords, "Vivy duck": "Vivy" refers to the brand owner, Vivy Yusof, and "duck" refers to the brand name, The dUCk Group. The original tweets are a mix of English and Malay.

    Brand

    Founded by popular blogger cum entrepreneur Vivy Yusof, dUCk launched in May 2014 and was born out of the love for well-branded scarves, aiming to convey the message that wearing scarves should be a celebrated act among women. The dUCk brand, which revolves around a character named D, rose quickly in popularity across the world and has since expanded to become The dUCk Group. The dUCk Group today comprises 5 main product lines: Scarves, Cosmetics, Stationeries, Bags, and Home & Living.

    After the MCO (Movement Control Order) was imposed due to Covid-19, the brand received considerable backlash on Twitter, which peaked in April 2020. It is therefore interesting to examine public sentiment on Twitter toward the owner, “Vivy”, and the brand, “dUCk”, to gain insight into the brand's image and how it was affected.

    Acknowledgements

    The study is only for academic purposes, to understand how the phenomena on social media can change the public sentiment toward the brand. Photo by ONNE Beauty

    Inspiration

    We picked this brand because we are interested in how sentiment changed, especially since two incidents affected the brand, in January 2020 and April 2020.

    Files

    • raw_tweets_012019_to_062020.csv: Complete raw data with a mixture of English and Malay tweets
    • tweets_012019_to_062020_translated.csv: Complete set of tweets translated to English by Google Translate
    • training.csv: Original tweets (not translated to English) manually labelled with sentiment polarity

  6. Shein Tweets (Original and English Only + Sentiment Scores)

    • figshare.com
    txt
    Updated Mar 15, 2023
    + more versions
    Cite
    Sage Luong (2023). Shein Tweets (Original and English Only + Sentiment Scores) [Dataset]. http://doi.org/10.6084/m9.figshare.22273084.v1
    Available download formats: txt
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    figshare
    Authors
    Sage Luong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our cleaned dataset with id strings of tweets containing "shein" with only original, English tweets plus sentiment scores for our Winter 2023 Digital Humanities 120: Social Media Data Analytics project at UCLA.

  7. DeepSeek

    • figshare.com
    json
    Updated Jun 22, 2025
    Cite
    zeqin lin (2025). DeepSeek [Dataset]. http://doi.org/10.6084/m9.figshare.29377388.v2
    Available download formats: json
    Dataset updated
    Jun 22, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    zeqin lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project contains data, analysis, and insights derived from discussions about DeepSeek technology on Weibo. The study aims to understand public sentiment and key discussion topics related to DeepSeek technology using Natural Language Processing (NLP) techniques such as topic modeling and sentiment analysis.

  8. Digital Phenotyping via Social Media Content 2

    • datacatalogue.cessda.eu
    • ssh.datastations.nl
    Updated Feb 10, 2024
    Cite
    SK Kavvadias (2024). Digital Phenotyping via Social Media Content 2 [Dataset]. http://doi.org/10.17026/dans-z7g-9wek
    Dataset updated
    Feb 10, 2024
    Dataset provided by
    RMIT University: Melbourne, Victoria, AU
    Authors
    SK Kavvadias
    Description

    The research project associated with this dataset focuses on the analysis of the top threads within the ddo subreddit. The dataset contains essential information about each of these threads, including the author's username, the post's title, the post text, its score, and the number of comments it has received. Additionally, it includes a detailed record of all comments within each thread, encompassing the commenter's username, the date and time of their comment, and the score received by each comment.
    The purpose of this project is to recognize addicted users within the ddo subreddit community by considering their activity patterns, emotional expressions, and content preferences, ultimately contributing to a deeper understanding of addiction-related behaviors in online communities and informing strategies for tailored support and interventions.


    Date Submitted: 2023-09-19

  9. Online News Tracking Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jul 20, 2025
    Cite
    Archive Market Research (2025). Online News Tracking Report [Dataset]. https://www.archivemarketresearch.com/reports/online-news-tracking-562601
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jul 20, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The online news tracking market is experiencing robust growth, driven by the increasing demand for real-time information and the proliferation of digital news sources. Our analysis projects a market size of $15 billion in 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This significant expansion is fueled by several key factors. The rise of social media and its impact on news dissemination necessitates efficient tracking solutions. Furthermore, the need for brand monitoring, sentiment analysis, and competitive intelligence analysis within the rapidly evolving digital landscape is driving adoption. Government agencies and media organizations are also major contributors to market growth, as they rely on real-time news monitoring for crisis management, public safety, and strategic decision-making. The market is segmented by software type (cloud-based vs. on-premise), deployment mode (web-based vs. mobile), organization size (SMEs vs. large enterprises), and end-use industry (media & entertainment, government, etc.). While challenges exist such as data security concerns and the need for accurate data filtering amidst overwhelming information volume, technological advancements in AI-powered analytics and improved data visualization tools are mitigating these restraints. The competitive landscape is highly fragmented, with key players including Sony, Panasonic, JVC, Ikegami, Marshall, TVLogic, Canon, Planar, Lilliput, Blackmagic Design, and others. These companies are focusing on innovation and strategic partnerships to strengthen their market presence. The growth is expected to be geographically diverse, with North America and Europe holding significant market share initially, followed by a rise in adoption rates in Asia-Pacific and other regions driven by increasing internet penetration and digitalization. 
Continuous advancements in artificial intelligence and machine learning will further propel market growth over the forecast period. The strategic focus will likely shift towards enhancing the accuracy and efficiency of news tracking algorithms and providing more sophisticated analytics capabilities.
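    The growth figures quoted above can be turned into a year-by-year projection. This is only an arithmetic sketch that compounds the stated $15 billion base (2025) at the stated 15% CAGR through 2033. The function name project_market_size is hypothetical, and the report itself may use a different method.

```python
# Compound-growth sketch: value_n = base * (1 + CAGR) ** n.
# Assumption: the 15% rate compounds annually over the 8 years from 2025 to 2033.
def project_market_size(base_value, cagr, years):
    """Return the projected value after `years` of compound growth."""
    return base_value * (1 + cagr) ** years


BASE_YEAR = 2025
BASE_SIZE_BILLION_USD = 15.0  # stated in the description
CAGR = 0.15                   # stated in the description

for year in range(BASE_YEAR, 2034):
    size = project_market_size(BASE_SIZE_BILLION_USD, CAGR, year - BASE_YEAR)
    print(f"{year}: ${size:.1f} billion")
```

    Under these assumptions, the 2033 value comes out at roughly three times the 2025 base, or about $46 billion.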

  10. DeepCube: Post-processing and annotated datasets of social media data

    • data.niaid.nih.gov
    Updated Mar 15, 2024
    Cite
    Alexandros Mokas (2024). DeepCube: Post-processing and annotated datasets of social media data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7732930
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Giannis Tsampoulatidis
    Eleni Kamateri
    Alexandros Mokas
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Researcher(s): Alexandros Mokas, Eleni Kamateri

    Supervisor: Ioannis Tsampoulatidis

    This repository contains 3 social media datasets:

    2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the Deepcube project. More specifically, these include:

    The UC2 dataset contains the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. It contains a total of 5,695,253 social media posts collected from Twitter, based on the initial version of the UC2 search criteria defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia and covering 26 June 2021 to March 2023.

    The UC5 dataset contains the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. It contains a total of 58,143 social media posts (12,881 collected from Twitter and 45,262 collected from Instagram), based on the initial version of the UC5 search criteria defined by MURMURATION SAS, focused on Brazil and covering 26 June 2021 to March 2023.

    1 Annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:

    The UC2 dataset contains the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. It contains a total of 1,721 annotated social media posts (412 relevant and 1,309 irrelevant) collected from Twitter, focused on the region of Somalia and covering 1 January 2010 to 31 December 2022.

    For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.

    After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.

    The dataset is provided by INFALIA. INFALIA, being a spin-off of the CERTH institute and a partner of a research EU project, releases this dataset containing Tweets IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.

  11. Using Sentiment Analysis in assessing learner performance

    • borealisdata.ca
    • search.dataone.org
    Updated Oct 21, 2022
    Cite
    David Topps; Michelle Cullen; Corey Wirun (2022). Using Sentiment Analysis in assessing learner performance [Dataset]. http://doi.org/10.5683/SP3/IHUJUW
    Dataset updated
    Oct 21, 2022
    Dataset provided by
    Borealis
    Authors
    David Topps; Michelle Cullen; Corey Wirun
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Sentiment analysis, first designed for assessing short comments in social media and web sites, is now showing promise as a means to analyze the conversational fragments found in therapeutic conversations in nursing school. It provides a simple yet cost-effective overview of the discourse and associated sentiments or moods expressed. This was part of a TTalk conversational assessment project.

  12. Fast Fashion Tweets (Original and English Only + Sentiment Scores)

    • figshare.com
    txt
    Updated Mar 15, 2023
    Cite
    Sage Luong (2023). Fast Fashion Tweets (Original and English Only + Sentiment Scores) [Dataset]. http://doi.org/10.6084/m9.figshare.22273081.v1
    Available download formats: txt
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    figshare
    Authors
    Sage Luong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our cleaned dataset with id strings of tweets containing "fast fashion" with only original, English tweets with sentiment scores for our Winter 2023 Digital Humanities 120: Social Media Data Analytics project at UCLA.

  13. Using social media and personality traits to assess software developers'...

    • explore.openaire.eu
    Updated Jan 1, 2022
    + more versions
    Cite
    Leo Silva; Marília Gurgel Castro; Miriam Bernardino Silva; Milena Nestor Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira (2022). Using social media and personality traits to assess software developers' emotions [Dataset]. http://doi.org/10.5281/zenodo.7425721
    Dataset updated
    Jan 1, 2022
    Authors
    Leo Silva; Marília Gurgel Castro; Miriam Bernardino Silva; Milena Nestor Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira
    Description

    Companion data
    Title: Using social media and personality traits to assess software developers' emotions
    Authors: Leo Moreira Silva; Marília Gurgel Castro; Miriam Bernardino Silva; Milena Nestor Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira
    Journal: PeerJ Computer Science
    Github: https://github.com/leosilva/peerj_computer_science_2022

    The folders contain:

    • Experiment_Protocol.pdf: document that presents the protocol for recruitment, data collection of public posts from Twitter, criteria for manual analysis, and the assessment of Big Five factors from participants and psychologists. English version.

    /analysis
    • analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists
    • analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants
    • analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications

    /dataset
    • alldata.json: contains the dataset used in the paper

    /ethics_committee
    • committee_response_english_version.pdf: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. English version.
    • committee_response_original_portuguese_version: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. Portuguese version.
    • committee_submission_form_english_version.pdf: the project submitted to the committee. English version.
    • committee_submission_form_original_portuguese_version.pdf: the project submitted to the committee. Portuguese version.
    • consent_form_english_version.pdf: declaration of free and informed consent completed by participants. English version.
    • consent_form_original_portuguese_version.pdf: declaration of free and informed consent completed by participants. Portuguese version.
    • data_protection_declaration_english_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. English version.
    • data_protection_declaration_original_portuguese_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. Portuguese version.

    /notebooks
    • General - Charts.ipynb: notebook file containing all charts produced in the study, including those in the paper
    • Statistics - Lexicons and Ensembles.ipynb: notebook file with the statistics for the five lexicons and ensembles used in the study
    • Statistics - Linear Regression.ipynb: notebook file with the multiple linear regression results
    • Statistics - Polynomial Regression.ipynb: notebook file with the polynomial regression results
    • Statistics - Psychologists versus Participants.ipynb: notebook file with the statistics comparing the psychologists' and participants' manual analyses
    • Statistics - Working x Non-working.ipynb: notebook file containing the statistical analysis for the tweets posted during the working period and those posted outside it

    /surveys
    • Demographic_Survey_english_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. English version.
    • Demographic_Survey_portuguese_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. Portuguese version.
    • Demographic_Survey_answers.xlsx: participants' demographic survey answers
    • ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
    • ibf_en.doc: English translation of the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
    • ibf_answers.xlsx: participants' and psychologists' answers for BFI

    We have removed any sensitive data from the dataset and from the demographic survey answers to protect participants' privacy and anonymity.

  14. Transliterated Marathi Dataset

    • kaggle.com
    Updated Mar 17, 2025
    Cite
    Gurunath Salve (2025). Transliterated Marathi Dataset [Dataset]. https://www.kaggle.com/datasets/gurunathsalve/transliterated-marathi-dataset/data
    Dataset updated
    Mar 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gurunath Salve
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Transliterated Marathi Sentiment Analysis Dataset

    Overview

    This dataset is designed to facilitate sentiment analysis for transliterated Marathi text, which is widely used on social media platforms but lacks structured sentiment resources. The dataset includes user-generated comments labeled with sentiment scores, along with a manually curated sentiment wordlist to aid classification.

    The comments were collected from platforms like Instagram, Twitter, and YouTube, where informal, code-mixed text is prevalent. Each sentence has been carefully annotated for sentiment by human reviewers to ensure label accuracy and consistency.

    Files in This Dataset

    1. marathi_comments.csv – Contains user-generated transliterated Marathi comments with their sentiment classification.
    2. marathi_wordlist.csv – A manually created wordlist that maps common transliterated Marathi words to sentiment scores.

    Dataset Details

    1. marathi_comments.csv

    This file contains sentences along with sentiment labels assigned during manual annotation.

    Column            Description
    Sentence          Transliterated Marathi sentence
    Classified Score  Sentiment label (-3 to +3) based on manual annotation

    Sentiment Labeling Scale:

    Score  Sentiment Meaning
    +3     Most Positive
    +2     More Positive
    +1     Positive
    0      Neutral
    -1     Negative
    -2     More Negative
    -3     Most Negative
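
    As a minimal sketch, the labeling scale above can be applied to rows shaped like marathi_comments.csv. The column names come from the table above, but their exact spelling in the real file header is an assumption, the helper name label_rows is hypothetical, and the sample rows are placeholders rather than real data.

```python
import csv
import io

# Score-to-meaning mapping copied from the labeling scale above.
SCORE_LABELS = {
    3: "Most Positive",
    2: "More Positive",
    1: "Positive",
    0: "Neutral",
    -1: "Negative",
    -2: "More Negative",
    -3: "Most Negative",
}


def label_rows(csv_text):
    """Return (sentence, meaning) pairs for CSV text with the two documented columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # int() accepts an explicit sign, so "+3" and "-1" both parse.
    return [
        (row["Sentence"], SCORE_LABELS[int(row["Classified Score"])])
        for row in reader
    ]


# Placeholder rows, not taken from the dataset.
SAMPLE = "Sentence,Classified Score\nplaceholder sentence a,+3\nplaceholder sentence b,-1\n"
for sentence, meaning in label_rows(SAMPLE):
    print(sentence, "->", meaning)
```

    To run this against the real file, the text of marathi_comments.csv would be passed in place of SAMPLE, after checking the header names.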

    2. marathi_wordlist.csv

    This file contains a sentiment wordlist with predefined scores for commonly used transliterated Marathi words.

    Column  Description
    word    Transliterated Marathi word
    score   Sentiment score assigned to the word (-3 to +3)

    How to Use the Dataset

    • Train sentiment analysis models for transliterated Marathi text.
    • Enhance rule-based sentiment analysis using the sentiment wordlist.
    • Fine-tune transformer-based models like BERT, XLM-R, or multilingual LLMs.
    • Analyze sentiment trends in Marathi social media conversations.
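
    The rule-based use of the wordlist mentioned in the list above could look roughly like the following sketch. It sums per-word scores and clamps the total to the -3 to +3 range. The scoring rule, the function names, and the sample words are all assumptions made for illustration; the dataset authors do not describe a specific scoring procedure.

```python
# Lexicon-based scoring sketch. The wordlist maps a transliterated word to a
# score in [-3, +3], mirroring the word/score columns of marathi_wordlist.csv.
def score_sentence(sentence, wordlist):
    """Sum the scores of known words; unknown words count as 0."""
    tokens = sentence.lower().split()
    return sum(wordlist.get(token, 0) for token in tokens)


def clamp_score(total, low=-3, high=3):
    """Clamp a summed score to the dataset's -3 to +3 label range."""
    return max(low, min(high, total))


# Placeholder entries, not real wordlist content.
SAMPLE_WORDLIST = {"goodword": 2, "badword": -3}

raw = score_sentence("goodword and badword", SAMPLE_WORDLIST)
print(raw, clamp_score(raw))
```

    Summing and clamping is the simplest possible rule. Negation handling or averaging over matched words are common alternatives that this sketch does not attempt.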

    Potential Applications

    • Social Media Sentiment Analysis: Detecting public sentiment on various topics in Marathi.
    • Code-Mixed Text Processing: Improving NLP models for multilingual and transliterated text.
    • Low-Resource Language NLP: Expanding research for sentiment classification in underrepresented languages.

    Acknowledgments

    This dataset was curated as part of a research project in the Department of Electronics & Telecommunication Engineering at SCTR's Pune Institute of Computer Technology, Pune, India. We sincerely appreciate the efforts and contributions of our project group in dataset collection, annotation, and structuring.

    Contributors:
    - Siddhi Pardeshi
    - Gurunath Salve
    - Sayali Thakur
    - Mr. Rishikesh J. Sutar (Mentor)

    We would like to extend our gratitude to our institution for providing guidance and support throughout this research. By making this dataset publicly available, we aim to encourage further advancements in low-resource language processing and Marathi NLP research.

  15. Sentiment Analysis Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 16, 2025
    Cite
    Data Insights Market (2025). Sentiment Analysis Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/sentiment-analysis-tools-1945674
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jul 16, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global sentiment analysis tools market is experiencing robust growth, driven by the increasing need for businesses to understand customer opinions and preferences across various channels. The market's expansion is fueled by the proliferation of social media, e-commerce reviews, and customer service interactions, all generating vast quantities of unstructured data. Companies are leveraging sentiment analysis to gain valuable insights into brand perception, product development, and customer satisfaction, leading to improved marketing strategies, enhanced customer experiences, and ultimately, increased profitability. The market is segmented by deployment (cloud, on-premise), by organization size (SMEs, large enterprises), and by industry (retail, healthcare, finance, etc.), each exhibiting unique growth trajectories. Key players like IBM, SAP, and Microsoft are heavily invested in this space, constantly innovating with advanced algorithms and AI-powered solutions to improve accuracy and efficiency. The competitive landscape is dynamic, characterized by both organic growth and strategic acquisitions, solidifying the market's position as a crucial technology for businesses navigating the complexities of the digital age. The forecast period (2025-2033) anticipates sustained growth, driven by technological advancements such as natural language processing (NLP) and machine learning (ML), enabling more accurate and nuanced sentiment analysis. However, challenges remain, including data privacy concerns, the need for multilingual capabilities, and the complexity of analyzing sarcasm or nuanced language. Addressing these challenges will be crucial for sustained market expansion. The increasing adoption of cloud-based solutions is expected to further fuel market growth due to scalability, cost-effectiveness, and accessibility. 
The integration of sentiment analysis with other technologies, such as business intelligence and CRM systems, will also contribute significantly to its overall market expansion. We project a continued strong CAGR, reflecting the ongoing demand and technological advancements in the field.

  16. Epidemiology of Cohort Social Media, 2018-2019 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Oct 21, 2023
    Cite
    (2023). Epidemiology of Cohort Social Media, 2018-2019 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c2e00bd4-1857-5318-ab8c-ce6f29169a61
    Explore at:
    Dataset updated
    Oct 21, 2023
    Description

    Interactions on social media have the potential to help us to understand human behaviour, including the development of both good and poor mental health. However, to do the best science we need to know as much as possible about the people who are participating in our research. The CLOSER group of UK longitudinal cohorts includes people who have contributed their data to research since birth. By inviting participants in these cohorts to also allow us to derive information from their social media feeds, we will be able to relate this information to gold-standard measures of the behaviours we are trying to understand and to world-class data on other aspects of life. To work out the best way to do this, our project will engage with participants in the Children of the '90s cohort to find out what is acceptable to them in terms of collecting and using their interactions on social media. We will use what we have learnt to develop software that collects and codes social media data in a way that protects the anonymity of participants by scoring Tweets without making the text available to researchers. We will share this software with other CLOSER cohorts to make it easy for them to invite participants to contribute their Twitter data in a safe and secure way. The high-resolution data collected in this way will help us to understand human behaviour and how mental health changes over time. Collecting these data in well-known groups of people will also give scientists the information they need to improve the quality of all research using social media.

    We are demonstrating collection, anonymisation and analysis of social media data from consenting participants in the Avon Longitudinal Study of Parents and Children. Initially we are studying Twitter use, and gathering data through the platform's API. Our software gathers social media posts and interactions from participants every few days, with datasets being stored under ISO 27001 security certification. Derived, depersonalised datasets can be made available to approved researchers, and we aim to provide a means to evaluate sentiment analysis methods against ground truth data.
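The idea of scoring posts without exposing their text can be illustrated with a minimal sketch. It is not the project's software: the tiny lexicon, the function names, the record layout and the keyed-hash pseudonym are all assumptions made for illustration. The point is only that the derived record carries a pseudonym and a score, and never the text itself.

```python
import hashlib
import hmac

# Hypothetical toy lexicon; a real study would use a validated sentiment method.
LEXICON = {"happy": 1.0, "great": 1.0, "sad": -1.0, "awful": -1.0}


def pseudonymise(participant_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 hash so the raw participant ID is not stored."""
    return hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()


def score_text(text: str) -> float:
    """Average lexicon score over matched words; 0.0 when nothing matches."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def derive_record(participant_id: str, text: str, secret_key: bytes) -> dict:
    """Build the derived record: pseudonym and score only, never the text."""
    return {"pid": pseudonymise(participant_id, secret_key), "score": score_text(text)}


# Made-up participant ID, post and key, for illustration only.
record = derive_record("cohort-0001", "Feeling happy today, great news!", b"demo-key")
print(sorted(record))
```

Only the derived record would be passed on to researchers in this sketch; the raw post stays inside the function call that scored it.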

  17. Dataset for Goodreads investigation project

    • figshare.com
    xlsx
    Updated May 24, 2023
    Cite
    Mia Flinton (2023). Dataset for Goodreads investigation project [Dataset]. http://doi.org/10.6084/m9.figshare.23146826.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    May 24, 2023
    Dataset provided by
    figshare
    Authors
    Mia Flinton
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Investigating Goodreads reviews to perform sentiment analysis and keyword extraction about popular books.

  18. Queue2 Dataset

    • universe.roboflow.com
    zip
    Updated Nov 29, 2021
    Cite
    Artem (2021). Queue2 Dataset [Dataset]. https://universe.roboflow.com/artem-uqcva/queue2/dataset/1
    Explore at:
    zip (available download formats)
    Dataset updated
    Nov 29, 2021
    Dataset authored and provided by
    Artem
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    0 Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Sentiment Analysis: The "queue2" model can be used to detect engagement and emotional expressions between people in a given setting. For instance, in scenarios like a business meeting or a social gathering, understanding expressions and body language may provide valuable insights.

    2. Safety Monitoring: The model can be utilized in safety systems such as CCTV monitoring, where identifying people’s interactions in a specific space can help to ensure public safety.

    3. Social Networking: This model can be used in social network applications to tag friends in photos based on their poses and interactions.

    4. Behavioral Study: In research fields, this model can help in studying people's behavior in group settings or identifying patterns in social interactions.

    5. Customer Experience Management: In retail or event settings, businesses can use this model for managing crowds, measuring customer satisfaction levels or improving customer experiences.

  19. Text Emotion Recognition

    • kaggle.com
    Updated Mar 6, 2023
    Cite
    Shreejit Cheela (2023). Text Emotion Recognition [Dataset]. https://www.kaggle.com/shreejitcheela/text-emotion-recognition/discussion
    Explore at:
    Croissant
    Dataset updated
    Mar 6, 2023
    Dataset provided by
    Kaggle
    Authors
    Shreejit Cheela
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Emotions play a vital role in human communication, and detecting emotions from text data is a challenging task. The ability to automatically recognize emotions from text has many practical applications, such as in sentiment analysis, social media monitoring, and customer feedback analysis.

    In this project, we discuss the working principle of a text emotion recognition model and its important terminology. We also provide a detailed description of the model architecture and its training process. Finally, we evaluate the model using a confusion matrix and a classification report. In the "emotions" column, 0 means sad and 1 means happy.
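As a rough illustration of the evaluation step mentioned above, the following sketch computes a confusion matrix and per-class precision, recall and F1 for the 0 (sad) / 1 (happy) encoding. It is not the dataset author's code; the function names and the example labels are made up for illustration.

```python
from collections import Counter

LABELS = {0: "sad", 1: "happy"}  # encoding stated in the dataset description


def confusion_matrix(y_true, y_pred):
    """Return counts keyed by (true label, predicted label) for every pair."""
    counts = Counter(zip(y_true, y_pred))
    return {(t, p): counts[(t, p)] for t in LABELS for p in LABELS}


def per_class_report(y_true, y_pred):
    """Precision, recall and F1 per class, computed from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)
    report = {}
    for c, name in LABELS.items():
        tp = cm[(c, c)]
        predicted = sum(cm[(t, c)] for t in LABELS)  # column total
        actual = sum(cm[(c, p)] for p in LABELS)     # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Made-up labels for illustration only; not taken from the dataset.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
report = per_class_report(y_true, y_pred)
print(cm)
print(report)
```

Libraries such as scikit-learn offer equivalent helpers; the plain-Python version is used here only to keep the arithmetic visible.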

    slang.txt in Abbreviations step can be taken from: https://www.kaggle.com/datasets/mansis97/slangs

  20. underlying data for "PERCEIVE - ENGAGING THE PEOPLE": IS SOCIAL MEDIA...

    • data.niaid.nih.gov
    Updated Mar 3, 2021
    + more versions
    Cite
    Pareschi Luca (2021). underlying data for "PERCEIVE - ENGAGING THE PEOPLE": IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION? [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4573251
    Explore at:
    Dataset updated
    Mar 3, 2021
    Dataset provided by
    Barberio Vitaliano
    Pareschi Luca
    Area covered
    European Union
    Description

    README file

    Data Set Title: “PERCEIVE - ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”

    Data Set Authors:

    Vitaliano Barberio (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0002-2615-5006;

    Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;

    Data Set Contributors:

    Ines Kuric (Wirtschaftsuniversität Wien);

    Edoardo Mollona (Università di Bologna), ORCID http://orcid.org/0000-0001-9496-8618.

    Markus Höllerer (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0003-2509-2696.

    Data Set Contact Person:

    Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;

    luca.pareschi@uniroma2.it.

    Data Set License: this data set is distributed under a Creative Commons Attribution (CC BY) 4.0 International license

    Publication Year: 2021

    Project Info: PERCEIVE (Perception and Evaluation of Regional and Cohesion Policies by Europeans and Identification with the Values of Europe), funded by European Union, Horizon 2020 Programme. Grant Agreement num. 693529; https://www.perceiveproject.eu/.

    Data set Contents

    The data set consists of:

    1 README file

    6 textual qualitative files saved in .txt format

    “stoplist_file_[nation].txt”

    12 textual quantitative files saved in .txt format

    “[source]-keys.txt”: 6 files

    “[source]-composition.txt”: 6 files

    2 excel quantitative files saved in .xlsx format

    “SentimentFB.xlsx”

    “topics_prevalence_and_clustering.xlsx”

    Data set Documentation

    Abstract

    This data set contains the underlying data of the paper “’ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”.

    Data openly available within this dataset are a subset of the two following data sets, which contain all the relevant data of Work Package 3 and Work Package 5 of the PERCEIVE project:

    Data set: “PERCEIVE: WP3: Effectiveness of communication strategies of EU projects” https://doi.org/10.5281/zenodo.3371133

    Data set: “PERCEIVE: WP5: The multiplicity of shared meanings of EU and Cohesion Regional and Urban Policy at different discursive levels” https://doi.org/10.5281/zenodo.3371174

    For the paper we collected Facebook posts referring to EU CP policies. We do not have permission to share these data (as they are protected by copyright), but all the sources are described in Deliverable 5.2, which is public (see http://doi.org/10.6092/unibo/amsacta/5726 or http://doi.org/10.5281/zenodo.1318184). We analyzed the textual content of the data to construct a database of discursive topics in Task 5.4. The data set includes the results of topic modeling and of a sentiment analysis performed on the Facebook homepages of Local Management Authorities (LMA) of PERCEIVE case study regions.

    Content of the files:

    1 sub-folder, named “A_Stopword”, which contains all the stopword lists used for performing Topic Modeling. These are 6 .txt files, one for each language: Austrian, Italian, Polish, Romanian, Spanish, Swedish (“stoplist_file_[nation].txt”).

    1 sub-folder, named “B_Facebook”, which contains the Topic Modeling results for the Facebook profiles of the Local Managing Authorities for Austria, Italy, Poland, Romania, Spain, and Sweden (12 .txt files). For each case, a file “[source]-keys.txt” lists the 100 most important words for each topic, while a file “[source]-composition.txt” details the topic composition of each textual source. These files were obtained through Mallet software[1].

    File “SentimentFB.xlsx” contains data regarding the sentiment analysis for contents on Facebook homepages of Local Managing Authorities. The first column indicates the country, as well as row labels (see below). Columns 2-21 indicate the number id of the topics for each topic model (national level). The three rightmost columns of the file represent respectively a) the name of the lexicon used to detect sentiment orientation (i.e. “VADER”); b) the average sentiment score for positive, neutral and average words for each lexicon and each country; and c) the sentiment score across all topics in a country.

    File “topics_prevalence_and_clustering.xlsx” contains data regarding the three clusters of topics analyzed in the paper. The first column represents the ID of each topic; the second column reports the cluster of each topic; the third and the fourth columns report the average prevalence of each topic (rows) in posts and comments, respectively. As these data refer to a regional case study, these columns refer to the first region for each country; the sixth and the seventh columns report the average prevalence of each topic (rows) in posts and comments for the second region analyzed (only for those countries where we analyzed two regions); the eighth and ninth columns report the average prevalence of topics and comments, respectively, for each country; and finally the tenth column reports the country to which the data in the previous two columns refer.

    [1] McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002.
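For readers who want to load the “[source]-keys.txt” files described in the README, here is a minimal sketch. It assumes the usual Mallet topic-keys layout of one topic per line with three tab-separated fields (topic id, a weight, and space-separated top words); that layout is an assumption and should be checked against the actual files. The function name and the sample lines are hypothetical.

```python
def parse_topic_keys(lines):
    """Map topic id -> list of top words from tab-separated topic-keys lines.

    Assumed line layout: "<topic id>\t<weight>\t<word1 word2 ...>".
    """
    topics = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        topic_id, _weight, words = line.split("\t", 2)
        topics[int(topic_id)] = words.split()
    return topics


# Made-up sample lines, not taken from the data set.
sample = [
    "0\t0.25\tregion funding project\n",
    "1\t0.25\tpolicy europe cohesion\n",
]
topics = parse_topic_keys(sample)
print(topics)
```

With a real file, the same function could be called on an open file handle, for example `parse_topic_keys(open(path, encoding="utf-8"))`, where `path` points to one of the keys files.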
