65 datasets found

Social Media Analyzing.ova
academictorrents.com
bittorrent
Updated Mar 11, 2017
Cite
Chanveer Singh (2017). Social Media Analyzing.ova [Dataset]. https://academictorrents.com/details/5c7d429c9991bf87fea35feef68889eada4a3425
Explore at:
Available download formats: bittorrent (15408308736)
Dataset updated
Mar 11, 2017
Dataset authored and provided by
Chanveer Singh
License
https://academictorrents.com/nolicensespecified
Description
This is a project on social media sentiment analysis using the Hortonworks Sandbox, following the procedure provided at website. The default username and password are root and clickstream, respectively. Any BI tool can be used, but I recommend Tableau, which can be downloaded from website. Any user can contact me at cmdude16@gmail.com for further guidance.
Vietnamese Social Media Emotion Corpus
kaggle.com
Updated Dec 29, 2022
Cite
Minh Thanh (2022). Vietnamese Social Media Emotion Corpus [Dataset]. https://www.kaggle.com/datasets/hmthanh/vsmec
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 29, 2022
Dataset provided by
Kaggle
Authors
Minh Thanh
Area covered
Vietnam
Description
Emotion recognition is a more fine-grained form, or special case, of sentiment analysis. In this task, the result is not expressed as a polarity (positive or negative) or as a rating (from 1 to 5), but at a more detailed level, using emotion labels such as sadness, enjoyment, anger, disgust, fear and surprise. Emotion recognition plays a critical role in measuring the brand value of a product by recognizing the specific emotions in customers' comments. In this study, we have achieved two targets. First and foremost, we built a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese, which is a low-resource language in Natural Language Processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on our UIT-VSMEC. As a result, the Convolutional Neural Network (CNN) model achieved the highest performance, with an F1-score of 57.61%.

Paper: Vong Ho, Duong Nguyen, Danh Nguyen, Linh Pham, Kiet Nguyen and Ngan Nguyen, Emotion Recognition for Vietnamese Social Media Text, 2019 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), October 11-13, 2019, Ha Noi, Vietnam. Link.

https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects
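The description above reports an F1-score over six emotion labels but does not say how the per-label scores were averaged. As a rough illustration only, the sketch below computes a macro-averaged F1 (an assumption, not necessarily the averaging used in the paper) on a handful of made-up labels.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (macro averaging is assumed here)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration; they are not taken from UIT-VSMEC.
gold = ["anger", "anger", "sadness", "fear"]
pred = ["anger", "sadness", "sadness", "fear"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.4f}")
```

A weighted average would instead weight each label's F1 by its frequency in the gold labels, which matters when the emotion classes are imbalanced.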
SMILE Twitter Emotion dataset
figshare.com
txt
Updated Apr 21, 2016
Cite
Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen (2016). SMILE Twitter Emotion dataset [Dataset]. http://doi.org/10.6084/m9.figshare.3187909.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.3187909.v2
Dataset updated
Apr 21, 2016
Dataset provided by
figshare
Authors
Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was collected and annotated for the SMILE project (http://www.culturesmile.org). This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for the purpose of classifying emotions expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets with 5 emotions, namely anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset. License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
TRACES Sentiment Analysis Twitter Dataset
zenodo.org
Updated Oct 10, 2023
Cite
Irina Temnikova; Silvia Gargova (2023). TRACES Sentiment Analysis Twitter Dataset [Dataset]. http://doi.org/10.5281/zenodo.7357386
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7357386
Dataset updated
Oct 10, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Irina Temnikova; Silvia Gargova
Description
This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 1810 unique tweet IDs, written in Bulgarian, with annotations (positive, negative, neutral). The tweets are on the topics of lies, manipulation, and Covid-19 and are a subset of the following datasets:

https://zenodo.org/record/7296865

https://zenodo.org/record/7296736

https://zenodo.org/record/7296877

The tweets were collected via the Twitter API under academic access between 1 Jan 2020 and 28 June 2022 and thus cannot be used for commercial purposes.
The dUCk Tweets
kaggle.com
Updated Aug 2, 2020
Cite
Carlson (2020). The dUCk Tweets [Dataset]. https://www.kaggle.com/carlsonhoo/the-duck-tweets/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 2, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
Carlson
Description
Objective

The datasets were downloaded from Twitter using getOldTweets3 in order to analyze public sentiment toward the brand. The tweets span Jan 2019 to the end of June 2020. They were downloaded using the 2 keywords "Vivy duck": "Vivy" refers to the brand owner, Vivy Yusof, and "duck" refers to the brand name, The dUCk Group. The original tweets are a mix of English and Malay.

Brand

Founded by popular blogger cum entrepreneur Vivy Yusof, dUCk launched in May 2014 and was born out of the love for well-branded scarves, aiming to convey the message that wearing scarves should be a celebrated act among women. The dUCk brand, which revolves around a character named D, rose quickly in popularity across the world and has since expanded to become The dUCk Group. The dUCk Group today comprises 5 main product lines: Scarves, Cosmetics, Stationeries, Bags, and Home & Living.

After the MCO was imposed due to Covid-19, the brand received considerable backlash on Twitter, which peaked in April 2020. It is therefore interesting to examine public sentiment on Twitter toward the owner, “Vivy”, and the brand, “dUCk”, to gain insight into their image and how it affected the brand.

Acknowledgements

The study is only for academic purposes, to understand how the phenomena on social media can change the public sentiment toward the brand. Photo by ONNE Beauty

Inspiration

The brand was picked because we are interested in seeing how the sentiment changed, especially since 2 incidents involving the brand happened, in Jan 2020 and April 2020.

Files

raw_tweets_012019_to_062020.csv: Complete Raw data with mixture of English and Malay Tweets

tweets_012019_to_062020_translated.csv: Complete set of tweets translated into English only, using Google Translate

training.csv: Original tweets (without translation to English) manually labelled with sentiment polarity
Shein Tweets (Original and English Only + Sentiment Scores)
figshare.com
txt
Updated Mar 15, 2023
Cite
Sage Luong (2023). Shein Tweets (Original and English Only + Sentiment Scores) [Dataset]. http://doi.org/10.6084/m9.figshare.22273084.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22273084.v1
Dataset updated
Mar 15, 2023
Dataset provided by
figshare
Authors
Sage Luong
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Our cleaned dataset with id strings of tweets containing "shein" with only original, English tweets plus sentiment scores for our Winter 2023 Digital Humanities 120: Social Media Data Analytics project at UCLA.
DeepSeek
figshare.com
json
Updated Jun 22, 2025
Cite
zeqin lin (2025). DeepSeek [Dataset]. http://doi.org/10.6084/m9.figshare.29377388.v2
Explore at:
Available download formats: json
Unique identifier
https://doi.org/10.6084/m9.figshare.29377388.v2
Dataset updated
Jun 22, 2025
Dataset provided by
Figshare http://figshare.com/
Authors
zeqin lin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This project contains data, analysis, and insights derived from discussions about DeepSeek technology on Weibo. The study aims to understand public sentiment and key discussion topics related to DeepSeek technology using Natural Language Processing (NLP) techniques such as topic modeling and sentiment analysis.
Digital Phenotyping via Social Media Content 2
datacatalogue.cessda.eu
ssh.datastations.nl
Updated Feb 10, 2024
Cite
SK Kavvadias (2024). Digital Phenotyping via Social Media Content 2 [Dataset]. http://doi.org/10.17026/dans-z7g-9wek
Explore at:
Unique identifier
https://doi.org/10.17026/dans-z7g-9wek
Dataset updated
Feb 10, 2024
Dataset provided by
RMIT University: Melbourne, Victoria, AU
Authors
SK Kavvadias
Description
The research project associated with this dataset focuses on the analysis of the top threads within the ddo subreddit. The dataset contains essential information about each of these threads, including the author's username, the post's title, the post text, its score, and the number of comments it has received. Additionally, it includes a detailed record of all comments within each thread, encompassing the commenter's username, the date and time of their comment, and the score received by each comment.
The purpose of this project is to recognize addicted users within the ddo subreddit community by considering their activity patterns, emotional expressions, and content preferences, ultimately contributing to a deeper understanding of addiction-related behaviors in online communities and informing strategies for tailored support and interventions.
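The thread and comment fields listed above could be modelled roughly as follows. This is only a sketch: the class and field names are hypothetical (the dataset's actual column names are not given here), and counting comments per user is just one simple proxy for the "activity patterns" the project mentions.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Comment:
    username: str        # commenter's username
    posted_at: datetime  # date and time of the comment
    score: int           # score received by the comment


@dataclass
class Thread:
    author: str
    title: str
    text: str
    score: int
    num_comments: int
    comments: List[Comment] = field(default_factory=list)


def comments_per_user(threads):
    """Count how many comments each user posted across all threads."""
    counts = Counter()
    for thread in threads:
        for comment in thread.comments:
            counts[comment.username] += 1
    return counts


# Invented example data, for illustration only.
threads = [
    Thread("user_a", "First post", "text", 10, 2, [
        Comment("user_b", datetime(2023, 1, 1, 12, 0), 3),
        Comment("user_c", datetime(2023, 1, 1, 13, 0), 1),
    ]),
    Thread("user_b", "Second post", "text", 4, 1, [
        Comment("user_b", datetime(2023, 1, 2, 9, 30), 2),
    ]),
]
activity = comments_per_user(threads)
print(activity.most_common(1))
```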

Date Submitted: 2023-09-19
Online News Tracking Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jul 20, 2025
Cite
Archive Market Research (2025). Online News Tracking Report [Dataset]. https://www.archivemarketresearch.com/reports/online-news-tracking-562601
Explore at:
Available download formats: pdf, ppt, doc
Dataset updated
Jul 20, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The online news tracking market is experiencing robust growth, driven by the increasing demand for real-time information and the proliferation of digital news sources. Our analysis projects a market size of $15 billion in 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This significant expansion is fueled by several key factors. The rise of social media and its impact on news dissemination necessitates efficient tracking solutions. Furthermore, the need for brand monitoring, sentiment analysis, and competitive intelligence analysis within the rapidly evolving digital landscape is driving adoption. Government agencies and media organizations are also major contributors to market growth, as they rely on real-time news monitoring for crisis management, public safety, and strategic decision-making. The market is segmented by software type (cloud-based vs. on-premise), deployment mode (web-based vs. mobile), organization size (SMEs vs. large enterprises), and end-use industry (media & entertainment, government, etc.). While challenges exist such as data security concerns and the need for accurate data filtering amidst overwhelming information volume, technological advancements in AI-powered analytics and improved data visualization tools are mitigating these restraints. The competitive landscape is highly fragmented, with key players including Sony, Panasonic, JVC, Ikegami, Marshall, TVLogic, Canon, Planar, Lilliput, Blackmagic Design, and others. These companies are focusing on innovation and strategic partnerships to strengthen their market presence. The growth is expected to be geographically diverse, with North America and Europe holding significant market share initially, followed by a rise in adoption rates in Asia-Pacific and other regions driven by increasing internet penetration and digitalization. Continuous advancements in artificial intelligence and machine learning will further propel market growth over the forecast period. 
The strategic focus will likely shift towards enhancing the accuracy and efficiency of news tracking algorithms and providing more sophisticated analytics capabilities.
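The report states a 2025 market size and a CAGR but no end-of-period figure. As a worked example of what those two numbers imply under simple annual compounding (an assumption about how the CAGR is meant to be applied), the end-of-period value can be computed like this:

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1 + cagr) ** years


base_2025 = 15.0  # USD billions, as stated in the description
cagr = 0.15       # 15% per year, as stated in the description
projected_2033 = project_market_size(base_2025, cagr, 2033 - 2025)
print(f"Implied 2033 market size: ${projected_2033:.1f} billion")
```

Under these assumptions the stated figures imply roughly $45.9 billion by 2033; this value is derived here and does not appear in the report itself.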
DeepCube: Post-processing and annotated datasets of social media data
data.niaid.nih.gov
Updated Mar 15, 2024
Cite
Alexandros Mokas (2024). DeepCube: Post-processing and annotated datasets of social media data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7732930
Explore at:
Dataset updated
Mar 15, 2024
Dataset provided by
Giannis Tsampoulatidis
Eleni Kamateri
Alexandros Mokas
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Researcher(s): Alexandros Mokas, Eleni Kamateri

Supervisor: Ioannis Tsampoulatidis

This repository contains 3 social media datasets:

2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the DeepCube project. More specifically, these include:

The UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 5,695,253 social media posts collected from the Twitter platform, based on the initial version of the search criteria relevant to UC2 defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia and covering the period from 26 June 2021 to March 2023.

The UC5 dataset, containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. This dataset contains a total of 58,143 social media posts collected from the Twitter and Instagram platforms (12,881 from Twitter and 45,262 from Instagram), based on the initial version of the search criteria relevant to UC5 defined by MURMURATION SAS, focused on the region of Brazil and covering the period from 26 June 2021 to March 2023.

1 Annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:

The UC2 dataset, containing the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 1,721 annotated social media posts (412 relevant and 1,309 irrelevant) collected from the Twitter platform, focused on the region of Somalia and covering the period from 1 January 2010 to 31 December 2022.

For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.
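The three-step analysis described above can be sketched as a small pipeline. The actual DeepCube web services and their interfaces are not documented here, so the three services are passed in as plain callables; every name in this sketch (`ProcessedPost`, `preprocess`, the stub lambdas) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class ProcessedPost:
    post_id: str
    location: Optional[str]
    concepts: List[str]
    sentiment: str


def preprocess(
    post_id: str,
    text: str,
    image_refs: List[str],
    extract_location: Callable[[str], Optional[str]],
    extract_concepts: Callable[[str], List[str]],
    classify_sentiment: Callable[[str], str],
) -> ProcessedPost:
    # Step 1: extract the location from the post text.
    location = extract_location(text)
    # Step 2: keep at most the top ten concepts for each image.
    concepts: List[str] = []
    for ref in image_refs:
        concepts.extend(extract_concepts(ref)[:10])
    # Step 3: classify the sentiment of the text.
    sentiment = classify_sentiment(text)
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return ProcessedPost(post_id, location, concepts, sentiment)


# Stub services standing in for the real ones, for illustration only.
post = preprocess(
    "1",
    "Drought reported near Mogadishu",
    ["image-1"],
    extract_location=lambda text: "Mogadishu" if "Mogadishu" in text else None,
    extract_concepts=lambda ref: [f"concept-{i}" for i in range(12)],
    classify_sentiment=lambda text: "negative",
)
print(post)
```

Passing the services as arguments keeps the sketch independent of any particular endpoint, which also makes it easy to swap in test doubles.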

After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.

The dataset is provided by INFALIA. INFALIA, a spin-off of the CERTH institute and a partner in an EU research project, releases this dataset containing Tweet IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within DeepCube. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.
Using Sentiment Analysis in assessing learner performance
borealisdata.ca
search.dataone.org
Updated Oct 21, 2022
Cite
David Topps; Michelle Cullen; Corey Wirun (2022). Using Sentiment Analysis in assessing learner performance [Dataset]. http://doi.org/10.5683/SP3/IHUJUW
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/IHUJUW
Dataset updated
Oct 21, 2022
Dataset provided by
Borealis
Authors
David Topps; Michelle Cullen; Corey Wirun
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Sentiment analysis, first designed for assessing short comments in social media and web sites, is now showing promise as a means to analyze the conversational fragments found in therapeutic conversations in nursing school. It provides a simple yet cost-effective overview of the discourse and associated sentiments or moods expressed. This was part of a TTalk conversational assessment project.
Fast Fashion Tweets (Original and English Only + Sentiment Scores)
figshare.com
txt
Updated Mar 15, 2023
Cite
Sage Luong (2023). Fast Fashion Tweets (Original and English Only + Sentiment Scores) [Dataset]. http://doi.org/10.6084/m9.figshare.22273081.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22273081.v1
Dataset updated
Mar 15, 2023
Dataset provided by
figshare
Authors
Sage Luong
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Our cleaned dataset with id strings of tweets containing "fast fashion" with only original, English tweets with sentiment scores for our Winter 2023 Digital Humanities 120: Social Media Data Analytics project at UCLA.
Using social media and personality traits to assess software developers'...
explore.openaire.eu
Updated Jan 1, 2022
Cite
Leo Silva; Marília Gurgel Castro; Miriam Bernardino Silva; Milena Nestor Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira (2022). Using social media and personality traits to assess software developers' emotions [Dataset]. http://doi.org/10.5281/zenodo.7425721
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7425721
Dataset updated
Jan 1, 2022
Authors
Leo Silva; Marília Gurgel Castro; Miriam Bernardino Silva; Milena Nestor Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira
Description
Companion DATA

Title: Using social media and personality traits to assess software developers' emotions

Authors: Leo Moreira Silva, Marília Gurgel Castro, Miriam Bernardino Silva, Milena Nestor Santos, Uirá Kulesza, Margarida Lima, Henrique Madeira

Journal: PeerJ Computer Science

Github: https://github.com/leosilva/peerj_computer_science_2022

------------------------------------------------------------

The folders contain:

Experiment_Protocol.pdf: document that presents the protocol for recruitment, data collection of public posts from Twitter, criteria for manual analysis, and the assessment of Big Five factors from participants and psychologists. English version.

/analysis
analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists
analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants
analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications

/dataset
alldata.json: contains the dataset used in the paper

/ethics_committee
committee_response_english_version.pdf: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. English version.
committee_response_original_portuguese_version: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. Portuguese version.
committee_submission_form_english_version.pdf: the project submitted to the committee. English version.
committee_submission_form_original_portuguese_version.pdf: the project submitted to the committee. Portuguese version.
consent_form_english_version.pdf: declaration of free and informed consent filled in by participants. English version.
consent_form_original_portuguese_version.pdf: declaration of free and informed consent filled in by participants. Portuguese version.
data_protection_declaration_english_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. English version.
data_protection_declaration_original_portuguese_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. Portuguese version.

/notebooks
General - Charts.ipynb: notebook file containing all charts produced in the study, including those in the paper
Statistics - Lexicons and Ensembles.ipynb: notebook file with the statistics for the five lexicons and ensembles used in the study
Statistics - Linear Regression.ipynb: notebook file with the multiple linear regression results
Statistics - Polynomial Regression.ipynb: notebook file with the polynomial regression results
Statistics - Psychologists versus Participants.ipynb: notebook file with the statistics comparing the psychologists' and participants' manual analyses
Statistics - Working x Non-working.ipynb: notebook file containing the statistical analysis for the tweets posted during the work period and those posted outside of the working period

/surveys
Demographic_Survey_english_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. English version.
Demographic_Survey_portuguese_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. Portuguese version.
Demographic_Survey_answers.xlsx: participants' demographic survey answers
ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
ibf_en.doc: English translation of the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
ibf_answers.xlsx: participants' and psychologists' answers for the BFI

------------------------------------------------------------

We have removed any sensitive data from the dataset and from the demographic survey answers to protect participants' privacy and anonymity.
Transliterated Marathi Dataset
kaggle.com
Updated Mar 17, 2025
Cite
Gurunath Salve (2025). Transliterated Marathi Dataset [Dataset]. https://www.kaggle.com/datasets/gurunathsalve/transliterated-marathi-dataset/data
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 17, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Gurunath Salve
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Transliterated Marathi Sentiment Analysis Dataset

Overview

This dataset is designed to facilitate sentiment analysis for transliterated Marathi text, which is widely used on social media platforms but lacks structured sentiment resources. The dataset includes user-generated comments labeled with sentiment scores, along with a manually curated sentiment wordlist to aid classification.

The comments were collected from platforms like Instagram, Twitter, and YouTube, where informal, code-mixed text is prevalent. Each sentence has been carefully annotated for sentiment by human reviewers to ensure label accuracy and consistency.

Files in This Dataset

marathi_comments.csv – Contains user-generated transliterated Marathi comments with their sentiment classification.

marathi_wordlist.csv – A manually created wordlist that maps common transliterated Marathi words to sentiment scores.

Dataset Details

1. marathi_comments.csv

This file contains sentences along with sentiment labels assigned during manual annotation.

Column | Description
Sentence | Transliterated Marathi sentence
Classified Score | Sentiment label (-3 to +3) based on manual annotation

Sentiment Labeling Scale:

Score | Sentiment Meaning
+3 | Most Positive
+2 | More Positive
+1 | Positive
0 | Neutral
-1 | Negative
-2 | More Negative
-3 | Most Negative

2. marathi_wordlist.csv

This file contains a sentiment wordlist with predefined scores for commonly used transliterated Marathi words.

Column | Description
word | Transliterated Marathi word
score | Sentiment score assigned to the word (-3 to +3)

How to Use the Dataset

Train sentiment analysis models for transliterated Marathi text.

Enhance rule-based sentiment analysis using the sentiment wordlist.

Fine-tune transformer-based models like BERT, XLM-R, or multilingual LLMs.

Analyze sentiment trends in Marathi social media conversations.
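As one possible reading of the rule-based use case above, the sketch below loads a wordlist with the `word` and `score` columns described for marathi_wordlist.csv, sums the scores of the words found in a sentence, and clamps the total to the -3 to +3 scale. The summing-and-clamping rule, the whitespace tokenisation, and the sample entries are assumptions made for illustration; they are not taken from the dataset.

```python
import csv
import io


def load_wordlist(csv_file):
    """Read a CSV with 'word' and 'score' columns into a dict."""
    reader = csv.DictReader(csv_file)
    return {row["word"].strip().lower(): int(row["score"]) for row in reader}


def score_sentence(sentence, wordlist):
    """Sum the scores of known words and clamp the total to [-3, +3]."""
    total = sum(wordlist.get(token, 0) for token in sentence.lower().split())
    return max(-3, min(3, total))


# In-memory stand-in for marathi_wordlist.csv; the entries are made up.
sample_csv = io.StringIO("word,score\nchan,2\nmast,3\nvait,-2\n")
wordlist = load_wordlist(sample_csv)

print(score_sentence("khup chan mast", wordlist))
print(score_sentence("he vait ahe", wordlist))
```

With the real file, `load_wordlist(open("marathi_wordlist.csv", newline="", encoding="utf-8"))` should work the same way, provided the header names match those listed above.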

Potential Applications

Social Media Sentiment Analysis: Detecting public sentiment on various topics in Marathi.

Code-Mixed Text Processing: Improving NLP models for multilingual and transliterated text.

Low-Resource Language NLP: Expanding research for sentiment classification in underrepresented languages.

Acknowledgments

This dataset was curated as part of a research project in the Department of Electronics & Telecommunication Engineering at SCTR's Pune Institute of Computer Technology, Pune, India. We sincerely appreciate the efforts and contributions of our project group in dataset collection, annotation, and structuring.

Contributors:
- Siddhi Pardeshi
- Gurunath Salve
- Sayali Thakur
- Mr. Rishikesh J. Sutar (Mentor)

We would like to extend our gratitude to our institution for providing guidance and support throughout this research. By making this dataset publicly available, we aim to encourage further advancements in low-resource language processing and Marathi NLP research.
Sentiment Analysis Tools Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 16, 2025
Cite
Data Insights Market (2025). Sentiment Analysis Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/sentiment-analysis-tools-1945674
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
Jul 16, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global sentiment analysis tools market is experiencing robust growth, driven by the increasing need for businesses to understand customer opinions and preferences across various channels. The market's expansion is fueled by the proliferation of social media, e-commerce reviews, and customer service interactions, all generating vast quantities of unstructured data. Companies are leveraging sentiment analysis to gain valuable insights into brand perception, product development, and customer satisfaction, leading to improved marketing strategies, enhanced customer experiences, and ultimately, increased profitability. The market is segmented by deployment (cloud, on-premise), by organization size (SMEs, large enterprises), and by industry (retail, healthcare, finance, etc.), each exhibiting unique growth trajectories. Key players like IBM, SAP, and Microsoft are heavily invested in this space, constantly innovating with advanced algorithms and AI-powered solutions to improve accuracy and efficiency. The competitive landscape is dynamic, characterized by both organic growth and strategic acquisitions, solidifying the market's position as a crucial technology for businesses navigating the complexities of the digital age. The forecast period (2025-2033) anticipates sustained growth, driven by technological advancements such as natural language processing (NLP) and machine learning (ML), enabling more accurate and nuanced sentiment analysis. However, challenges remain, including data privacy concerns, the need for multilingual capabilities, and the complexity of analyzing sarcasm or nuanced language. Addressing these challenges will be crucial for sustained market expansion. The increasing adoption of cloud-based solutions is expected to further fuel market growth due to scalability, cost-effectiveness, and accessibility. 
The integration of sentiment analysis with other technologies, such as business intelligence and CRM systems, will also contribute significantly to its overall market expansion. We project a continued strong CAGR, reflecting the ongoing demand and technological advancements in the field.
Epidemiology of Cohort Social Media, 2018-2019 - Dataset - B2FIND
b2find.eudat.eu
Updated Oct 21, 2023
Cite
(2023). Epidemiology of Cohort Social Media, 2018-2019 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c2e00bd4-1857-5318-ab8c-ce6f29169a61
Explore at:
Dataset updated
Oct 21, 2023
Description
Interactions on social media have the potential to help us to understand human behaviour, including the development of both good and poor mental health. However, to do the best science we need to know as much as possible about the people who are participating in our research. The CLOSER group of UK longitudinal cohorts includes people who have contributed their data to research since birth. By inviting participants in these cohorts to also allow us to derive information from their social media feeds, we will be able to relate this information to gold-standard measures of the behaviours we are trying to understand and to world-class data on other aspects of life. To work out the best way to do this, our project will engage with participants in the Children of the '90s cohort to find out what is acceptable to them in terms of collecting and using their interactions on social media. We will use what we have learnt to develop software that collects and codes social media data in a way that protects the anonymity of participants by scoring Tweets without making the text available to researchers. We will share this software with other CLOSER cohorts to make it easy for them to invite participants to contribute their Twitter data in a safe and secure way. The high-resolution data collected in this way will help us to understand human behaviour and how mental health changes over time. Collecting these data in well-known groups of people will also give scientists the information they need to improve the quality of all research using social media.
We are demonstrating collection, anonymisation and analysis of social media data from consenting participants in the Avon Longitudinal Study of Parents and Children. Initially we are studying Twitter use, and gathering data through the platform's API. Our software gathers social media posts and interactions from participants every few days, with datasets being stored under ISO 27001 security certification. Derived, depersonalised datasets can be made available to approved researchers, and we aim to provide a means to evaluate sentiment analysis methods against ground truth data.
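The idea of scoring posts without exposing their text can be pictured with a small sketch. This is an illustration under stated assumptions, not the project's software: the toy lexicon, the function names `score_text` and `depersonalise`, and the salted-hash pseudonym are all hypothetical, and a salted hash alone should not be read as sufficient anonymisation.

```python
import hashlib

# Hypothetical toy lexicon; a real pipeline would use a validated sentiment method.
LEXICON = {"happy": 1, "great": 1, "sad": -1, "awful": -1}


def score_text(text: str) -> int:
    """Sum lexicon scores over lower-cased, whitespace-separated tokens."""
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def depersonalise(participant_id: str, text: str, salt: str) -> dict:
    """Build a derived record that keeps a score and a pseudonym, never the text."""
    digest = hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()
    return {
        "pseudonym": digest[:12],       # assumption: truncated salted hash as pseudonym
        "score": score_text(text),      # derived value only
        "n_tokens": len(text.split()),  # coarse metadata, no content
    }


record = depersonalise("cohort-0001", "Feeling great today, so happy!", salt="example-salt")
print(record)
```

Only the derived record would leave the collection environment in this sketch; the raw post stays local to `depersonalise` and is discarded once the score is computed.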
Dataset for Goodreads investigation project
figshare.com
xlsx
Updated May 24, 2023
Cite
Mia Flinton (2023). Dataset for Goodreads investigation project [Dataset]. http://doi.org/10.6084/m9.figshare.23146826.v1
Explore at:
xlsx (available download format)
Unique identifier
https://doi.org/10.6084/m9.figshare.23146826.v1
Dataset updated
May 24, 2023
Dataset provided by
figshare
Authors
Mia Flinton
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Investigating Goodreads reviews to perform sentiment analysis and keyword extraction about popular books.
Queue2 Dataset
universe.roboflow.com
zip
Updated Nov 29, 2021
Cite
Artem (2021). Queue2 Dataset [Dataset]. https://universe.roboflow.com/artem-uqcva/queue2/dataset/1
Explore at:
zip (available download format)
Dataset updated
Nov 29, 2021
Dataset authored and provided by
Artem
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
0 Bounding Boxes
Description
Here are a few use cases for this project:

Sentiment Analysis: The "queue2" model can be used to detect engagement and emotional expressions between people in a given setting. For instance, in scenarios like a business meeting or a social gathering, understanding expressions and body language may provide valuable insights.

Safety Monitoring: The model can be utilized in safety systems such as CCTV monitoring, where identifying people’s interactions in a specific space can help to ensure public safety.

Social Networking: This model can be used in social network applications to tag friends in photos based on their poses and interactions.

Behavioral Study: In research fields, this model can help in studying people's behavior in group settings or identifying patterns in social interactions.

Customer Experience Management: In retail or event settings, businesses can use this model for managing crowds, measuring customer satisfaction levels, or improving customer experiences.
Text Emotion Recognition
kaggle.com
Updated Mar 6, 2023
Cite
Shreejit Cheela (2023). Text Emotion Recognition [Dataset]. https://www.kaggle.com/shreejitcheela/text-emotion-recognition/discussion
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 6, 2023
Dataset provided by
Kaggle
Authors
Shreejit Cheela
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Emotions play a vital role in human communication, and detecting emotions from text data is a challenging task. The ability to automatically recognize emotions from text has many practical applications, such as in sentiment analysis, social media monitoring, and customer feedback analysis.

In this project, we will discuss the working principle of a text emotion recognition model and its important terminology. We will also provide a detailed description of the model architecture used and its training process. Finally, we will conclude by evaluating the model using a confusion matrix and a classification report. In the "emotions" column, 0 means sad and 1 means happy.

The slang.txt file used in the Abbreviations step can be taken from: https://www.kaggle.com/datasets/mansis97/slangs
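As a rough illustration of the evaluation step mentioned in the description, the sketch below builds a confusion matrix for the binary "emotions" labels (0: sad, 1: happy) in plain Python. The label lists are made up for the example and the helper names are hypothetical; the description does not say which library or code the author actually used.

```python
from collections import Counter

LABELS = {0: "sad", 1: "happy"}  # mapping stated in the dataset description


def confusion_matrix(y_true, y_pred):
    """Return a 2x2 nested list m[true][pred] of counts for labels 0 and 1."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in (0, 1)] for t in (0, 1)]


def precision_recall(matrix, label):
    """Compute precision and recall for one label from the matrix."""
    tp = matrix[label][label]
    predicted = sum(row[label] for row in matrix)  # column total
    actual = sum(matrix[label])                    # row total
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Made-up example labels, not taken from the dataset.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

matrix = confusion_matrix(y_true, y_pred)
for label, name in LABELS.items():
    p, r = precision_recall(matrix, label)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Rows of `matrix` are true labels and columns are predicted labels, so the diagonal holds the correctly classified counts.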
underlying data for "PERCEIVE - ENGAGING THE PEOPLE": IS SOCIAL MEDIA...
data.niaid.nih.gov
Updated Mar 3, 2021
+ more versions
Cite
Pareschi Luca (2021). underlying data for "PERCEIVE - ENGAGING THE PEOPLE": IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION? [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4573251
Dataset updated
Mar 3, 2021
Dataset provided by
Barberio Vitaliano
Pareschi Luca
Area covered
European Union
Description
README file

Data Set Title: “PERCEIVE - ‘ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”

Data Set Authors:

Vitaliano Barberio (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0002-2615-5006;

Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;

Data Set Contributors:

Ines Kuric (Wirtschaftsuniversität Wien);

Edoardo Mollona (Università di Bologna), ORCID http://orcid.org/0000-0001-9496-8618;

Markus Höllerer (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0003-2509-2696.

Data Set Contact Person:

Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;

luca.pareschi@uniroma2.it.

Data Set License: this data set is distributed under a Creative Commons Attribution (CC BY) 4.0 International license

Publication Year: 2021

Project Info: PERCEIVE (Perception and Evaluation of Regional and Cohesion Policies by Europeans and Identification with the Values of Europe), funded by European Union, Horizon 2020 Programme. Grant Agreement num. 693529; https://www.perceiveproject.eu/.

Data set Contents

The data set consists of:

1 README file

6 textual qualitative files saved in .txt format

“stoplist_file_[nation].txt”

12 textual quantitative files saved in .txt format

“[source]-keys.txt”: 6 files

“[source]-composition.txt”: 6 files

2 Excel quantitative files saved in .xlsx format

“SentimentFB.xlsx”

“topics_prevalence_and_clustering.xlsx”

Data set Documentation

Abstract

This data set contains the underlying data of the paper “’ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”.

The data openly available within this data set are a subset of the following two data sets, which contain all the relevant data of Work Package 3 and Work Package 5 of the PERCEIVE project:

Data set: “PERCEIVE: WP3: Effectiveness of communication strategies of EU projects” https://doi.org/10.5281/zenodo.3371133

Data set: “PERCEIVE: WP5: The multiplicity of shared meanings of EU and Cohesion Regional and Urban Policy at different discursive levels” https://doi.org/10.5281/zenodo.3371174

For the paper we collected Facebook posts referring to EU CP policies. We do not have permission to share these data (as they are protected by copyright), but all the sources are described in Deliverable 5.2, which is public (see http://doi.org/10.6092/unibo/amsacta/5726 or http://doi.org/10.5281/zenodo.1318184). We analyzed the textual content of the data to construct a database of discursive topics in Task 5.4. The data set includes the results of topic modeling and of a sentiment analysis performed on the Facebook homepages of the Local Managing Authorities (LMA) of the PERCEIVE case study regions.

Content of the files:

1 sub-folder, named “A_Stopword”, which contains all the stopword lists used for performing Topic Modeling. These are 6 .txt files, one for each language: Austrian, Italian, Polish, Romanian, Spanish, Swedish (“stoplist_file_[nation].txt”).

1 sub-folder, named “B_Facebook”, which contains the Topic Modeling results for the Facebook profiles of the Local Managing Authorities for Austria, Italy, Poland, Romania, Spain, and Sweden (12 .txt files). For each case, a file “[source]-keys.txt” lists the 100 most important words for each topic, while a file “[source]-composition.txt” details the topic composition of each textual source. These files were obtained with the Mallet software[1].

File “SentimentFB.xlsx” contains data regarding the sentiment analysis of content on the Facebook homepages of Local Managing Authorities. The first column indicates the country, as well as row labels (see below). Columns 2-21 indicate the numeric ID of the topics for each topic model (national level). The three rightmost columns of the file represent, respectively: a) the name of the lexicon used to detect sentiment orientation (i.e. “VADER”); b) the average sentiment score for positive, neutral and average words for each lexicon and each country; and c) the sentiment score across all topics in a country.

File “topics_prevalence_and_clustering.xlsx” contains data regarding the three clusters of topics analyzed in the paper. The first column gives the ID of each topic; the second column reports the cluster of each topic; the third and fourth columns report the average prevalence of each topic (rows) in posts and comments, respectively. As these data refer to a regional case study, these columns refer to the first region for each country; the sixth and seventh columns report the average prevalence of each topic (rows) in posts and comments for the second region analyzed (only for those countries where we analyzed two regions); the eighth and ninth columns report the average prevalence of topics in posts and comments, respectively, for each country; and finally the tenth column reports the country to which the data in the previous two columns refer.

[1] McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002.

Column	Description
`Sentence`	Transliterated Marathi sentence
`Classified Score`	Sentiment label (-3 to +3) based on manual annotation

Score	Sentiment Meaning
+3	Most Positive
+2	More Positive
+1	Positive
0	Neutral
-1	Negative
-2	More Negative
-3	Most Negative

Column	Description
`word`	Transliterated Marathi word
`score`	Sentiment score assigned to the word (-3 to +3)
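A minimal sketch of how the score scale and the word/score table above might be combined is given here. It assumes a simple approach (averaging the scores of known words, then rounding and clamping to the -3 to +3 range) that the dataset description does not prescribe, and the three word entries are hypothetical placeholders shaped like the `word`/`score` columns, not rows taken from the dataset.

```python
# Score meanings as listed in the table above.
SCORE_MEANING = {
    3: "Most Positive", 2: "More Positive", 1: "Positive", 0: "Neutral",
    -1: "Negative", -2: "More Negative", -3: "Most Negative",
}

# Hypothetical entries shaped like the word/score table; not taken from the dataset.
wordlist = {"chhan": 2, "uttam": 3, "vait": -2}


def sentence_score(sentence: str, lexicon: dict) -> int:
    """Average the scores of known words, then round and clamp to [-3, 3]."""
    hits = [lexicon[w] for w in sentence.lower().split() if w in lexicon]
    if not hits:
        return 0  # assumption: no known words is treated as neutral
    mean = sum(hits) / len(hits)
    return max(-3, min(3, round(mean)))


for text in ["ha chitrapat chhan ahe", "chhan pan vait"]:
    score = sentence_score(text, wordlist)
    print(text, "->", score, SCORE_MEANING[score])
```

In practice the lexicon would be loaded from the word list file and the result compared against the manually annotated `Classified Score` column; the averaging rule here is only one of several reasonable choices.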

