https://academictorrents.com/nolicensespecified
This is a project on social media sentiment analysis using the Hortonworks Sandbox, following the procedure provided on the website. The default username and password are root and clickstream, respectively. Any BI tool can be used, but I recommend Tableau, which can be downloaded from the website. Any user can contact me at cmdude16@gmail.com for further guidance.
Emotion recognition is a finer-grained form, or special case, of sentiment analysis. In this task, the result is not expressed as a polarity (positive or negative) or as a rating (from 1 to 5) but at a more detailed level, using emotion categories such as sadness, enjoyment, anger, disgust, fear and surprise. Emotion recognition plays a critical role in measuring the brand value of a product by recognizing the specific emotions in customers’ comments. In this study, we achieved two targets. First and foremost, we built a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences and six emotion labels, contributing to emotion recognition research in Vietnamese, which is a low-resource language in Natural Language Processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on UIT-VSMEC. As a result, the Convolutional Neural Network (CNN) model achieved the highest performance, with an F1-score of 57.61%.
Paper: Vong Ho, Duong Nguyen, Danh Nguyen, Linh Pham, Kiet Nguyen and Ngan Nguyen, Emotion Recognition for Vietnamese Social Media Text, 2019 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), October 11-13, 2019, Ha Noi, Vietnam. Link.
https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was collected and annotated for the SMILE project (http://www.culturesmile.org). This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for the purpose of classifying emotions expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets, with 5 emotions, namely anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset. License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 1810 unique tweet IDs, written in Bulgarian, with annotations (positive, negative, neutral). The tweets are on the topics of lies, manipulation, and Covid-19 and are a subset of the following datasets:
https://zenodo.org/record/7296865
https://zenodo.org/record/7296736
https://zenodo.org/record/7296877
The tweets were collected via the Twitter API under academic access between 1 Jan 2020 and 28 June 2022 and thus cannot be used for commercial purposes.
The datasets were downloaded from Twitter using getOldTweets3 in order to analyze public sentiment toward the brand. The tweets run from January 2019 until the end of June 2020. They were retrieved using 2 keywords, "Vivy duck": "Vivy" refers to the brand owner, Vivy Yusof, and "duck" refers to the brand name, The dUCk Group. The original tweets are a mix of English and Malay.
Founded by popular blogger cum entrepreneur Vivy Yusof, dUCk launched in May 2014 and was born out of the love for well-branded scarves, aiming to convey the message that wearing scarves should be a celebrated act among women. The dUCk brand, which revolves around a character named D, rose quickly in popularity across the world and has since expanded to become The dUCk Group. The dUCk Group today comprises 5 main product lines – Scarves, Cosmetics, Stationeries, Bags, and Home & Living.
After the Movement Control Order (MCO) was imposed due to Covid-19, the brand received considerable backlash on Twitter, which peaked in April 2020. It is therefore interesting to examine public sentiment on Twitter toward the owner, “Vivy”, and the brand, “dUCk”, to gain insight into their image and how it affected the brand.
The study is for academic purposes only, to understand how phenomena on social media can change public sentiment toward the brand.
The brand was picked because we are interested in how the sentiment changed, especially since 2 incidents affected the brand in Jan 2020 and April 2020.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our cleaned dataset of ID strings for tweets containing "shein", restricted to original English tweets and including sentiment scores, for our Winter 2023 Digital Humanities 120: Social Media Data Analytics project at UCLA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project contains data, analysis, and insights derived from discussions about DeepSeek technology on Weibo. The study aims to understand public sentiment and key discussion topics related to DeepSeek technology using Natural Language Processing (NLP) techniques such as topic modeling and sentiment analysis.
The research project associated with this dataset focuses on the analysis of the top threads within the ddo subreddit. The dataset contains essential information about each of these threads, including the author's username, the post's title, the post text, its score, and the number of comments it has received. Additionally, it includes a detailed record of all comments within each thread, encompassing the commenter's username, the date and time of their comment, and the score received by each comment.
The purpose of this project is to recognize addicted users within the ddo subreddit community by considering their activity patterns, emotional expressions, and content preferences, ultimately contributing to a deeper understanding of addiction-related behaviors in online communities and informing strategies for tailored support and interventions.
https://www.archivemarketresearch.com/privacy-policy
The online news tracking market is experiencing robust growth, driven by the increasing demand for real-time information and the proliferation of digital news sources. Our analysis projects a market size of $15 billion in 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This significant expansion is fueled by several key factors. The rise of social media and its impact on news dissemination necessitates efficient tracking solutions. Furthermore, the need for brand monitoring, sentiment analysis, and competitive intelligence analysis within the rapidly evolving digital landscape is driving adoption. Government agencies and media organizations are also major contributors to market growth, as they rely on real-time news monitoring for crisis management, public safety, and strategic decision-making. The market is segmented by software type (cloud-based vs. on-premise), deployment mode (web-based vs. mobile), organization size (SMEs vs. large enterprises), and end-use industry (media & entertainment, government, etc.). While challenges exist such as data security concerns and the need for accurate data filtering amidst overwhelming information volume, technological advancements in AI-powered analytics and improved data visualization tools are mitigating these restraints. The competitive landscape is highly fragmented, with key players including Sony, Panasonic, JVC, Ikegami, Marshall, TVLogic, Canon, Planar, Lilliput, Blackmagic Design, and others. These companies are focusing on innovation and strategic partnerships to strengthen their market presence. The growth is expected to be geographically diverse, with North America and Europe holding significant market share initially, followed by a rise in adoption rates in Asia-Pacific and other regions driven by increasing internet penetration and digitalization. Continuous advancements in artificial intelligence and machine learning will further propel market growth over the forecast period. 
The strategic focus will likely shift towards enhancing the accuracy and efficiency of news tracking algorithms and providing more sophisticated analytics capabilities.
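The growth figures quoted above can be checked with simple compounding. The sketch below is only the arithmetic implied by the stated $15 billion base (2025) and 15% CAGR through 2033. The resulting 2033 value is not a number given in the report, and the function name is illustrative.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the summary: $15 billion in 2025, 15% CAGR from 2025 to 2033.
BASE_YEAR, END_YEAR = 2025, 2033
implied_2033 = project_market_size(15.0, 0.15, END_YEAR - BASE_YEAR)

# This is an implied value derived from the two quoted figures, not a reported one.
print(f"Implied {END_YEAR} market size: ${implied_2033:.1f} billion")
```

Eight compounding periods at 15% roughly triple the base, which is the scale of growth the summary's wording suggests.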
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Researcher(s): Alexandros Mokas, Eleni Kamateri
Supervisor: Ioannis Tsampoulatidis
This repository contains 3 social media datasets:
2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the DeepCube project. More specifically, these include:
The UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 5,695,253 social media posts collected from the Twitter platform, based on the initial version of the search criteria relevant to UC2 defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia and covering the period from 26 June 2021 to March 2023.
The UC5 dataset, containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. This dataset contains a total of 58,143 social media posts collected from the Twitter and Instagram platforms (12,881 from Twitter and 45,262 from Instagram), based on the initial version of the search criteria relevant to UC5 defined by MURMURATION SAS, focused on regions of Brazil and covering the period from 26 June 2021 to March 2023.
1 Annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:
The UC2 dataset, containing the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 1,721 annotated social media posts (412 relevant and 1,309 irrelevant) collected from the Twitter platform, focused on the region of Somalia and covering the period from 1 January 2010 to 31 December 2022.
For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.
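The three-step preprocessing described above can be sketched as a small pipeline. This is a hypothetical illustration only: the class names, function names and stub services below are assumptions, since the actual DeepCube web services and their interfaces are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class Post:
    text: str
    image_urls: List[str] = field(default_factory=list)


@dataclass
class ProcessedPost:
    location: Optional[str]          # step 1: location extracted from the text
    image_concepts: List[List[str]]  # step 2: up to ten concepts per image
    sentiment: str                   # step 3: positive, negative or neutral


def preprocess(
    post: Post,
    extract_location: Callable[[str], Optional[str]],
    extract_concepts: Callable[[str], List[str]],
    classify_sentiment: Callable[[str], str],
    top_k: int = 10,
) -> ProcessedPost:
    """Run the three steps in order; each service is injected as a callable."""
    location = extract_location(post.text)
    concepts = [extract_concepts(url)[:top_k] for url in post.image_urls]
    sentiment = classify_sentiment(post.text)
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return ProcessedPost(location, concepts, sentiment)


# Hypothetical stand-ins for the real services, for demonstration only.
def fake_location(text: str) -> Optional[str]:
    return "Somalia" if "Somalia" in text else None


def fake_concepts(url: str) -> List[str]:
    return ["person", "building", "drought", "sun"]


def fake_sentiment(text: str) -> str:
    return "neutral"


example = preprocess(
    Post("Dry season in Somalia", ["img-1"]),
    fake_location, fake_concepts, fake_sentiment,
)
print(example)
```

Passing the services in as callables keeps the sketch independent of any particular API, so real clients could be substituted without changing the pipeline itself.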
After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.
The dataset is provided by INFALIA. INFALIA, being a spin-off of the CERTH institute and a partner in an EU research project, releases this dataset containing Tweet IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube project. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Sentiment analysis, first designed for assessing short comments in social media and web sites, is now showing promise as a means to analyze the conversational fragments found in therapeutic conversations in nursing school. It provides a simple yet cost-effective overview of the discourse and the associated sentiments or moods expressed. This was part of a TTalk conversational assessment project.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our cleaned dataset of ID strings for tweets containing "fast fashion", restricted to original English tweets and including sentiment scores, for our Winter 2023 Digital Humanities 120: Social Media Data Analytics project at UCLA.
Companion DATA
Title: Using social media and personality traits to assess software developers' emotions
Authors: Leo Moreira Silva, Marília Gurgel Castro, Miriam Bernardino Silva, Milena Nestor Santos, Uirá Kulesza, Margarida Lima, Henrique Madeira
Journal: PeerJ Computer Science
Github: https://github.com/leosilva/peerj_computer_science_2022
------------------------------------------------------------
The folders contain:
- Experiment_Protocol.pdf: document that presents the protocol covering recruitment, data collection of public posts from Twitter, criteria for manual analysis, and the assessment of Big Five factors from participants and psychologists. English version.
/analysis
- analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists
- analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants
- analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications
/dataset
- alldata.json: contains the dataset used in the paper
/ethics_committee
- committee_response_english_version.pdf: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. English version.
- committee_response_original_portuguese_version: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. Portuguese version.
- committee_submission_form_english_version.pdf: the project submitted to the committee. English version.
- committee_submission_form_original_portuguese_version.pdf: the project submitted to the committee. Portuguese version.
- consent_form_english_version.pdf: declaration of free and informed consent completed by participants. English version.
- consent_form_original_portuguese_version.pdf: declaration of free and informed consent completed by participants. Portuguese version.
- data_protection_declaration_english_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. English version.
- data_protection_declaration_original_portuguese_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. Portuguese version.
/notebooks
- General - Charts.ipynb: notebook file containing all charts produced in the study, including those in the paper
- Statistics - Lexicons and Ensembles.ipynb: notebook file with the statistics for the five lexicons and ensembles used in the study
- Statistics - Linear Regression.ipynb: notebook file with the multiple linear regression results
- Statistics - Polynomial Regression.ipynb: notebook file with the polynomial regression results
- Statistics - Psychologists versus Participants.ipynb: notebook file with the statistics comparing the psychologists' and participants' manual analyses
- Statistics - Working x Non-working.ipynb: notebook file containing the statistical analysis for the tweets posted during the work period and those posted outside of the working period
/surveys
- Demographic_Survey_english_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. English version.
- Demographic_Survey_portuguese_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. Portuguese version.
- Demographic_Survey_answers.xlsx: participants' demographic survey answers
- ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
- ibf_en.doc: English translation of the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
- ibf_answers.xlsx: participants' and psychologists' answers for the BFI
------------------------------------------------------------
We have removed any sensitive data from the dataset and from the demographic survey answers to protect participants' privacy and anonymity.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed to facilitate sentiment analysis for transliterated Marathi text, which is widely used on social media platforms but lacks structured sentiment resources. The dataset includes user-generated comments labeled with sentiment scores, along with a manually curated sentiment wordlist to aid classification.
The comments were collected from platforms like Instagram, Twitter, and YouTube, where informal, code-mixed text is prevalent. Each sentence has been carefully annotated for sentiment by human reviewers to ensure label accuracy and consistency.
- marathi_comments.csv – Contains user-generated transliterated Marathi comments with their sentiment classification.
- marathi_wordlist.csv – A manually created wordlist that maps common transliterated Marathi words to sentiment scores.
marathi_comments.csv: This file contains sentences along with sentiment labels assigned during manual annotation.
Column | Description |
---|---|
Sentence | Transliterated Marathi sentence |
Classified Score | Sentiment label (-3 to +3) based on manual annotation |
Sentiment Labeling Scale:
Score | Sentiment Meaning |
---|---|
+3 | Most Positive |
+2 | More Positive |
+1 | Positive |
0 | Neutral |
-1 | Negative |
-2 | More Negative |
-3 | Most Negative |
marathi_wordlist.csv: This file contains a sentiment wordlist with predefined scores for commonly used transliterated Marathi words.
Column | Description |
---|---|
word | Transliterated Marathi word |
score | Sentiment score assigned to the word (-3 to +3) |
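As a rough illustration of how the wordlist could aid classification, the sketch below averages word scores and clamps the result to the -3 to +3 scale. It assumes only the `word` and `score` columns listed above. The sample rows are invented placeholders rather than dataset entries, and the averaging rule is an assumption, not the authors' method.

```python
import csv
import io
from typing import Dict, TextIO


def load_wordlist(csv_file: TextIO) -> Dict[str, int]:
    """Read rows with 'word' and 'score' columns into a lookup table."""
    reader = csv.DictReader(csv_file)
    return {row["word"].strip().lower(): int(row["score"]) for row in reader}


def score_sentence(sentence: str, wordlist: Dict[str, int]) -> int:
    """Average the scores of known words and clamp to the -3..+3 scale."""
    tokens = sentence.lower().split()
    hits = [wordlist[token] for token in tokens if token in wordlist]
    if not hits:
        return 0  # no known words: treat the sentence as neutral
    mean = sum(hits) / len(hits)
    return max(-3, min(3, round(mean)))


# Placeholder rows standing in for marathi_wordlist.csv (not real entries).
sample_csv = io.StringIO("word,score\npos_word,2\nneg_word,-3\n")
wordlist = load_wordlist(sample_csv)
print(score_sentence("neg_word neg_word pos_word", wordlist))
```

With the real file, `load_wordlist(open("marathi_wordlist.csv", encoding="utf-8", newline=""))` would be the equivalent call, assuming the header row matches the column names in the table.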
This dataset was curated as part of a research project in the Department of Electronics & Telecommunication Engineering at SCTR's Pune Institute of Computer Technology, Pune, India. We sincerely appreciate the efforts and contributions of our project group in dataset collection, annotation, and structuring.
Contributors:
- Siddhi Pardeshi
- Gurunath Salve
- Sayali Thakur
- Mr. Rishikesh J. Sutar (Mentor)
We would like to extend our gratitude to our institution for providing guidance and support throughout this research. By making this dataset publicly available, we aim to encourage further advancements in low-resource language processing and Marathi NLP research.
https://www.datainsightsmarket.com/privacy-policy
The global sentiment analysis tools market is experiencing robust growth, driven by the increasing need for businesses to understand customer opinions and preferences across various channels. The market's expansion is fueled by the proliferation of social media, e-commerce reviews, and customer service interactions, all generating vast quantities of unstructured data. Companies are leveraging sentiment analysis to gain valuable insights into brand perception, product development, and customer satisfaction, leading to improved marketing strategies, enhanced customer experiences, and ultimately, increased profitability. The market is segmented by deployment (cloud, on-premise), by organization size (SMEs, large enterprises), and by industry (retail, healthcare, finance, etc.), each exhibiting unique growth trajectories. Key players like IBM, SAP, and Microsoft are heavily invested in this space, constantly innovating with advanced algorithms and AI-powered solutions to improve accuracy and efficiency. The competitive landscape is dynamic, characterized by both organic growth and strategic acquisitions, solidifying the market's position as a crucial technology for businesses navigating the complexities of the digital age. The forecast period (2025-2033) anticipates sustained growth, driven by technological advancements such as natural language processing (NLP) and machine learning (ML), enabling more accurate and nuanced sentiment analysis. However, challenges remain, including data privacy concerns, the need for multilingual capabilities, and the complexity of analyzing sarcasm or nuanced language. Addressing these challenges will be crucial for sustained market expansion. The increasing adoption of cloud-based solutions is expected to further fuel market growth due to scalability, cost-effectiveness, and accessibility. 
The integration of sentiment analysis with other technologies, such as business intelligence and CRM systems, will also contribute significantly to its overall market expansion. We project a continued strong CAGR, reflecting the ongoing demand and technological advancements in the field.
Interactions on social media have the potential to help us to understand human behaviour, including the development of both good and poor mental health. However, to do the best science we need to know as much as possible about the people who are participating in our research. The CLOSER group of UK longitudinal cohorts includes people who have contributed their data to research since birth. By inviting participants in these cohorts to also allow us to derive information from their social media feeds, we will be able to relate this information to gold-standard measures of the behaviours we are trying to understand and to world-class data on other aspects of life. To work out the best way to do this, our project will engage with participants in the Children of the '90s cohort to find out what is acceptable to them in terms of collecting and using their interactions on social media. We will use what we have learnt to develop software that collects and codes social media data in a way that protects the anonymity of participants by scoring Tweets without making the text available to researchers. We will share this software with other CLOSER cohorts to make it easy for them to invite participants to contribute their Twitter data in a safe and secure way. The high-resolution data collected in this way will help us to understand human behaviour and how mental health changes over time. Collecting these data in well-known groups of people will also give scientists the information they need to improve the quality of all research using social media.
We are demonstrating collection, anonymisation and analysis of social media data from consenting participants in the Avon Longitudinal Study of Parents and Children. Initially we are studying Twitter use, and gathering data through the platform's API. Our software gathers social media posts and interactions from participants every few days, with datasets being stored under ISO 27001 security certification. Derived, depersonalised datasets can be made available to approved researchers, and we aim to provide a means to evaluate sentiment analysis methods against ground truth data.
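The idea of scoring Tweets without exposing their text can be illustrated with a minimal sketch. Everything here is hypothetical: the toy lexicon, the record fields and the salted-hash pseudonym are assumptions chosen for illustration, not the project's actual software or scoring method.

```python
import hashlib
from dataclasses import dataclass

# Toy lexicon for illustration only; a real study would use a validated method.
TOY_LEXICON = {"happy": 1, "great": 1, "sad": -1, "awful": -1}


@dataclass(frozen=True)
class DerivedRecord:
    participant_key: str  # pseudonymous key, not the real identifier
    created_at: str
    sentiment_score: int  # the only information derived from the text


def pseudonymise(participant_id: str, salt: str) -> str:
    """Replace an identifier with a truncated salted SHA-256 digest."""
    digest = hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()
    return digest[:16]


def derive_record(participant_id: str, created_at: str, text: str, salt: str) -> DerivedRecord:
    """Score the text, then return a record that does not contain the text."""
    tokens = (token.strip(".,!?").lower() for token in text.split())
    score = sum(TOY_LEXICON.get(token, 0) for token in tokens)
    return DerivedRecord(pseudonymise(participant_id, salt), created_at, score)


record = derive_record("participant-1", "2020-01-01", "Happy and great, not sad!", "secret-salt")
print(record)
```

The point of the design is that only the derived record leaves the collection step. A salted hash on its own is pseudonymisation rather than full anonymisation, so access controls on the derived data would still matter.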
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Investigating Goodreads reviews to perform sentiment analysis and keyword extraction about popular books.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Sentiment Analysis: The "queue2" model can be used to detect engagement and emotional expressions between people in a given setting. For instance, in scenarios like a business meeting or a social gathering, understanding expressions and body language may provide valuable insights.
Safety Monitoring: The model can be utilized in safety systems such as CCTV monitoring, where identifying people’s interactions in a specific space can help to ensure public safety.
Social Networking: This model can be used in social network applications to tag friends in photos based on their poses and interactions.
Behavioral Study: In research fields, this model can help in studying people's behavior in group settings or identifying patterns in social interactions.
Customer Experience Management: In retail or event settings, businesses can use this model for managing crowds, measuring customer satisfaction levels or improving customer experiences.
https://creativecommons.org/publicdomain/zero/1.0/
Emotions play a vital role in human communication, and detecting emotions from text data is a challenging task. The ability to automatically recognize emotions from text has many practical applications, such as in sentiment analysis, social media monitoring, and customer feedback analysis.
In this project, we will discuss the working principle of a text emotion recognition model and its important terminology. We will also provide a detailed description of the model architecture used and its training process. Finally, we will conclude by evaluating the model using a confusion matrix and a classification report. In the "emotions" column, 0 denotes sad and 1 denotes happy.
slang.txt in Abbreviations step can be taken from: https://www.kaggle.com/datasets/mansis97/slangs
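The evaluation step mentioned above (confusion matrix and classification report) can be sketched in plain Python for the two labels, 0: sad and 1: happy. The label lists below are made-up toy values, not results from this project, and the helper names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence

LABELS = {0: "sad", 1: "happy"}


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     labels: Sequence[int] = (0, 1)) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def classification_report(y_true: Sequence[int], y_pred: Sequence[int],
                          labels: Sequence[int] = (0, 1)) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall and F1 derived from the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp
        fn = sum(matrix[i]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[LABELS.get(label, str(label))] = {
            "precision": precision, "recall": recall, "f1": f1,
        }
    return report


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
matrix = confusion_matrix(y_true, y_pred)
report = classification_report(y_true, y_pred)
print(matrix)
print(report)
```

Libraries such as scikit-learn provide equivalent `confusion_matrix` and `classification_report` functions. The hand-rolled version is shown only to make the definitions explicit.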
README file
Data Set Title: “PERCEIVE - ‘ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”
Data Set Authors:
Vitaliano Barberio (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0002-2615-5006;
Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;
Data Set Contributors:
Ines Kuric (Wirtschaftsuniversität Wien);
Edoardo Mollona (Università di Bologna), ORCID http://orcid.org/0000-0001-9496-8618;
Markus Höllerer (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0003-2509-2696.
Data Set Contact Person:
Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;
luca.pareschi@uniroma2.it .
Data Set License: this data set is distributed under a Creative Commons Attribution (CC BY) 4.0 International license
Publication Year: 2021
Project Info: PERCEIVE (Perception and Evaluation of Regional and Cohesion Policies by Europeans and Identification with the Values of Europe), funded by European Union, Horizon 2020 Programme. Grant Agreement num. 693529; https://www.perceiveproject.eu/.
Data set Contents
The data set consists of:
1 README file
6 textual qualitative files saved in .txt format
“stoplist_file_[nation].txt”
12 textual quantitative files saved in .txt format
“[source]-keys.txt”: 6 files
“[source]-composition.txt”: 6 files
2 excel quantitative files saved in .xlsx format
“SentimentFB.xlsx”
“topics_prevalence_and_clustering.xlsx”
Data set Documentation
Abstract
This data set contains the underlying data of the paper “’ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”.
The data openly available within this data set are a subset of the following two data sets, which contain all the relevant data of Work Package 3 and Work Package 5 of the PERCEIVE project:
Data set: “PERCEIVE: WP3: Effectiveness of communication strategies of EU projects” https://doi.org/10.5281/zenodo.3371133
Data set: “PERCEIVE: WP5: The multiplicity of shared meanings of EU and Cohesion Regional and Urban Policy at different discursive levels” https://doi.org/10.5281/zenodo.3371174
For the paper we collected Facebook posts referring to EU CP policies. We don’t have permission to share these data (as they are protected by copyright), but all the sources are described in Deliverable 5.2, which is public (see http://doi.org/10.6092/unibo/amsacta/5726 or http://doi.org/10.5281/zenodo.1318184). We analyzed the textual content of the data to construct a database of discursive topics in Task 5.4. The data set includes the results of topic modeling and of a sentiment analysis performed on the Facebook homepages of Local Managing Authorities (LMA) of PERCEIVE case study regions.
Content of the files:
1 sub-folder, named “A_Stopword”, which contains all the stopword lists used for performing Topic Modeling. These are 6 .txt files, one for each language: Austrian, Italian, Polish, Romanian, Spanish, Swedish (“stoplist_file_[nation].txt”).
1 sub-folder which contains the Topic Modeling results for the Facebook profiles of the Local Managing Authorities for Austria, Italy, Poland, Romania, Spain, and Sweden (sub-folder “B_Facebook”, 12 .txt files). For each case, a file “[source]-keys.txt” lists the 100 most important words for each topic, while a file “[source]-composition.txt” details the topic composition of each textual source. These files were obtained through Mallet software[1].
File “SentimentFB.xlsx” contains data regarding the sentiment analysis of content on the Facebook homepages of Local Managing Authorities. The first column indicates the country, as well as row labels (see below). Columns 2-21 indicate the ID number of the topics for each topic model (national level). The three rightmost columns of the file represent, respectively, a) the name of the lexicon used to detect sentiment orientation (i.e. “VADER”); b) the average sentiment score for positive, neutral and average words for each lexicon and each country; and c) the sentiment score across all topics in a country.
File “topics_prevalence_and_clustering.xlsx” contains data regarding the three clusters of topics analyzed in the paper. The first column represents the ID of each topic; the second column reports the cluster of each topic; the third and the fourth columns report the average prevalence of each topic (rows) in posts and comments, respectively. As these data refer to a regional case study, these columns refer to the first region for each country; the sixth and the seventh columns report the average prevalence of each topic (rows) in posts and comments for the second region analyzed (only for those countries where we analyzed two regions); the eighth and ninth columns report the average prevalence of topics and comments, respectively, for each country; and finally the tenth column reports the country to which the data in the previous two columns refer.
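For readers who want to load the “[source]-keys.txt” files, a minimal parsing sketch follows. It assumes the usual MALLET topic-keys layout of one topic per line, with a topic ID, a weight and a space-separated word list separated by tabs. That layout is an assumption to verify against the actual files, and the sample words are placeholders.

```python
from typing import Dict, Iterable, List, TypedDict


class Topic(TypedDict):
    weight: float
    words: List[str]


def parse_topic_keys(lines: Iterable[str]) -> Dict[int, Topic]:
    """Parse lines of the assumed form '<topic id>\\t<weight>\\t<word> <word> ...'."""
    topics: Dict[int, Topic] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        topic_id, weight, words = line.split("\t", 2)
        topics[int(topic_id)] = {"weight": float(weight), "words": words.split()}
    return topics


# Placeholder content standing in for a real "[source]-keys.txt" file.
sample_lines = [
    "0\t0.25\tword_a word_b word_c\n",
    "1\t0.10\tword_d word_e\n",
]
topics = parse_topic_keys(sample_lines)
print(topics[0]["words"])
```

With a real file, passing an open file handle (for example `open(path, encoding="utf-8")`) in place of `sample_lines` should work the same way, provided the tab-separated layout holds.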
[1] McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002.