http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curosity, I just cleaned the .csv files to perform a sentiment analysis. So both the .csv files in this dataset are created by me.
Anything you read in the description is written by David Wallach and using all this information, I happen to perform my first ever sentiment analysis.
"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries.
I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'
The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot
I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)
Problem Statement
👉 Download the case studies here
A global consumer goods company struggled to understand customer sentiment across various social media platforms. With millions of posts, reviews, and comments generated daily, manually tracking and analyzing public opinion was inefficient. The company needed an automated solution to monitor brand perception, address negative feedback promptly, and leverage insights for marketing strategies.
Challenge
Analyzing social media sentiment posed the following challenges:
Processing vast amounts of unstructured text data from multiple platforms like Twitter, Facebook, and Instagram.
Accurately interpreting slang, emojis, and nuanced language used by social media users.
Identifying trends and actionable insights in real-time to respond to potential crises or opportunities effectively.
Solution Provided
An advanced sentiment analysis system was developed using Natural Language Processing (NLP) and sentiment analysis algorithms. The solution was designed to:
Classify social media posts into positive, negative, and neutral sentiments.
Extract key topics and trends related to the brand and its products.
Provide real-time dashboards for monitoring customer sentiment and identifying areas of improvement.
Development Steps
Data Collection
Aggregated data from major social media platforms using APIs, focusing on brand mentions, hashtags, and product keywords.
Preprocessing
Cleaned and normalized text data, including handling slang, emojis, and misspellings, to prepare it for analysis.
Model Training
Trained NLP models for sentiment classification using supervised learning. Implemented topic modeling algorithms to identify recurring themes and discussions.
Validation
Tested the sentiment analysis models on labeled datasets to ensure high accuracy and relevance in classifying social media posts.
Deployment
Integrated the sentiment analysis system with a real-time analytics dashboard, enabling the marketing and customer support teams to track trends and respond proactively.
Monitoring & Improvement
Established a continuous feedback mechanism to refine models based on evolving language patterns and new social media trends.
Results
Gained Actionable Insights
The system provided detailed insights into customer opinions, helping the company identify strengths and areas for improvement.
Improved Brand Reputation Management
Real-time monitoring enabled swift responses to negative feedback, mitigating potential reputation risks.
Informed Marketing Strategies
Insights from sentiment analysis guided targeted marketing campaigns, resulting in higher engagement and ROI.
Enhanced Customer Relationships
Proactive engagement with customers based on sentiment analysis improved customer satisfaction and loyalty.
Scalable Monitoring Solution
The system scaled efficiently to analyze data across multiple languages and platforms, broadening the company’s reach and understanding.
This dataset was created by Ujjwal Chowdhury
Product Review Datasets: Uncover user sentiment
Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.
Data sources:
Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:
Choose from multiple data delivery options to suit your needs:
Why choose Oxylabs?
Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.
Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.
Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.
The data has 67K+ reviews in Arabic for sentiment analysis Data collecting using web scraping for many companies Like ( talabat,kabiter,nasla,swifil,alsiwidiu,kilubatra,dumati,.........etc)
Coulnms
Reviews : review description rating : 1 postive , 0 neutral , -1 negative Company : continues company name for each review
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Karan Singh
Released under Apache 2.0
WiserBrand's Comprehensive Customer Call Transcription Dataset: Tailored Insights
WiserBrand offers a customizable dataset comprising transcribed customer call records, meticulously tailored to your specific requirements. This extensive dataset includes:
User ID and Firm Name: Identify and categorize calls by unique user IDs and company names. Call Duration: Analyze engagement levels through call lengths. Geographical Information: Detailed data on city, state, and country for regional analysis. Call Timing: Track peak interaction times with precise timestamps. Call Reason and Group: Categorised reasons for calls, helping to identify common customer issues. Device and OS Types: Information on the devices and operating systems used for technical support analysis. Transcriptions: Full-text transcriptions of each call, enabling sentiment analysis, keyword extraction, and detailed interaction reviews.
Our dataset is designed for businesses aiming to enhance customer service strategies, develop targeted marketing campaigns, and improve product support systems. Gain actionable insights into customer needs and behavior patterns with this comprehensive collection, particularly useful for Consumer Data, Consumer Behavior Data, Consumer Sentiment Data, Consumer Review Data, AI Training Data, Textual Data, and Transcription Data applications.
WiserBrand's dataset is essential for companies looking to leverage Consumer Data and B2B Marketing Data to drive their strategic initiatives in the English-speaking markets of the USA, UK, and Australia. By accessing this rich dataset, businesses can uncover trends and insights critical for improving customer engagement and satisfaction.
Cases:
Enriching STT Models: The dataset includes a wide variety of real-world customer service calls with diverse accents, tones, and terminologies. This makes it highly valuable for training speech-to-text models to better recognize different dialects, regional speech patterns, and industry-specific jargon. It could help improve accuracy in transcribing conversations in customer service, sales, or technical support.
Contextualized Speech Recognition: Given the contextual information (e.g., reasons for calls, call categories, etc.), it can help models differentiate between various types of conversations (technical support vs. sales queries), which would improve the model’s ability to transcribe in a more contextually relevant manner.
Improving TTS Systems: The transcriptions, along with their associated metadata (such as call duration, timing, and call reason), can aid in training Text-to-Speech models that mimic natural conversation patterns, including pauses, tone variation, and proper intonation. This is especially beneficial for developing conversational agents that sound more natural and human-like in their responses.
Noise and Speech Quality Handling: Real-world customer service calls often contain background noise, overlapping speech, and interruptions, which are crucial elements for training speech models to handle real-life scenarios more effectively.
Customer Interaction Simulation: The transcriptions provide a comprehensive view of real customer interactions, including common queries, complaints, and support requests. By training AI models on this data, businesses can equip their virtual agents with the ability to understand customer concerns, follow up on issues, and provide meaningful solutions, all while mimicking human-like conversational flow.
Sentiment Analysis and Emotional Intelligence: The full-text transcriptions, along with associated call metadata (e.g., reason for the call, call duration, and geographical data), allow for sentiment analysis, enabling AI agents to gauge the emotional tone of customers. This helps the agents respond appropriately, whether it’s providing reassurance during frustrating technical issues or offering solutions in a polite, empathetic manner. Such capabilities are essential for improving customer satisfaction in automated systems.
Customizable Dialogue Systems: The dataset allows for categorizing and identifying recurring call patterns and issues. This means AI agents can be trained to recognize the types of queries that come up frequently, allowing them to automate routine tasks such as ...
"This dataset includes consumer-submitted reviews from over 160 industries, covering both product- and service-based businesses. It’s built to support CX, AI, and analytics teams seeking structured insight into what real customers say, feel, and expect — across sectors like finance, healthcare, travel, telecom, retail, and more.
Each review includes:
The list may vary based on the industry and can be customized as per your request.
Use this dataset to:
This dataset offers flexibility for custom delivery-by industry, domain, or company, making it ideal for teams needing scalable consumer voice data tailored to specific strategic goals."
https://sentalyse.com/en/termshttps://sentalyse.com/en/terms
Downloadable company sentiment dataset over time for Apache Corporation, based on trusted financial news sources.
FABSA, An aspect-based sentiment analysis dataset in the Customer Feedback space (Trustpilot, Google Play and Apple Store reviews).
A professionally annotated dataset released by Chattermill AI, with 8 years of experience in leveraging advanced ML analytics in the customer feedback space for high-profile clients such as Amazon and Uber. Two annotators possess extensive experience in developing human-labeled ABSA datasets for commercial companies, while the third annotator holds a… See the full description on the dataset page: https://huggingface.co/datasets/jordiclive/FABSA.
https://brightdata.com/licensehttps://brightdata.com/license
Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.
Key Features:
Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
Use Cases:
Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
https://brightdata.com/licensehttps://brightdata.com/license
The Google Reviews dataset is perfect for obtaining comprehensive insights into businesses and their customer feedback globally. Easily filter by location, business type, or reviewer details to extract the precise data you need. The Google Reviews dataset includes key data points such as URL, place ID, place name, country, address, review ID, reviewer name, total reviews and photos by the reviewer, reviewer profile URL, and more. This dataset provides valuable information for sentiment analysis, business comparisons, and customer behavior studies.
WiserBrand offers a unique dataset of real consumer-to-business phone conversations. These high-quality audio recordings capture authentic interactions between consumers and support agents across industries. Unlike synthetic data or scripted samples, our dataset reflects natural speech patterns, emotion, intent, and real-world phrasing — making it ideal for:
Training ASR (Automatic Speech Recognition) systems
Improving voice assistants and LLM audio understanding
Enhancing call center AI tools (e.g., sentiment analysis, intent detection)
Benchmarking conversational AI performance with real-world noise and context
We ensure strict data privacy: all personally identifiable information (PII) is removed before delivery. Recordings are produced on demand and can be tailored by vertical (e.g., telecom, finance, e-commerce) or use case.
Whether you're building next-gen voice technology or need realistic conversational datasets to test models, this dataset provides what synthetic corpora lack — realism, variation, and authenticity.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was created by lilin0079
Released under CC BY-NC-SA 4.0
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data introduction • Twitter-tweets-sentiment dataset is a dataset that aims to analyze tweet sentiment for Twitter and natural language processing.
2) Data utilization (1)Twitter-tweets-sentiment data has characteristics that: • The data consists of three columns, including emotion and text, and aims to block negative tweets through a powerful classification model. (2) Twitter-tweets-sentiment data can be used to: • Social Media Monitoring: Businesses and organizations can use data to monitor social media platforms and gauge public sentiment about a brand, product, event, or social issue. • Sentiment analysis: This dataset can be used to train models that classify the sentiment of tweets, which can help companies and researchers understand public opinion on a variety of topics.
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data introduction • Emotion-analysis dataset is data for analyzing the emotions of text.
2) Data utilization (1) Emotion-analysis data has characteristics that: • Contains a variety of texts that convey emotions ranging from happiness to anger to sadness. The goal is to build an efficient model for detecting emotions in text. (2) Emotion-analysis data can be used to: • Sentiment classification models: This dataset can be used to train machine learning models that classify text based on sentiment, which helps companies and researchers understand public opinion and sentiment trends. • Market research: Researchers can analyze sentiment data to understand consumer preferences and market trends and support data-driven decision making.
https://qlsolutions.synology.me/en/termshttps://qlsolutions.synology.me/en/terms
Downloadable company sentiment dataset over time for Splunk Inc., based on trusted financial news sources.
https://qlsolutions.synology.me/en/termshttps://qlsolutions.synology.me/en/terms
Downloadable company sentiment dataset over time for The Walt Disney Company, based on trusted financial news sources.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by rahul93102
Released under MIT
Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.
What Makes Our Data Unique?
Scale and Coverage: - A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies. - Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.
Rich Attributes for Training Models: - Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights. - Tailored for training models in NLP, recommendation systems, and predictive algorithms.
Compliance and Quality: - Fully GDPR and CCPA compliant, providing secure and ethically sourced data. - Extensive data cleaning and validation processes ensure reliability and accuracy.
Annotation-Ready: - Pre-structured and formatted datasets that are easily ingestible into AI workflows. - Ideal for supervised learning with tagging options such as entities, sentiment, or categories.
How Is the Data Sourced? - Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques. - Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets. This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.
Primary Use Cases and Verticals
Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.
Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.
B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.
HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.
How This Product Fits Into Xverum’s Broader Data Offering Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.
Why Choose Xverum? - Experience and Expertise: A trusted name in structured web data with a proven track record. - Flexibility: Datasets can be tailored for any AI/ML application. - Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data. - Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.
Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.
Contact us for sample datasets or to discuss your specific needs.
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curosity, I just cleaned the .csv files to perform a sentiment analysis. So both the .csv files in this dataset are created by me.
Anything you read in the description is written by David Wallach and using all this information, I happen to perform my first ever sentiment analysis.
"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries.
I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'
The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot
I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)