This dataset was created by Rachit Khandelwal
This dataset was created by najib mrh
US Airline passenger satisfaction survey
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains reviews of the top 10 rated airlines in 2023 sourced from the Airline Quality (https://www.airlinequality.com) website. The reviews cover various aspects of the flight experience, including seat comfort, staff service, food and beverages, inflight entertainment, value for money, and overall rating. The dataset is suitable for sentiment analysis, customer satisfaction analysis, and other similar tasks.
Usage
- Download the dataset file airlines_reviews.csv.
- Use the dataset for analysis, visualization, and machine learning tasks.
List of Airlines
1. Singapore Airlines
2. Qatar Airways
3. All Nippon Airways
4. Emirates
5. Japan Airlines
6. Turkish Airlines
7. Air France
8. Cathay Pacific Airways
9. EVA Air
10. Korean Air
This dataset is provided under the MIT License.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by AmitVerma2030
Released under MIT
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Subodh Kumar
Released under MIT
This dataset was created by Ahmad Noor
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description
- Customer Demographics: Includes FullName, Gender, Age, CreditScore, and MonthlyIncome. These variables provide a demographic snapshot of the customer base, allowing for segmentation and targeted marketing analysis.
- Geographical Data: Comprising Country, State, and City, this section facilitates location-based analytics, market penetration studies, and regional sales performance.
- Product Information: Details like Category, Product, Cost, and Price enable product trend analysis, profitability assessment, and inventory optimization.
- Transactional Data: Captures the customer journey through SessionStart, CartAdditionTime, OrderConfirmation, OrderConfirmationTime, PaymentMethod, and SessionEnd. This rich temporal data can be used for funnel analysis, conversion rate optimization, and customer behavior modeling.
- Post-Purchase Details: With OrderReturn and ReturnReason, analysts can delve into return rate calculations, post-purchase satisfaction, and quality control.
Types of Analysis
- Descriptive Analytics: Understand basic metrics like average monthly income, most common product categories, and typical credit scores.
- Predictive Analytics: Use machine learning to predict credit risk or the likelihood of a purchase based on demographics and session activity.
- Customer Segmentation: Group customers by demographics or purchasing behavior to tailor marketing strategies.
- Geospatial Analysis: Examine sales distribution across different regions and optimize logistics.
- Time Series Analysis: Study the seasonality of purchases and session activities over time.
- Funnel Analysis: Evaluate the customer journey from session start to order confirmation and identify drop-off points.
- Cohort Analysis: Track customer cohorts over time to understand retention and repeat purchase patterns.
- Market Basket Analysis: Discover product affinities and develop cross-selling strategies.
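As an illustration of the funnel analysis idea above, here is a minimal sketch using only the Python standard library. The column names `SessionStart`, `CartAdditionTime`, and `OrderConfirmation` come from the dataset description; the sample rows, the timestamp format, and the encoding of `OrderConfirmation` as `"1"`/`"0"` are assumptions made for illustration and should be checked against the real file.

```python
# Hypothetical sample rows. Column names follow the dataset description;
# the values and the "1"/"0" encoding of OrderConfirmation are assumptions.
SAMPLE_SESSIONS = [
    {"SessionStart": "2023-01-05 10:00:00", "CartAdditionTime": "2023-01-05 10:04:00", "OrderConfirmation": "1"},
    {"SessionStart": "2023-01-05 11:00:00", "CartAdditionTime": "2023-01-05 11:10:00", "OrderConfirmation": "0"},
    {"SessionStart": "2023-01-06 09:30:00", "CartAdditionTime": "", "OrderConfirmation": "0"},
    {"SessionStart": "2023-01-06 12:00:00", "CartAdditionTime": "2023-01-06 12:02:00", "OrderConfirmation": "1"},
]


def funnel_counts(rows):
    """Count how many sessions reach each stage of the funnel."""
    started = [r for r in rows if r.get("SessionStart")]
    carted = [r for r in started if r.get("CartAdditionTime")]
    ordered = [r for r in carted if r.get("OrderConfirmation") == "1"]
    return {
        "session_start": len(started),
        "cart_addition": len(carted),
        "order_confirmation": len(ordered),
    }


def step_conversion(counts):
    """Conversion rate from each stage to the next (None if the earlier stage is empty)."""
    stages = list(counts)
    rates = {}
    for prev, cur in zip(stages, stages[1:]):
        rates[f"{prev}->{cur}"] = counts[cur] / counts[prev] if counts[prev] else None
    return rates


if __name__ == "__main__":
    counts = funnel_counts(SAMPLE_SESSIONS)
    print(counts)
    print(step_conversion(counts))
```

The stage with the lowest step conversion is the largest drop-off point. With real data, the rows could come from `csv.DictReader` over the dataset file.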
📊🔍 Good Luck and Happy Analysing 🔍📊
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The columns in the dataset include index, unit id, golden, unit state, trusted judgments, last judgment at, airline sentiment, airline sentiment confidence, negative reason, negative reason confidence, airline_sentiment_gold and retweet count. There is also text included for each tweet as well as tweet location and user timezone.
Using this dataset, you can get a sense of how customers of various airlines feel about their service. The data includes the airline, the tweet text, the date of the tweet, and various other information, so you can analyze trends over time or compare different airlines. Some research ideas:
- Using airline sentiment to predict the stock market - is there a correlation between how the public perceives an airline and how that airline's stock performs?
- Using negativereason data to help airlines improve their customer service - which negative reasons are mentioned most often? Are there certain airlines that are consistently mentioned for specific reasons?
- Use the tweet data to map out airline hot spots - where do people tend to tweet about certain airlines the most? Is there a geographic pattern to sentiment about specific airlines?
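The second research idea above (which negative reasons are mentioned most often, per airline) can be sketched with a simple tally. This is a minimal sketch under stated assumptions: the airline is read from the `name` column, as the column table for this file describes it, and the sample rows, airline labels, and reason strings are hypothetical. The column to group by is a parameter so it can be changed if the real file stores the airline elsewhere.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; airline labels and reason strings are invented.
SAMPLE_TWEETS = [
    {"name": "Airline A", "airline_sentiment": "negative", "negativereason": "Late Flight"},
    {"name": "Airline A", "airline_sentiment": "negative", "negativereason": "Late Flight"},
    {"name": "Airline A", "airline_sentiment": "negative", "negativereason": "Lost Luggage"},
    {"name": "Airline B", "airline_sentiment": "positive", "negativereason": ""},
    {"name": "Airline B", "airline_sentiment": "negative", "negativereason": "Customer Service Issue"},
]


def negative_reasons_by_airline(rows, airline_col="name", reason_col="negativereason"):
    """Tally negative reasons per airline, skipping rows without a reason."""
    counts = defaultdict(Counter)
    for row in rows:
        reason = (row.get(reason_col) or "").strip()
        if row.get("airline_sentiment") == "negative" and reason:
            counts[row[airline_col]][reason] += 1
    return counts


def top_reasons(counts, n=1):
    """Return the n most frequent reasons for each airline."""
    return {airline: tally.most_common(n) for airline, tally in counts.items()}


if __name__ == "__main__":
    print(top_reasons(negative_reasons_by_airline(SAMPLE_TWEETS)))
```

With the real file, the rows could be read with `csv.DictReader`; the tally itself does not depend on how the rows are loaded.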
If you use this dataset in your research, please credit Social Media Data.
License
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- You are free to:
  - Share: copy and redistribute the material in any medium or format for non-commercial purposes only.
  - Adapt: remix, transform, and build upon the material for non-commercial purposes only.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: distribute your contributions under the same license as the original.
- You may not:
  - Use the material for commercial purposes.
File: Airline-Sentiment-2-w-AA.csv

| Column name | Description |
|:---------------------------|:-----------------------------------------------------------------------------|
| _golden | This column is the gold standard column. (Boolean) |
| _unit_state | This column is the state of the unit. (String) |
| _trusted_judgments | This column is the number of trusted judgments. (Numeric) |
| _last_judgment_at | This column is the timestamp of the last judgment. (String) |
| airline_sentiment | This column is the sentiment of the tweet. (String) |
| negativereason | This column is the negative reason for the sentiment. (String) |
| airline_sentiment_gold | This column is the gold standard sentiment of the tweet. (String) |
| name | This column is the name of the airline. (String) |
| negativereason_gold | This column is the gold standard negative reason for the sentiment. (String) |
| retweet_count | This column is the number of retweets. (Numeric) |
| text | This column is the text of the tweet. (String) |
| tweet_coord | This column is the coordinates of the tweet. (String) |
| tweet_created | This column is the timestamp of the tweet. (String) |
| tweet_location | This column is the location of the tweet. (String) |
| user_timezone | This column is the timezone of the user. (String) |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘846 Companies Ranked! (2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/axeltorbenson/846-companies-ranked on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This file contains data on 846 companies and their rankings based on 6 characteristics: customer satisfaction, employee engagement and development, innovation, social responsibility, financial strength, and effectiveness. These rankings were made by the Drucker Institute. They are not, and cannot be, objectively accurate; they reflect the opinion of the Drucker Institute and depend wholly on its ranking criteria.
Source: https://www.drucker.institute/2021-drucker-institute-company-ranking/
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘TourPackagePrediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sanamps/tourpackageprediction on 28 January 2022.
--- Dataset description provided by original source is as follows ---
You are a Data Scientist for a tourism company named "Lets Travel". The Policy Maker of the company wants to enable and establish a viable business model to expand the customer base. A viable business model is a central concept that helps you to understand the existing ways of doing business and how to change them for the benefit of the tourism sector. One of the ways to expand the customer base is to introduce a new offering of packages. Currently, the company offers 5 types of packages: Basic, Standard, Deluxe, Super Deluxe, and King. Looking at the data of the last year, we observed that 18% of the customers purchased the packages. However, the marketing cost was quite high because customers were contacted at random without looking at the available information. The company is now planning to launch a new product, the Wellness Tourism Package. Wellness Tourism is defined as travel that allows the traveler to maintain, enhance, or kick-start a healthy lifestyle, and support or increase one's sense of well-being. This time, however, the company wants to harness the available data of existing and potential customers to make the marketing expenditure more efficient. You as a Data Scientist at "Visit with us" travel company have to analyze the customers' data and information to provide recommendations to the Policy Maker and Marketing Team, and also build a model to predict which potential customers are going to purchase the newly introduced travel package.
To predict which customer is more likely to purchase the newly introduced travel package
Customer details:
1. CustomerID: Unique customer ID
2. ProdTaken: Whether the customer has purchased a package or not (0: No, 1: Yes)
3. Age: Age of customer
4. TypeofContact: How customer was contacted (Company Invited or Self Inquiry)
5. CityTier: City tier depends on the development of a city, population, facilities, and living standards. The categories are ordered, i.e. Tier 1 > Tier 2 > Tier 3
6. Occupation: Occupation of customer
7. Gender: Gender of customer
8. NumberOfPersonVisiting: Total number of persons planning to take the trip with the customer
9. PreferredPropertyStar: Preferred hotel property rating by customer
10. MaritalStatus: Marital status of customer
11. NumberOfTrips: Average number of trips in a year by customer
12. Passport: Whether the customer has a passport or not (0: No, 1: Yes)
13. OwnCar: Whether the customer owns a car or not (0: No, 1: Yes)
14. NumberOfChildrenVisiting: Total number of children under age 5 planning to take the trip with the customer
15. Designation: Designation of the customer in the current organization
16. MonthlyIncome: Gross monthly income of the customer
Customer interaction data:
1. PitchSatisfactionScore: Sales pitch satisfaction score
2. ProductPitched: Product pitched by the salesperson
3. NumberOfFollowups: Total number of follow-ups done by the salesperson after the sales pitch
4. DurationOfPitch: Duration of the pitch by a salesperson to the customer
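Before building a predictive model, a quick way to see which customer attributes relate to package purchases is to compare purchase rates across segments. The following is a minimal sketch, not the source's method: `ProdTaken` and `Passport` are column names from the list above with the documented 0/1 encoding, while the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows using the documented 0/1 encodings.
SAMPLE_CUSTOMERS = [
    {"CustomerID": "1", "Passport": "1", "ProdTaken": "1"},
    {"CustomerID": "2", "Passport": "1", "ProdTaken": "0"},
    {"CustomerID": "3", "Passport": "0", "ProdTaken": "0"},
    {"CustomerID": "4", "Passport": "0", "ProdTaken": "0"},
    {"CustomerID": "5", "Passport": "0", "ProdTaken": "0"},
    {"CustomerID": "6", "Passport": "0", "ProdTaken": "1"},
]


def purchase_rate_by(rows, segment_col):
    """Share of customers with ProdTaken == 1 within each value of segment_col."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for row in rows:
        key = row[segment_col]
        totals[key] += 1
        buyers[key] += int(row["ProdTaken"])
    return {key: buyers[key] / totals[key] for key in totals}


if __name__ == "__main__":
    print(purchase_rate_by(SAMPLE_CUSTOMERS, "Passport"))
```

Segments whose rate is well above the overall purchase rate (18% last year, according to the description) are natural candidates for targeted contact rather than random outreach.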
--- Original source retains full ownership of the source dataset ---
This dataset was created by Luke Marcus
https://crawlfeeds.com/privacy_policy
The USA Hotels Dataset from Booking.com is a rich collection of data related to hotels across the United States, extracted from Booking.com. This dataset includes essential information about hotel listings, such as hotel names, locations, prices, star ratings, customer reviews, and amenities offered. It's an ideal resource for researchers, data analysts, and businesses looking to explore the hospitality industry, analyze customer preferences, and understand pricing patterns in the U.S. hotel market.
Key Features:
https://www.futurebeeai.com/policies/ai-data-license-agreement
This UK English Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native UK English speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Rich metadata is available for each participant and conversation:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed records of customer interactions handled by a customer service team through various communication channels such as inbound calls, outbound calls, and digital touchpoints. It includes over 85,000 entries with information related to the nature of the issue, product categories, agent details, and customer satisfaction scores (CSAT).
Key features include:
Issue Metadata: Timestamps for when the issue was reported and responded to.
Categorization: High-level and sub-level issue categories for better analysis.
Agent Information: Names, supervisors, managers, shift, and tenure bucket.
Customer Feedback: CSAT scores and free-text customer remarks.
Transactional Data: Order IDs, product categories, item prices, and customer city.
This dataset is ideal for exploratory data analysis (EDA), natural language processing (NLP), time-to-resolution analysis, customer satisfaction prediction, and performance benchmarking of service agents.
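The time-to-resolution analysis mentioned above can be sketched by subtracting the reported timestamp from the response timestamp and grouping by CSAT score. This is only a sketch: the description does not give the actual column names or timestamp format, so `issue_reported_at`, `issue_responded_at`, `csat_score`, and `TIME_FORMAT` are all hypothetical and need to be adapted to the real file.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed timestamp layout; the real file may use a different one.
TIME_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical sample rows with hypothetical column names.
SAMPLE_TICKETS = [
    {"issue_reported_at": "2023-08-01 10:00", "issue_responded_at": "2023-08-01 10:30", "csat_score": "5"},
    {"issue_reported_at": "2023-08-01 11:00", "issue_responded_at": "2023-08-01 11:10", "csat_score": "5"},
    {"issue_reported_at": "2023-08-02 09:00", "issue_responded_at": "2023-08-02 12:00", "csat_score": "1"},
]


def response_minutes(row):
    """Minutes between the issue being reported and being responded to."""
    reported = datetime.strptime(row["issue_reported_at"], TIME_FORMAT)
    responded = datetime.strptime(row["issue_responded_at"], TIME_FORMAT)
    return (responded - reported).total_seconds() / 60


def mean_response_by_csat(rows):
    """Average response time in minutes for each CSAT score."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[int(row["csat_score"])].append(response_minutes(row))
    return {score: mean(times) for score, times in sorted(buckets.items())}


if __name__ == "__main__":
    print(mean_response_by_csat(SAMPLE_TICKETS))
```

A clear trend in these averages would suggest response time is worth including as a feature when predicting customer satisfaction.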
https://creativecommons.org/publicdomain/zero/1.0/
In an era where e-commerce is booming, the ability to understand and optimize customer experience is paramount for businesses aiming to thrive. An international e-commerce company, specializing in electronic products, has embarked on an ambitious project to delve deep into their customer database to uncover vital insights that could revolutionize their operations. Leveraging advanced machine learning techniques, the company aims to dissect the complex dynamics of customer interactions and product shipments to enhance satisfaction and efficiency.
The foundation of this analytical venture is a robust dataset comprising 10,999 observations across 12 meticulously curated variables. These variables provide a comprehensive overview of the customer journey, from the initial purchase to the final delivery. Key data points include:
- ID: A unique identifier for each customer, ensuring precise tracking and personalized insights.
- Warehouse Block: With the company's expansive warehouse segmented into blocks A through E, this variable helps in logistics optimization and inventory management.
- Mode of Shipment: Understanding the impact of different shipment methods (Ship, Flight, Road) on customer satisfaction and delivery efficiency.
- Customer Care Calls: The frequency of customer inquiries serves as an indicator of service quality and customer engagement.
- Customer Rating: A direct measure of customer satisfaction, with ratings ranging from 1 (lowest) to 5 (highest).
- Cost of the Product: This financial metric is crucial for pricing strategies and profitability analysis.
- Prior Purchases: Tracking customers' purchase history aids in predicting future buying behavior and personalizing marketing efforts.
- Product Importance: Categorizing products based on their importance (low, medium, high) enables tailored handling and prioritization.
- Gender: Analyzing shopping patterns and preferences across genders.
- Discount Offered: Examining the impact of discounts on sales volume and customer acquisition.
- Weight in Grams: The logistical aspect of shipping, influencing costs and delivery methods.
- Reached on Time: The critical outcome variable indicating whether a product was delivered within the expected timeframe, serving as a benchmark for operational efficiency.
The company acknowledges the contribution of the broader data science community by making this dataset publicly available on GitHub, fostering collaborative research and innovation in customer analytics. This initiative is not just about understanding past performances but is aimed at inspiring data-driven strategies that can address pressing questions such as the correlation between customer ratings and on-time deliveries, the effectiveness of customer support, and the influence of product importance on customer satisfaction and delivery success.
This exploratory journey through data is poised to offer actionable insights that could lead to enhanced product shipment tracking, improved customer satisfaction, and ultimately, a competitive edge in the fast-paced world of e-commerce.
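One of the questions raised above, the correlation between customer ratings and on-time deliveries, can be checked with a Pearson correlation coefficient. The sketch below implements the standard formula by hand so it runs on any recent Python version. The sample values are hypothetical, and coding the on-time flag as 1 = on time, 0 = late is an assumption, since the description does not state the encoding; verify it against the real data before interpreting the sign.

```python
from math import sqrt

# Hypothetical sample: ratings on the documented 1-5 scale, and an on-time
# flag ASSUMED to be 1 = delivered on time, 0 = late.
SAMPLE_RATINGS = [1, 2, 3, 4, 5]
SAMPLE_ON_TIME = [0, 0, 1, 1, 1]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(round(pearson(SAMPLE_RATINGS, SAMPLE_ON_TIME), 3))
```

With a binary variable on one side, this is the point-biserial correlation; a value near zero would suggest ratings and delivery timeliness are largely unrelated in the data.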
The inspiration behind creating the OYO Review Dataset for sentiment analysis was to explore the sentiment and opinions expressed in hotel reviews on the OYO Hotels platform. Analyzing the sentiment of customer reviews can provide valuable insights into the overall satisfaction of guests, identify areas for improvement, and assist in making data-driven decisions to enhance the hotel experience. By collecting and curating this dataset, Deep Patel, Nikki Patel, and Nimil aimed to contribute to the field of sentiment analysis in the context of the hospitality industry. Sentiment analysis allows us to classify the sentiment expressed in textual data, such as reviews, into positive, negative, or neutral categories. This analysis can help hotel management and stakeholders understand customer sentiments, identify common patterns, and address concerns or issues that may affect the reputation and customer satisfaction of OYO Hotels. The dataset provides a valuable resource for training and evaluating sentiment analysis models specifically tailored to the hospitality domain. Researchers, data scientists, and practitioners can utilize this dataset to develop and test various machine learning and natural language processing techniques for sentiment analysis, such as classification algorithms, sentiment lexicons, or deep learning models. Overall, the goal of creating the OYO Review Dataset for sentiment analysis was to facilitate research and analysis in the area of customer sentiments and opinions in the hotel industry. By understanding the sentiment of hotel reviews, businesses can strive to improve their services, enhance customer satisfaction, and make data-driven decisions to elevate the overall guest experience.
- Deep Patel: https://www.linkedin.com/in/deep-patel-55ab48199/
- Nikki Patel: https://www.linkedin.com/in/nikipatel9/
- Nimil Lathiya: https://www.linkedin.com/in/nimil-lathiya-059a281b1/
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data will be used to predict whether a customer of the bank will churn. If a customer churns, it means they left the bank and took their business elsewhere. If you can predict which customers are likely to churn, you can take measures to retain them before they do. These measures could be promotions, discounts, or other incentives to boost customer satisfaction and, therefore, retention.
The dataset contains:
10,000 rows – each row is a unique customer of the bank
14 columns:
RowNumber: Row numbers from 1 to 10,000
CustomerId: Customer’s unique ID assigned by bank
Surname: Customer’s last name
CreditScore: Customer’s credit score. This number can range from 300 to 850.
Geography: Customer’s country of residence
Gender: Categorical indicator
Age: Customer’s age (years)
Tenure: Number of years customer has been with bank
Balance: Customer’s bank balance (Euros)
NumOfProducts: Number of products the customer has with the bank
HasCrCard: Indicates whether the customer has a credit card with the bank
IsActiveMember: Indicates whether the customer is considered active
EstimatedSalary: Customer’s estimated annual salary (Euros)
Exited: Indicates whether the customer churned (left the bank)
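As a first look at this data before any modelling, one might check the documented credit score range and compare churn rates across countries. This is a minimal sketch: the column names (`CustomerId`, `CreditScore`, `Geography`, `Exited`) come from the list above, but the sample rows and country labels are hypothetical, and the encoding of `Exited` as 1 = churned, 0 = stayed is an assumption.

```python
from collections import defaultdict

# Range stated in the CreditScore column description.
CREDIT_SCORE_RANGE = (300, 850)

# Hypothetical sample rows; Exited is ASSUMED to be 1 = churned, 0 = stayed.
SAMPLE_CUSTOMERS = [
    {"CustomerId": "1001", "CreditScore": "620", "Geography": "Country A", "Exited": "1"},
    {"CustomerId": "1002", "CreditScore": "710", "Geography": "Country A", "Exited": "0"},
    {"CustomerId": "1003", "CreditScore": "905", "Geography": "Country B", "Exited": "0"},
    {"CustomerId": "1004", "CreditScore": "480", "Geography": "Country B", "Exited": "0"},
]


def out_of_range_scores(rows):
    """IDs of customers whose credit score falls outside the documented range."""
    low, high = CREDIT_SCORE_RANGE
    return [r["CustomerId"] for r in rows if not low <= int(r["CreditScore"]) <= high]


def churn_rate_by_geography(rows):
    """Share of customers with Exited == 1 in each country."""
    totals = defaultdict(int)
    exited = defaultdict(int)
    for row in rows:
        totals[row["Geography"]] += 1
        exited[row["Geography"]] += int(row["Exited"])
    return {country: exited[country] / totals[country] for country in totals}


if __name__ == "__main__":
    print(out_of_range_scores(SAMPLE_CUSTOMERS))
    print(churn_rate_by_geography(SAMPLE_CUSTOMERS))
```

Rows flagged by the range check are worth inspecting before training, since out-of-range values usually indicate data entry or export problems.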
http://opendatacommons.org/licenses/dbcl/1.0/
The dataset titled "Online Delivery Data" comprises 388 entries, each representing an individual's response to a survey concerning their preferences and experiences with online food delivery services in Australia. The dataset is structured into 53 columns, encompassing a wide range of information from demographic details to specific preferences and feedback on online food delivery services. Below is an in-depth description of its structure and the types of information it contains.
Dataset Overview
- Entries: 388
- Attributes: 53
Core Attributes Description
Demographic and Background Information
- Age: The respondent's age.
- Gender: The gender of the respondent.
- Marital Status: Marital status of the respondent (e.g., Single, Married).
- Occupation: The respondent's occupation.
- Monthly Income: Monthly income category of the respondent.
- Educational Qualifications: Educational level achieved by the respondent.
- City: The city in Australia where the respondent resides.
- Family size: Number of members in the respondent's family.
Service Utilization Preferences
- Medium of ordering (P1 and P2): Primary and secondary preferences for ordering mediums, such as food delivery apps or direct calls.
- Meal preference (P1 and P2): Primary and secondary meal preferences.
- Preference reasons (P1 and P2): Primary and secondary reasons for their preferences.
Perceptions and Attitudes
- Various columns capture the respondent's attitudes towards ease and convenience, time-saving aspects, variety of choices, payment options, discounts and offers, food quality, tracking system, and several other factors related to online food delivery.
Health and Hygiene Concerns
- Specific concerns regarding health, delivery punctuality, hygiene, and past negative experiences with online food delivery services.
Service Quality and Feedback
- Attributes covering delivery time importance, packaging quality, customer service aspects (such as the number of calls to service and politeness), food freshness, temperature, taste, and quantity.
- Output: Likely a binary response (e.g., Yes or No) to a specific survey question, which could pertain to the respondent's overall satisfaction or willingness to recommend the service.
- Reviews: Open-ended feedback from respondents, providing qualitative insights into their experiences.
Summary
This dataset provides a comprehensive view of consumer preferences, behaviors, and satisfaction levels regarding online food delivery services in Australia. It encompasses a broad spectrum of variables from basic demographic information to detailed opinions on service quality, making it an invaluable resource for analyzing consumer trends, identifying areas for improvement in service delivery, and understanding the factors that influence customer satisfaction and loyalty in the online food delivery industry.
http://opendatacommons.org/licenses/dbcl/1.0/
The dataset comprises a comprehensive collection of passenger reviews, accompanied by corresponding review categories, offering a holistic view of airline travel experiences. These reviews provide valuable insights into customer satisfaction, service quality, and sentiment analysis, enabling in-depth analysis and informed decision-making within the airline industry. With the combination of review content and categories, this dataset serves as a valuable resource for understanding and enhancing the passenger journey.
This dataset was created by Rachit Khandelwal