Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a synthetic dataset generated to mimic real-world e-commerce return management scenarios. Since actual return data is often confidential and unavailable, this dataset was created with realistic assumptions around orders, products, customers, and return behaviors.
It can be used for:
Predictive modeling of return likelihood (classification problems).
Business analytics on profitability loss due to returns.
Sustainability analysis (CO₂ emissions and waste impact from reverse logistics).
📌 Dataset Features (Columns)
Order_ID → Unique order identifier.
Product_ID → Unique product identifier.
User_ID → Unique customer identifier.
Order_Date → Date when the order was placed.
Return_Date → Date when the product was returned (if returned).
Product_Category → Category of the product (e.g., Clothing, Electronics, Books, Toys, etc.).
Product_Price → Price of the product per unit.
Order_Quantity → Number of units purchased in the order.
Discount_Applied → Discount percentage applied on the product.
Return_Status → Whether the order was Returned or Not Returned.
Return_Reason → Reason for return (e.g., Damaged, Wrong Item, Changed Mind).
Days_to_Return → Number of days taken by customer to return (0 if not returned).
User_Age → Age of the customer.
User_Gender → Gender of the customer (Male/Female).
User_Location → City/region of the customer.
Payment_Method → Mode of payment (Credit Card, Debit Card, PayPal, Gift Card, etc.).
Shipping_Method → Chosen shipping type (Standard, Express, Next-Day).
Return_Cost → Estimated logistics cost incurred when a return happens.
Profit_Loss → Net profit or loss for the order, considering product price, discount, and return cost.
CO2_Saved → Estimated CO₂ emissions saved (if return avoided).
Waste_Avoided → Estimated physical waste avoided (in units/items).
💡 Use Cases
MBA & academic projects in Business Analytics and Supply Chain Management.
Training predictive models for return forecasting.
Measuring sustainability KPIs (CO₂ reduction, waste avoidance).
Dashboards in Power BI/Tableau for business decision-making.
Quick Start Example:
import pandas as pd
df = pd.read_csv("/kaggle/input/synthetic-ecommerce-returns/returns_sustainability_dataset.csv")
print(df.head())
print(df.info())
print(df['Return_Status'].value_counts(normalize=True))
# Return_Status holds 'Returned'/'Not Returned', so map it to 1/0 before averaging to get a return rate per category
category_returns = df['Return_Status'].eq('Returned').groupby(df['Product_Category']).mean().sort_values(ascending=False)
print(category_returns)
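For the return-likelihood use case, a baseline classifier can be built directly from the columns above. This is a minimal sketch, assuming the same Kaggle file path as the quick start; the feature selection and model choice are illustrative, not a recommended pipeline.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

df = pd.read_csv("/kaggle/input/synthetic-ecommerce-returns/returns_sustainability_dataset.csv")

# Binary target: 1 if the order was returned, 0 otherwise
y = df['Return_Status'].eq('Returned').astype(int)

# A few illustrative features; categorical columns are one-hot encoded
features = ['Product_Category', 'Product_Price', 'Order_Quantity',
            'Discount_Applied', 'Payment_Method', 'Shipping_Method', 'User_Age']
X = pd.get_dummies(df[features], drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))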
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 1,100 synthetic job descriptions (JDs) spanning 55 diverse roles, designed to facilitate career guidance, resume building, ATS (Applicant Tracking System) simulation, and research in NLP/ML.
All job descriptions are synthetically generated based on curated references from publicly available job postings, career guides, and professional role descriptions. They are not real job postings but represent realistic expectations, responsibilities, and skills for each role.
Tech Roles (Core, Popular, and Niche)
Non-Tech Roles (Business, Creative, Operations, and Niche)
| Field | Description |
|---|---|
| JobID | Unique identifier for each job description |
| Title | Job role/title |
| ExperienceLevel | Fresher / Junior / Experienced / Lead / Senior |
| YearsOfExperience | Numeric range or years (e.g., 0-1, 3-5) |
| Skills | List of required skills (JSON array or semicolon-separated in CSV) |
| Responsibilities | Key responsibilities (JSON array or semicolon-separated in CSV) |
| Keywords | Role-specific focus areas (JSON array or semicolon-separated in CSV) |
job_dataset.json – structured array of job objects.
job_dataset.csv – arrays flattened with semicolons for easy viewing in Excel or Pandas.
Synthetic Job Descriptions Dataset (2025) – Curated & Generated by Aditya Raj Srivastava (https://www.kaggle.com/adityarajsrv)
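Since Skills, Responsibilities, and Keywords are JSON arrays in job_dataset.json but semicolon-separated strings in job_dataset.csv, a small loading step restores the list structure. A minimal sketch, assuming the field names from the table above and that the files sit in the working directory:

import json
import pandas as pd

# CSV variant: split the semicolon-separated fields back into lists
df = pd.read_csv("job_dataset.csv")
for col in ["Skills", "Responsibilities", "Keywords"]:
    df[col] = df[col].fillna("").apply(lambda s: [item.strip() for item in s.split(";") if item.strip()])
print(df[["JobID", "Title", "ExperienceLevel", "Skills"]].head())

# JSON variant: already a structured array of job objects
with open("job_dataset.json", encoding="utf-8") as f:
    jobs = json.load(f)
print(jobs[0]["Title"], jobs[0]["Skills"][:3])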
By Crawl Feeds [source]
GameStop Product Reviews Dataset
Comprehensive and Detailed Customer Reviews and Ratings of Products from GameStop
Data Overview:
This dataset comprises a rich variety of information centered on customer reviews and ratings for products purchased from GameStop. For each review, the data includes detailed aspects such as the product name, brand, SKU (Stock Keeping Unit), helpful and non-helpful votes count, reviewer's name along with their review title & description. Further insights can be found through additional features that outline whether or not the reviewer recommends the product, whether they are a verified purchaser and encompass individual & average ratings for each product.
Other significant facets encapsulated within this valuable resource involve multimedia elements like images posted in reviews. To verify temporal relevance, timestamps revealing when the review was written (reviewed_at) as well as when the data was collected (scraped_at) are provided.
Additionally, URLs for the specific items listed on GameStop's site (url) and for other users' review pages (reviews_link) are included. The total number of customer feedback posts per item is also available in the reviews_count field.
Structure:
The dataset presents the aforementioned fields in serialized form: strings such as 'name', 'brand', and 'review_title'; datetimes including 'reviewed_at' and 'scraped_at'; floating-point numbers such as 'rating' and 'average_rating'; integers representing counts ('helpful_count', 'not_helpful_count'); and boolean flags for the reviewer's recommendation and verified-purchase status ('recommended_review', 'verifed_purchaser'). Several columns may contain null entries.
Use Case:
This dataset can serve multiple functions depending largely on user requirements. There are intriguing prospects around tracking consumer sentiment across time periods, which could lend fascinating insights into sales patterns. Another possibility is determining the best-selling items or brands on GameStop according to customer impressions and review counts. Additionally, there is potential to link buying trends with whether the purchase was verified or not.
This dataset could also be used by product managers to enhance existing products or create improved versions of them, taking into account customer suggestions from review content. Finally, marketing teams could use this dataset to strategize campaigns by identifying products with positive reviews and scaling promotions for those.
Of course, the versatility of this resource opens up vast domains, ranging from sentiment analysis and recommendation systems using machine learning methodologies to data visualization projects that present consumer trends in a more approachable way.
Sentiment Analysis: Use the 'review_description' field to understand customer sentiment towards specific products. NLP techniques can be deployed to derive sentiments from reviews text, which could help in understanding overall consumer opinion.
Brand Analysis: Use the 'brand' field for comparative analysis of various brands sold on GameStop's platform.
Product Recommendation System: Develop a product recommendation system based on the user's past purchase record represented by 'brand', 'sku', and past reviews.
Customer Segmentation: Analyse fields like 'rating', 'recommended_review', and 'verifed_purchaser' for advanced segmentation of customers.
Product Performance Analysis: By examining fields like average rating (average_rating), number of reviews (reviews_count), and recommendation status (recommended_review), one can gauge how well a product is performing and how it is received by customers.
Review Popularity Analysis: The dataset features two interesting variables - helpful_count and not_helpful_count; these reflect how other users perceived a review’s usefulness in helping them make purchasing decisions.
Time Series Forecasting: The dataset has temporal elements ('reviewed_at') that you could use for forecasting trends over time.
Reviewer Trustworthiness Assessment: The verified purchaser field can be used as an indicator of the trustworthiness of the review or of reviewer bias.
P...
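As a starting point for the brand analysis and review popularity use cases above, the following sketch computes a per-brand average rating and a helpfulness ratio. It assumes the field names described in the structure section and a local CSV export; the file name is a placeholder.

import pandas as pd

df = pd.read_csv("gamestop_reviews.csv", parse_dates=["reviewed_at", "scraped_at"])  # placeholder file name

# Brand analysis: average review rating per brand
print(df.groupby("brand")["rating"].mean().sort_values(ascending=False).head(10))

# Review popularity: share of helpful votes among all votes a review received
votes = df["helpful_count"] + df["not_helpful_count"]
df["helpful_ratio"] = (df["helpful_count"] / votes).where(votes > 0)
print(df[["review_title", "helpful_ratio"]].sort_values("helpful_ratio", ascending=False).head())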
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Egg Image Dataset is constructed by collecting images of eggs captured in real-world environments, classified based on whether the eggs are damaged or not damaged.
2) Data Utilization (1) Characteristics of the Egg Image Dataset: • It includes images collected from various real-world settings such as kitchens, farms, and markets, making it highly effective for model training and improving data generalization. • The dataset provides a clear distinction between damaged and undamaged eggs, making it suitable for solving problems related to object recognition and quality inspection.
(2) Applications of the Egg Image Dataset: • Development of Object Recognition and Quality Classification Models: It can be used to train AI models to automatically detect and classify eggs based on their damage status. • Utilization in Research and Development (R&D): The dataset can be applied to various R&D projects, including product quality management and the development of automated inspection systems.
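For the quality classification use case, a transfer-learning baseline is a natural first experiment. The sketch below is a minimal example, assuming the images are arranged in 'damaged' and 'not_damaged' subfolders under a root directory; the folder layout, path, and model choice are assumptions rather than part of the dataset description.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Assumed layout: egg_images/damaged/*.jpg and egg_images/not_damaged/*.jpg
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder("egg_images", transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Fine-tune only the final layer of a pretrained ResNet-18 for the two classes
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # single pass as a smoke test; add epochs and validation for real training
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()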
Success.ai’s Email Address Data for IT Companies in Europe provides a comprehensive dataset tailored for businesses targeting the European IT industry. With access to verified work emails, firmographic insights, and detailed employee data, this dataset is ideal for sales teams, marketers, and recruiters seeking to connect with decision-makers across Europe’s IT landscape.
Sourced from over 170 million verified professional profiles and 30 million company profiles, Success.ai ensures your outreach and strategic initiatives are driven by reliable, continuously updated, and AI-validated data, all offered at an unbeatable price.
Why Choose Success.ai’s Email Address Data for IT Companies?
Verified Work Emails for Precision Outreach
Regional Coverage of Europe’s IT Sector
Continuously Updated Datasets
Ethical and Compliant
Data Highlights:
Key Features of the Dataset:
Decision-Maker Profiles in IT
Advanced Filters for Tailored Campaigns
AI-Driven Enrichment
Strategic Use Cases:
Sales and Lead Generation
Marketing and Demand Generation
Recruitment and Talent Acquisition
Market Research and Technology Trends
Why Choose Success.ai?
Best Price Guarantee
Seamless Integration
Data Accuracy with AI Validation
Customizable and Scalable Solutions
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Kaggle released the results of its third annual Machine Learning and Data Science Survey, and I thought it would be nice to compare Kagglers' skills to job listings around the world. This was the motivation to build this dataset.
To compile this dataset I have scraped Glassdoor listings looking for the following search terms, for every country:
* data-scientist
* software-engineer
* data-analyst
* research-scientist
* business-analyst
* product-manager
* project-manager
* data-engineer
* statistician
* dba
* database-engineer
* machine-learning-engineer
The web scraping was done on the 10th of December 2019, and I was able to retrieve more than 165k job listings.
The script that was used to get public data from Glassdoor is available at this GitHub Repository, where you'll also find more information about the data treatment done prior to uploading it to Kaggle.
I don't have any connection with Glassdoor, and this project is neither approved nor endorsed by them. The data collected and made available here was publicly accessible (without even logging in to the website) at the moment it was collected. This dataset was created for educational purposes.
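For a first look at the listings, one could count postings per search term and country. The column names below ('search_term', 'country') and the file name are assumptions, since the schema isn't documented here; adapt them to the actual CSV.

import pandas as pd

df = pd.read_csv("glassdoor_listings.csv")  # placeholder file name

# Assumed columns: 'search_term' (one of the terms listed above) and 'country'
print(df["search_term"].value_counts())
print(df["country"].value_counts().head(10))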
Success.ai’s B2B Leads Data for Architecture, Planning, and Design Experts in Europe provides verified access to professionals shaping the built environment. Leveraging over 700 million LinkedIn profiles, this dataset delivers actionable insights, verified contact details, and firmographic data for architects, urban planners, interior designers, and more. Whether your objective is to market products, recruit talent, or explore industry trends, Success.ai ensures your data is accurate, enriched, and continuously updated.
Why Choose Success.ai’s B2B Leads Data for Architecture, Planning & Design Experts?
Comprehensive Professional Profiles
Access verified profiles of architects, urban planners, landscape designers, and project managers in Europe. AI-driven validation ensures 99% accuracy, optimizing outreach efforts and minimizing bounce rates.
Focused Coverage Across Europe
Includes professionals from major architectural firms, design studios, and urban planning organizations. Covers key markets like the UK, Germany, France, Italy, and Scandinavia.
Continuously Updated Dataset
Real-time updates ensure your data remains relevant, reflecting changes in roles, organizations, and professional achievements.
Tailored for Architectural Insights
Enriched profiles include professional histories, areas of specialization, certifications, and firmographic details for a deeper understanding of your audience.
Data Highlights:
700M+ Verified LinkedIn Profiles: Gain access to a global network of architecture and design professionals.
170M+ Enriched Profiles: Includes work emails, phone numbers, and decision-maker insights for targeted communication.
Industry-Specific Segmentation: Target professionals in architecture, urban planning, interior design, and landscape architecture with precision filters.
Region-Specific Data: Focus on European design hubs, including London, Paris, Berlin, and Copenhagen.
Key Features of the Dataset:
Architecture and Design Professional Profiles
Identify and connect with architects, project managers, urban planners, and design experts leading major projects. Engage with professionals driving trends in sustainable building, smart cities, and innovative design.
Detailed Firmographic Data
Leverage insights into company sizes, project scales, geographic reach, and specialization areas. Customize your approach to align with the needs of architectural firms, urban planning agencies, or independent designers.
Advanced Filters for Precision Targeting
Refine searches by region, design specialty (residential, commercial, landscape), or years of experience. Tailor campaigns to address industry challenges such as sustainability, urbanization, or heritage conservation.
AI-Driven Enrichment
Enhanced datasets provide actionable details for personalized campaigns, highlighting certifications, awards, and key projects.
Strategic Use Cases:
Marketing Products and Services
Promote building materials, design software, or urban planning tools to architects, designers, and planners. Engage with professionals managing construction, sustainability initiatives, or smart city developments.
Collaboration and Partnerships
Identify architects, urban planners, and design studios for collaborative projects, competitions, or design innovations. Build partnerships with firms focused on sustainability, green architecture, and cutting-edge urban design.
Recruitment and Talent Acquisition
Target HR professionals and architectural firms seeking designers, project managers, and urban planning specialists. Simplify hiring for roles requiring creative and technical expertise.
Market Research and Trend Analysis
Analyze shifts in urban development, design trends, and sustainable construction practices across Europe. Use insights to refine product development and marketing strategies tailored to the architectural sector.
Why Choose Success.ai?
Best Price Guarantee
Access industry-leading B2B Leads Data at unmatched pricing, ensuring cost-effective campaigns and strategies.
Seamless Integration
Easily integrate verified architectural data into CRMs, recruitment platforms, or marketing systems using APIs or downloadable formats.
AI-Validated Accuracy
Depend on 99% accurate data to minimize wasted efforts and maximize engagement with architecture and design professionals.
Customizable Solutions
Tailor datasets to specific architectural segments, regions, or roles to meet your strategic objectives.
Strategic APIs for Enhanced Campaigns:
Data Enrichment API
Enhance existing records with verified profiles of architectural and design professionals to refine targeting and engagement.
Lead Generation API
Automate lead generation for a consistent pipeline of qualified professionals, scaling your outreach efficiently.
Success.ai’s B2B Leads Data for Architecture, Planning & Design Experts positions you to connect with the creative minds shaping Europe’s...
Success.ai’s User Profiles Data for Nonprofit and NGO Leaders provides businesses, organizations, and researchers with comprehensive access to global leaders in the nonprofit and NGO sectors. With data sourced from over 700 million verified LinkedIn profiles, this dataset includes actionable insights and contact details for executives, program managers, administrators, and decision-makers. Whether your goal is to partner with nonprofits, support global causes, or conduct research into social impact, Success.ai ensures your outreach is backed by accurate, enriched, and continuously updated data.
Why Choose Success.ai’s User Profiles Data for Nonprofit and NGO Leaders?
Comprehensive Professional Profiles
Access verified LinkedIn profiles of nonprofit leaders, NGO managers, program directors, grant writers, and administrative executives. AI-driven validation ensures 99% accuracy for efficient communication and minimized bounce rates.
Global Coverage Across Nonprofit Sectors
Includes profiles from nonprofits, humanitarian organizations, environmental groups, social enterprises, and advocacy organizations. Covers key markets across North America, Europe, APAC, South America, and Africa for global reach.
Continuously Updated Dataset
Reflects real-time professional updates, organizational changes, and emerging trends in the nonprofit landscape to keep your targeting relevant and effective.
Tailored for Nonprofit Insights
Enriched profiles include work histories, organizational affiliations, areas of expertise, and social impact projects for deeper engagement opportunities.
Data Highlights:
700M+ Verified LinkedIn Profiles: Access a vast network of nonprofit and NGO professionals worldwide.
100M+ Work Emails: Direct communication with executives, managers, and decision-makers in the nonprofit sector.
Enriched Organizational Data: Gain insights into leadership structures, mission focuses, and operational scales.
Industry-Specific Segmentation: Target nonprofits focused on healthcare, education, environmental sustainability, human rights, and more.
Key Features of the Dataset:
Nonprofit and NGO Leader Profiles
Identify and connect with executives, program managers, fundraisers, and policy directors in global nonprofit and NGO sectors. Engage with individuals who drive decision-making and operational strategies for impactful organizations.
Detailed Organizational Insights
Leverage firmographic data, including organizational size, mission, regional activity, and funding sources, to align with specific nonprofit goals.
Advanced Filters for Precision Targeting
Refine searches by region, mission type, role, or organizational focus for tailored outreach. Customize campaigns based on social impact priorities, such as climate action, gender equality, or economic development.
AI-Driven Enrichment
Enhanced datasets provide actionable insights into professional accomplishments, partnerships, and leadership achievements for targeted engagement.
Strategic Use Cases:
Partnership Development and Outreach
Identify nonprofits and NGOs for collaboration on social impact projects, sponsorships, or grant distribution. Build relationships with decision-makers driving advocacy, fundraising, and community initiatives.
Donor Engagement and Fundraising
Target nonprofit leaders responsible for managing fundraising campaigns and donor relationships. Tailor outreach efforts to align with specific causes and funding priorities.
Research and Analysis
Analyze leadership trends, mission focuses, and regional nonprofit activities to inform program design and funding strategies. Use insights to evaluate the effectiveness of social impact initiatives and partnerships.
Recruitment and Talent Acquisition
Target HR professionals and administrators seeking qualified staff, consultants, or volunteers for nonprofits and NGOs. Offer talent solutions for specialized roles in program management, advocacy, and administration.
Why Choose Success.ai?
Best Price Guarantee
Access industry-leading, verified User Profiles Data at unmatched pricing to ensure your campaigns are cost-effective and impactful.
Seamless Integration
Easily integrate verified nonprofit data into your CRM or marketing platforms with APIs or downloadable formats.
AI-Validated Accuracy
Rely on 99% accuracy to minimize wasted outreach efforts and maximize engagement outcomes.
Customizable Solutions
Tailor datasets to focus on specific nonprofit types, geographical regions, or areas of social impact to meet your strategic objectives.
Strategic APIs for Enhanced Campaigns:
Data Enrichment API
Update your internal records with verified nonprofit leader profiles to enhance targeting and engagement.
Lead Generation API
Automate lead generation for a consistent pipeline of nonprofit and NGO professionals, scaling your outreach efforts efficiently.
Success.ai’s User Profiles Data for Nonprofit and NGO Leader...
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides synthetic yet realistic data for analyzing and forecasting retail store inventory demand. It contains over 73000 rows of daily data across multiple stores and products, including attributes like sales, inventory levels, pricing, weather, promotions, and holidays.
The dataset is ideal for practicing machine learning tasks such as demand forecasting, dynamic pricing, and inventory optimization. It allows data scientists to explore time series forecasting techniques, study the impact of external factors like weather and holidays on sales, and build advanced models to optimize supply chain performance.
Challenge 1: Time Series Demand Forecasting. Predict daily product demand across stores using historical sales and inventory data. Can you build an LSTM-based forecasting model that outperforms classical methods like ARIMA?
Challenge 2: Inventory Optimization. Optimize inventory levels by analyzing sales trends and minimizing stockouts while reducing overstock situations.
Challenge 3: Dynamic Pricing. Develop a pricing strategy based on demand, competitor pricing, and discounts to maximize revenue.
Date: Daily records from [start_date] to [end_date].
Store ID & Product ID: Unique identifiers for stores and products.
Category: Product categories like Electronics, Clothing, Groceries, etc.
Region: Geographic region of the store.
Inventory Level: Stock available at the beginning of the day.
Units Sold: Units sold during the day.
Demand Forecast: Predicted demand based on past trends.
Weather Condition: Daily weather impacting sales.
Holiday/Promotion: Indicators for holidays or promotions.
Exploratory Data Analysis (EDA): Analyze sales trends, visualize data, and identify patterns.
Time Series Forecasting: Train models like ARIMA, Prophet, or LSTM to predict future demand.
Pricing Analysis: Study how discounts and competitor pricing affect sales.
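Before reaching for ARIMA or an LSTM (Challenge 1), a naive benchmark helps put results in context. A minimal sketch, assuming the column names listed above (with spaces) and a placeholder file name: it forecasts each store/product series with a 7-day moving average and scores it with MAE.

import pandas as pd
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("retail_store_inventory.csv", parse_dates=["Date"])  # placeholder file name
df = df.sort_values(["Store ID", "Product ID", "Date"])

# Naive baseline: forecast today's demand as the mean of the previous 7 days of units sold
df["MA7_Forecast"] = (
    df.groupby(["Store ID", "Product ID"])["Units Sold"]
      .transform(lambda s: s.shift(1).rolling(7).mean())
)

evaluable = df.dropna(subset=["MA7_Forecast"])
print("7-day moving-average MAE:", mean_absolute_error(evaluable["Units Sold"], evaluable["MA7_Forecast"]))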
https://creativecommons.org/publicdomain/zero/1.0/
🔍 Total Sales: Achieved $456,000 in revenue across 1,000 transactions, with an average transaction value of $456.00.
👥 Customer Demographics:
Average Age: 41.39 years
Gender Distribution: 51% male, 49% female
Most active age groups: 31-40 & 41-50 years
🏷️ Product Performance:
Top Categories: Electronics and Clothing led the sales, each contributing $160,000, followed by Beauty products with $140,000.
Quantity Sold: Clothing topped the charts with 894 units sold.
📈 Sales Trends: Identified key sales peaks, especially in May 2023, indicating the success of targeted promotional strategies.
Why This Matters:
Understanding these metrics allows for better-targeted marketing, efficient inventory management, and strategic planning to capitalize on peak sales periods. This project demonstrates the power of data-driven decision-making in retail!
💡 Takeaway: Power BI continues to be a game-changer in visualizing and interpreting complex data, helping businesses to not just see numbers but to translate them into actionable insights.
I’m always looking forward to new challenges and projects that push my skills further. If you're interested in diving into the details or discussing data insights, feel free to reach out!
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides structured inventory management data, covering key aspects such as product, orders, customer, employee, warehouse, region, and order details. It is designed for SQL-based analysis, business intelligence, and inventory optimization.
The dataset is a comprehensive sales record from Gottlieb-Cruickshank, detailing various transactions that took place in Poland in January 2018. The data includes information on customers, products, and sales teams, with a focus on the pharmaceutical industry.
This dataset can be utilized for various analyses, including sales performance by city, product, and sales teams, as well as geographical distribution of sales within Poland. It provides valuable insights into the pharmaceutical sales strategies and their execution within a specific time frame.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator's Job Openings Data for the Netherlands: A Comprehensive Resource for Employment Insights
Techsalerator's Job Openings Data for the Netherlands is a crucial resource for businesses, job seekers, and labor market analysts. This dataset offers a thorough overview of job openings across various sectors in the Netherlands, consolidating information from multiple sources like company websites, job boards, and recruitment agencies.
To access Techsalerator’s Job Openings Data for the Netherlands, contact info@techsalerator.com with your specific data needs. We offer customized quotes based on the fields and records required, with delivery available within 24 hours. Ongoing access options are also available.
Techsalerator’s dataset is an invaluable tool for those looking to stay informed about job openings and labor trends in the Netherlands, empowering businesses, job seekers, and analysts to make informed decisions.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The "E-Commerce Pet Supplies Dataset" offers a comprehensive snapshot of pet products sold on an online platform. This dataset includes 1,998 entries, each representing a unique pet supply product available for purchase. It is tailored for data science projects focused on e-commerce trends, product popularity, and consumer preferences within the pet supply market.
This dataset is ideal for several data science applications, including:
- Market Trend Analysis: Understanding popular products based on sales and customer wishes.
- Product Rating Analysis: Evaluating customer satisfaction through average star ratings.
- Inventory Management: Analyzing quantity data to optimize stock levels.
- Sales Prediction: Developing models to predict future sales based on existing data.
This dataset is compiled using ethical mining practices with the assistance of Apify and does not contain any personal data or sensitive customer information, ensuring compliance with data protection standards.
We express our gratitude to Apify for facilitating the data collection and AliExpress for allowing access to the relevant e-commerce data. Additionally, we acknowledge DALL-E 3 for creating the thumbnail and cover images used for this dataset.
This dataset is provided for learning and educational purposes only. It should not be used for commercial purposes. Users are encouraged to use this data to enhance their understanding of e-commerce dynamics, specifically in the pet supply sector, and to develop data science skills related to real-world applications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 30,000 records representing job market and salary trends across India’s top companies, popular tech and non-tech roles, multiple cities, and experience levels. It is designed to help researchers, analysts, and machine-learning practitioners analyze salary patterns, hiring demand, geographic trends, remote-work adoption, and career progression dynamics in the Indian job ecosystem.
Each row represents a job-related snapshot tied to a specific date, making the dataset suitable for trend analysis, forecasting, and workforce analytics. Salaries have been generated using realistic ranges for each job category in India, and demand indicators reflect hiring trends seen across major urban centers such as Bangalore, Hyderabad, Pune, and Delhi.
This dataset is useful for:
- Salary prediction (regression modeling)
- Job market demand forecasting
- Skill-driven salary analysis
- Remote work adoption analytics
- City-wise hiring and compensation gaps
- Experience-level salary progression
- Time-series trend analysis
- Workforce planning and HR analytics
- ML projects using categorical + time-series data
All company names, job roles, cities, and experience ranges are based on real Indian labor market dynamics, making the dataset relevant, practical, and highly useful for exploratory analysis and machine learning.
COLUMN DESCRIPTIONS
Record_Date The date of the job market observation. Useful for time-series trend modeling.
Company_Name The Indian company offering the job role (e.g., TCS, Infosys, Reliance, Accenture India, Amazon India).
Job_Role The position or designation (e.g., Software Engineer, Data Analyst, Product Manager).
Experience_Level Experience bracket required for the role (e.g., 1–3 years, 5–8 years).
City Location of the job such as Bangalore, Pune, or Hyderabad.
Salary_INR Annual salary offered for that job role (in Indian Rupees).
Demand_Index A 0–100 indicator representing hiring demand for that role/location.
Remote_Option_Flag 1 if the job supports remote/hybrid work, 0 if on-site only.
Salary_Trend_Pct Month-to-month salary percentage change for similar roles.
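As an illustration of the salary-prediction use case, the sketch below fits a simple regression on the columns described above. The file name is a placeholder and the model is deliberately basic; it is a starting point, not a tuned pipeline.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("india_job_market.csv", parse_dates=["Record_Date"])  # placeholder file name

# One-hot encode the categorical drivers of salary; Remote_Option_Flag is already 0/1
categorical = ["Company_Name", "Job_Role", "Experience_Level", "City"]
X = pd.get_dummies(df[categorical + ["Remote_Option_Flag"]], columns=categorical)
y = df["Salary_INR"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = Ridge().fit(X_train, y_train)
print("MAE (INR):", mean_absolute_error(y_test, model.predict(X_test)))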
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides insights into factors influencing defect rates in a manufacturing environment. Each record represents various metrics crucial for predicting high or low defect occurrences in production processes.
ProductionVolume: Number of units produced per day. - Data Type: Integer. - Range: 100 to 1000 units/day.
ProductionCost: Cost incurred for production per day. - Data Type: Float. - Range: $5000 to $20000.
SupplierQuality: Quality ratings of suppliers. - Data Type: Float (%). - Range: 80% to 100%.
DeliveryDelay: Average delay in delivery. - Data Type: Integer (days). - Range: 0 to 5 days.
DefectRate: Defects per thousand units produced. - Data Type: Float. - Range: 0.5 to 5.0 defects.
QualityScore: Overall quality assessment. - Data Type: Float (%). - Range: 60% to 100%.
MaintenanceHours: Hours spent on maintenance per week. - Data Type: Integer. - Range: 0 to 24 hours.
DowntimePercentage: Percentage of production downtime. - Data Type: Float (%). - Range: 0% to 5%.
InventoryTurnover: Ratio of inventory turnover. - Data Type: Float. - Range: 2 to 10.
StockoutRate: Rate of inventory stockouts. - Data Type: Float (%). - Range: 0% to 10%.
WorkerProductivity: Productivity level of the workforce. - Data Type: Float (%). - Range: 80% to 100%.
SafetyIncidents: Number of safety incidents per month. - Data Type: Integer. - Range: 0 to 10 incidents.
EnergyConsumption: Energy consumed in kWh. - Data Type: Float. - Range: 1000 to 5000 kWh.
EnergyEfficiency: Efficiency factor of energy usage. - Data Type: Float. - Range: 0.1 to 0.5.
AdditiveProcessTime: Time taken for additive manufacturing. - Data Type: Float (hours). - Range: 1 to 10 hours.
AdditiveMaterialCost: Cost of additive materials per unit. - Data Type: Float ($). - Range: $100 to $500.
DefectStatus: Predicted defect status. - Data Type: Binary (0 for Low Defects, 1 for High Defects).
The dataset focuses more on defect instances because they do not occur often in practice, although non-defect instances are included as well. As a result, the class distribution is imbalanced; consider balancing the classes before proceeding with machine learning techniques (a minimal sketch follows).
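A minimal sketch of such a baseline, assuming a placeholder file name and the column names listed above: rather than resampling, it uses class_weight='balanced' so the model reweights the minority class, and it reports per-class precision and recall.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("manufacturing_defects.csv")  # placeholder file name

X = df.drop(columns=["DefectStatus"])
y = df["DefectStatus"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# class_weight='balanced' reweights classes inversely to their frequency in y_train
model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))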
This dataset encompasses a comprehensive collection of metrics vital for predicting defect rates in manufacturing operations. It includes production volumes, supply chain quality, quality control assessments, maintenance schedules, inventory management details, workforce productivity metrics, energy consumption patterns, additive manufacturing specifics, and more.
This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.
This dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects. It is an original dataset, owned by Mr. Rabie El Kharoua, and has not been previously shared. You are free to use it under the license outlined on the data card. The dataset is offered without any guarantees. Details about the data provider will be shared soon.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.
Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself.
Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation
From: Head of Data Science
Received: Today
Subject: New project from the product team
Hey!
I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.
I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!
They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.
You can find more details about what I expect you to do here. And information on the data here.
I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.
Good Luck!
From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?
Hi,
We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?
At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.
Can your team: - Predict which recipes will lead to high traffic? - Correctly predict high traffic recipes 80% of the time?
We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?
Look forward to seeing your presentation.
Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.
Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.
This is an example of how a recipe may appear on the website; we haven't included all of the steps, but you should get an idea of what visitors to the site see.
Tomato Soup
Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $
Nutritional Information (per serving) - Calories 123 - Carbohydrate 13g - Sugar 1g - Protein 4g
Ingredients: - Tomatoes - Onion - Carrot - Vegetable Stock
Method: 1. Cut the tomatoes into quarters….
The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.
As you will see, they haven't given us all of the information they have about each recipe.
You can find the data here.
I will let you decide how to process it, just make sure you include all your decisions in your report.
Don't forget to double check the data really does match what they say - it might not.
| Column Name | Details |
|---|---|
| recipe | Numeric, unique identifier of recipe |
| calories | Numeric, number of calories |
| carbohydrate | Numeric, amount of carbohydrates in grams |
| sugar | Numeric, amount of sugar in grams |
| protein | Numeric, amount of prote... |
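The product team's "80% of the time" target maps naturally to precision on the high-traffic class: of the recipes a model would put on the homepage, at least 80% should actually drive high traffic. Below is a minimal sketch of how that criterion could be checked; the file name and the 'high_traffic' column (with a 'High' marker) are assumptions, since the column table above is cut off.

import pandas as pd
from sklearn.metrics import precision_score

df = pd.read_csv("recipe_site_traffic.csv")  # placeholder file name

# Assumed target encoding: 'high_traffic' equals "High" when the recipe drove high traffic
y_true = df["high_traffic"].eq("High").astype(int)

# Trivial baseline: recommend every recipe; its precision equals the base rate of high traffic
y_pred = pd.Series(1, index=df.index)
print("Baseline precision:", round(precision_score(y_true, y_pred), 2), "(target: >= 0.80)")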
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset collects all the elements created and used by master students for the design and prototyping of a drone. The dataset thus contains a great heterogeneity of structured and unstructured documents, mainly written in French but also in English. In particular, the dataset was extracted from a digital mock-up whose tree structure and document metadata are listed in the 'PRODUCT.3dxml' and 'PLMDmtDocument.3dxml' respectively. This dataset is currently being used in a PhD thesis [1-4] to experiment with queries on data similar to that of the manufacturing industry.
[1] Kim, L., Yahia, E., Segonds, F., Véron, P., & Mallet, A. (2021). i-Dataquest: A heterogeneous information retrieval tool using data graph for the manufacturing industry. Computers in Industry, 132, 103527.
[2] Kim, L., Yahia, E., Segonds, F., Véron, P., & Fau, V. (2021). Key issues for a manufacturing data query system based on graph. International Journal on Interactive Design and Manufacturing (IJIDeM), 15(4), 397-407.
[3] Kim, L., Yahia, E., Segonds, F., Veron, P., & Fau, V. (2020). Essential Issues to Consider for a Manufacturing Data Query System Based on Graph. In International Joint Conference on Mechanics, Design Engineering & Advanced Manufacturing (pp. 347-353). Springer, Cham.
[4] Kim, L., Yahia, E., Segonds, F., Véron, P., & Mallet, A. (2020). i-DATAQUEST: A Proposal for a Manufacturing Data Query System Based on a Graph. Proceedings of the 17th IFIP International Conference on Product Lifecycle Management, 227-238. Springer, Cham. doi:10.1007/978-3-030-62807-9_19
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains inventory data for a pharmacy e-commerce website in JSON format, designed for easy integration into MongoDB databases, making it ideal for MERN stack projects. It includes 10 fields:
This dataset is useful for developing pharmacy-related web applications, inventory management systems, or online medical stores using the MERN stack.
Do not use for production-level purposes; use for project development only. Feel free to contribute if you find any mistakes or have suggestions.
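For MERN-stack prototyping, the JSON file can be bulk-loaded into a MongoDB collection. A minimal sketch using pymongo, assuming a local MongoDB instance and a placeholder file name; the database and collection names are arbitrary.

import json
from pymongo import MongoClient

with open("pharmacy_inventory.json", encoding="utf-8") as f:  # placeholder file name
    products = json.load(f)

client = MongoClient("mongodb://localhost:27017/")
collection = client["pharmacy_store"]["inventory"]  # arbitrary database and collection names
collection.insert_many(products)
print("Inserted documents:", collection.count_documents({}))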