https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 10,000 records of corporate employees across various departments, focusing on work hours, job satisfaction, and productivity performance. The dataset is designed for exploratory data analysis (EDA), performance benchmarking, and predictive modeling of productivity trends.
You can conduct EDA and investigate correlations between work hours, remote work, job satisfaction, and productivity, or create new metrics such as efficiency per hour or the impact of meetings on productivity. Machine Learning Model: for a predictive task, you can use "Productivity_Score" as a regression target (predicting continuous performance scores), or frame a classification problem (e.g., categorize employees into high, medium, or low productivity).
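As a quick illustration of the workflow above, the sketch below derives an efficiency-per-hour metric and bins "Productivity_Score" into three classes. The data and the column names other than "Productivity_Score" are synthetic assumptions for illustration, not the actual files.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the employee records; only "Productivity_Score"
# is taken from the dataset description, the rest are assumed names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Work_Hours_Per_Week": rng.integers(30, 60, 300),
    "Meetings_Per_Week": rng.integers(0, 15, 300),
    "Productivity_Score": rng.uniform(0, 100, 300),
})

# Derived metric: productivity per hour worked
df["Efficiency_Per_Hour"] = df["Productivity_Score"] / df["Work_Hours_Per_Week"]

# Turn the regression target into a 3-class label for classification
df["Productivity_Class"] = pd.qcut(
    df["Productivity_Score"], q=3, labels=["low", "medium", "high"]
)
print(df["Productivity_Class"].value_counts())
```

`pd.qcut` gives roughly equal-sized classes, which keeps the derived classification problem balanced.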
This dataset was created by Mohinur Abdurahimova
Released under: Data files © Original Authors
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
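For readers without Orange, a rough scikit-learn analogue of the quoted decision-tree configuration might look like the following. Two substitutions are deliberate: scikit-learn offers entropy but not gain ratio, and it has no "stop splitting when majority reaches 95%" option, so this is only an approximation, run here on synthetic stand-in data with the study's shape (36 samples, 11 features, 9 classes).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data mirroring the study's dimensions: 36 samples,
# 11 numeric features, 9 classes (4 samples per class on average).
X, y = make_classification(n_samples=36, n_features=11, n_informative=6,
                           n_classes=9, n_clusters_per_class=1,
                           random_state=0)

tree = DecisionTreeClassifier(
    criterion="entropy",   # stand-in for gain ratio (not available in sklearn)
    min_samples_leaf=2,    # minimum number of samples in leaves
    min_samples_split=5,   # minimum samples required to split a node
    random_state=0,
)

# Stratified cross-validation, as in the study
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
scores = cross_val_score(tree, X, y, cv=cv, scoring="accuracy")
print(scores.mean())
```

With only 4 samples per class, the number of folds must stay small; the study's precision, recall, F-measure, and AUC can be obtained the same way by changing the `scoring` argument.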
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Eda_all is a dataset for instance segmentation tasks - it contains "All" class annotations for 1,314 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
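Two of the simplest Easy Data Augmentation (EDA) operations mentioned above, random swap and random deletion, can be sketched in a few lines of plain Python; synonym replacement is omitted here because it requires an external lexical resource. The sample sentence is illustrative.

```python
import random

def random_swap(words, n=1):
    """Swap the positions of two words n times (an EDA operation)."""
    words = words[:]
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

random.seed(42)
sentence = "las letras de canciones expresan emociones".split()
print(" ".join(random_swap(sentence)))
print(" ".join(random_deletion(sentence)))
```

Each call produces a slightly perturbed copy of the input, which is how small labeled datasets are artificially expanded before training.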
https://creativecommons.org/publicdomain/zero/1.0/
This dataset simulates employee-level data for burnout prediction and classification tasks. It can be used for binary classification, exploratory data analysis (EDA), and feature importance exploration.
📄 Columns
Name — Synthetic employee name (for realism, not for ML use).
Age — Age of the employee.
Gender — Male or Female.
JobRole — Job type (Engineer, HR, Manager, etc.).
Experience — Years of work experience.
WorkHoursPerWeek — Average number of working hours per week.
RemoteRatio — % of time spent working remotely (0–100).
SatisfactionLevel — Self-reported satisfaction (1.0 to 5.0).
StressLevel — Self-reported stress level (1 to 10).
Burnout — Target variable. 1 if signs of burnout exist (high stress + low satisfaction + long hours), otherwise 0.
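The stated labeling rule (high stress + low satisfaction + long hours) can be sketched as below. The cut-off values are illustrative assumptions, since the dataset's actual generation thresholds are not published in the description.

```python
def burnout_label(stress_level, satisfaction, hours_per_week,
                  stress_cut=7, satisfaction_cut=2.5, hours_cut=50):
    """Return 1 when the burnout heuristic described above fires.

    The cut-off values are assumptions for illustration; only the
    rule's shape (high stress + low satisfaction + long hours) comes
    from the dataset description.
    """
    at_risk = (stress_level >= stress_cut
               and satisfaction <= satisfaction_cut
               and hours_per_week >= hours_cut)
    return 1 if at_risk else 0

print(burnout_label(9, 1.8, 60))  # high stress, low satisfaction, long hours
print(burnout_label(3, 4.5, 38))  # none of the risk factors
```

A rule-based target like this is also a useful sanity check: a classifier trained on the data should recover something close to it via feature importances.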
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Solar Panel EDA is a dataset for object detection tasks - it contains Solar Panel annotations for 721 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Portobello Tech is an app innovator that has devised an intelligent way of predicting employee turnover within the company. It periodically evaluates employees' work details, including the number of projects they worked on, average monthly working hours, time spent at the company, promotions in the last 5 years, and salary level. Data from prior evaluations show each employee's satisfaction at the workplace. The data can be used to identify patterns in work style and employees' interest in continuing to work at the company. The HR Department owns the data and uses it to predict employee turnover, i.e., the total number of workers who leave a company over a certain time period.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered where the author is Eda Kranakis. It features 4 columns, including authors, books, and publication dates.
https://spdx.org/licenses/CC0-1.0.html
Diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, this study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated: Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis comprised data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics, eXtreme Gradient Boosting was the most optimal algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.
Methods
Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets, study them in a web-based data-science environment, and construct models.
An audit with findings and recommendations for improvements to the management of EDA's Revolving Loan Fund program. This program provides grants to state and local governments, political subdivisions, and nonprofit organizations to operate lending programs for businesses that cannot get traditional bank financing.
Description: This dataset was created to serve as an easy-to-use image dataset, perfect for experimenting with object detection algorithms. The main goal was to provide a simplified dataset that allows for quick setup and minimal effort in exploratory data analysis (EDA). This dataset is ideal for users who want to test and compare object detection models without spending too much time navigating complex data structures. Unlike datasets like chest x-rays, which… See the full description on the dataset page: https://huggingface.co/datasets/gtsaidata/V2-Balloon-Detection-Dataset.
🎵 Music Feature Dataset Analysis
This repository contains a comprehensive exploratory data analysis (EDA) on a music features dataset. The primary objective is to understand the patterns in audio features and analyze how they relate to user preferences, providing insights for music recommendation systems and user profiling.
📥 Dataset Overview
The dataset (data.csv) contains audio features extracted from music tracks along with user preference scores. This rich… See the full description on the dataset page: https://huggingface.co/datasets/JigneshPrajapati18/model_dataset.
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Superheros_abilities_dataset contains the abilities and attributes of 200 superheroes and villains from the Marvel and DC universes, organized into 10 columns, including each character's name, moral orientation, strength, speed, intelligence, combat power, major weapons/capabilities, overall power, and popularity.
2) Data Utilization (1) Superheros_abilities_dataset has characteristics that: • It is a small, refined dataset that reflects real-world situations, contains some missing values, and is designed to be easily utilized by beginners. (2) Superheros_abilities_dataset can be used to: • Classification model practice: develop classification models that predict moral orientation (Hero/Villain/Antihero) from a character's ability values and attributes. • Clustering and visualization: cluster groups of similar characters based on their abilities and attributes, or use the data for EDA and data-visualization exercises.
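A clustering exercise of the kind described could start like the sketch below; the ability values are synthetic, and only the column semantics (strength, speed, intelligence, combat power) are taken from the description.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic ability scores for 200 characters across 4 numeric attributes
# (strength, speed, intelligence, combat power) - illustrative only.
rng = np.random.default_rng(1)
abilities = rng.integers(1, 101, size=(200, 4)).astype(float)

# Standardize before clustering so no attribute dominates the distance
scaled = StandardScaler().fit_transform(abilities)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(scaled)
print(np.bincount(labels))  # cluster sizes
```

In practice the missing values noted above would need imputation (or row dropping) before this step.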
https://www.cognitivemarketresearch.com/privacy-policyhttps://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, The Global EDA Market size will be USD 14.9 billion in 2023 and will grow at a compound annual growth rate (CAGR) of 10.50% from 2023 to 2030.
The demand for the EDA Market is rising due to the rise in outdoor and adventure activities.
Changing consumer lifestyle trends are higher in the EDA market.
The cat segment held the highest EDA Market revenue share in 2023.
North American EDA will continue to lead, whereas the European EDA Market will experience the most substantial growth until 2030.
Supply Chain and Risk Analysis to Provide Viable Market Output
The industry is facing supply chain and logistics disruptions. EDA tools have been instrumental in analyzing supply chain data, identifying vulnerabilities, predicting risks, and developing disruption mitigation strategies. Consumer behavior has undergone drastic changes due to blockages and restrictions. EDA helps companies analyze changing trends in buying behavior, online shopping preferences, and demand patterns, enabling organizations to adjust their marketing and sales strategies accordingly.
Health and Pharmaceutical Research to Propel Market Growth.
EDA tools have played a key role in analyzing large amounts of data related to vaccine development, drug trials, patient records and epidemiological studies. These tools have helped researchers process and interpret complex medical data, leading to advances in the development of treatments and vaccines. The pandemic has created challenges in data collection, especially in sectors affected by lockdowns or blackouts. Rapidly changing conditions and incomplete data sets make effective EDA difficult due to data quality issues. The economic uncertainty caused by the pandemic has led to budget cuts in some sectors, impacting investment in new technologies. Some organizations have limited budgets that limit their ability to adopt or update EDA tools.
Market Dynamics of the EDA
Privacy and Data Security Issues to Restrict Market Growth.
With the focus on data privacy regulations such as GDPR, CCPA, etc., organizations need to ensure compliance when handling sensitive data. These compliance requirements may limit the scope of the EDA by limiting the availability and use of certain data sets for information analysis. EDA often requires data analysts or data scientists who are skilled in statistical analysis and data visualization tools. A lack of professionals with these specialized skills can hinder an organization's ability to use EDA tools effectively, limiting adoption. Advanced EDA techniques can involve complex algorithms and statistical techniques that are difficult for non-technical users to understand. Interpreting results and deriving actionable insights from EDA results pose challenges that affect applicability to a wider audience.
Key Market Opportunity
Growing miniaturization in various industries can be an opportunity.
In the age of highly advanced electronics, miniaturization has become a trend that enables organizations across diverse sectors such as healthcare, consumer electronics, aerospace and defense, and automotive to design miniature electronic devices. These devices incorporate miniaturized semiconductor components, e.g., surgical instruments and blood glucose meters in healthcare, fitness bands in wearables, automotive modules in the automotive sector, and intelligent baggage labels. Miniaturization has a number of advantages, such as freeing space for other features and allowing larger batteries. Growing fitness consciousness among consumers is fueling demand for smaller fitness devices such as smartwatches and fitness trackers, motivating companies to introduce innovative products with improved features, while researchers concentrate on cost-effective and efficient product development through electronic design tools. In addition, portable equipment has gained immense popularity among media professionals because of the increasing demand for live reporting of events such as riots, accidents, sports, and political rallies. Given the inconvenience of using cumbersome TV production vans to access such events, demand for portable handheld equipment has risen: these devices can be carried in backpacks and quickly moved to an event venue. Therefore, the need for compact devices across various industries…
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper introduces GLARE, an Arabic apps reviews dataset collected from the Saudi Google Play Store. It consists of 76M reviews, 69M of which are Arabic reviews of 9,980 Android applications. We present the data collection methodology, along with a detailed Exploratory Data Analysis (EDA) and feature engineering on the gathered reviews. We also highlight possible use cases and benefits of the dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Transactional Retail Dataset of Electronics Store’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/muhammadshahrayar/transactional-retail-dataset-of-electronics-store on 14 February 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains information about an online electronic store. The store has three warehouses from which goods are delivered to customers.
Use this dataset to perform graphical and/or non-graphical EDA methods to understand the data first, then find and fix the data problems:
- Detect and fix errors in dirty_data.csv
- Impute the missing values in missing_data.csv
- Detect and remove anomalies
- Check whether a customer is happy with their last order
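A minimal pandas sketch of the imputation and anomaly-detection steps might look like this; the column name "order_total" is hypothetical, since the real schema is not reproduced here.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame with one missing value and one obvious outlier;
# "order_total" is an assumed column name, not from the actual files.
df = pd.DataFrame({"order_total": [20.0, 22.5, np.nan, 21.0, 500.0, 19.5]})

# Impute missing values with the column median
df["order_total"] = df["order_total"].fillna(df["order_total"].median())

# Flag anomalies with a simple IQR rule
q1, q3 = df["order_total"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = ((df["order_total"] < q1 - 1.5 * iqr)
                    | (df["order_total"] > q3 + 1.5 * iqr))
print(df)
```

The same pattern scales to the real files: impute `missing_data.csv` column by column, and use the outlier flags to inspect and remove anomalies rather than deleting rows blindly.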
All the Best
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID-19 dataset in Japan’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lisphilar/covid19-dataset-in-japan on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This is a COVID-19 dataset in Japan. It does not include the cases on the Diamond Princess cruise ship (Yokohama city, Kanagawa prefecture) or the Costa Atlantica cruise ship (Nagasaki city, Nagasaki prefecture).
- Total number of cases in Japan
- The number of vaccinated people (New/experimental)
- The number of cases at prefecture level
- Metadata of each prefecture
Note: Lisphilar (author) uploads the same files to https://github.com/lisphilar/covid19-sir/tree/master/data
This dataset can be retrieved with CovsirPhy (Python library).
```
pip install covsirphy --upgrade
```

```python
import covsirphy as cs

data_loader = cs.DataLoader()
japan_data = data_loader.japan()

# The number of cases (Total/each province)
clean_df = japan_data.cleaned()

# Metadata
meta_df = japan_data.meta()
```
Please refer to CovsirPhy Documentation: Japan-specific dataset.
Note: Before analysing the data, please refer to Kaggle notebook: EDA of Japan dataset and COVID-19: Government/JHU data in Japan. The detailed explanation of the build process is discussed in Steps to build the dataset in Japan. If you find errors or have any questions, feel free to create a discussion topic.
covid_jpn_total.csv
Cumulative number of cases:
- PCR-tested / PCR-tested and positive
- with symptoms (to 08May2020) / without symptoms (to 08May2020) / unknown (to 08May2020)
- discharged
- fatal
The number of cases:
- requiring hospitalization (from 09May2020)
- hospitalized with mild symptoms (to 08May2020) / severe symptoms / unknown (to 08May2020)
- requiring hospitalization, but waiting in hotels or at home (to 08May2020)
In the primary source, some variables were removed on 09May2020; their values are NA in this dataset from 09May2020 onward.
The data was manually collected from the Ministry of Health, Labour and Welfare HP:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)
The number of vaccinated people:
- Vaccinated_1st: the number of persons vaccinated with the first dose on the date
- Vaccinated_2nd: the number of persons vaccinated with the second dose on the date
- Vaccinated_3rd: the number of persons vaccinated with the third dose on the date
Data sources for vaccination:
- To 09Apr2021: 厚生労働省 HP 新型コロナワクチンの接種実績 (in Japanese); 首相官邸 新型コロナワクチンについて
- From 10Apr2021: Twitter: 首相官邸(新型コロナワクチン情報)
covid_jpn_prefecture.csv
Cumulative number of cases:
- PCR-tested / PCR-tested and positive
- discharged
- fatal
The number of cases:
- requiring hospitalization (from 09May2020)
- hospitalized with severe symptoms (from 09May2020)
Using a PDF-to-Excel converter, the data was manually collected from the Ministry of Health, Labour and Welfare HP:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)
Note: covid_jpn_prefecture.groupby("Date").sum() does not match covid_jpn_total. When you analyse total data for Japan, please use the covid_jpn_total data.
covid_jpn_metadata.csv
- Population (Total, Male, Female): 厚生労働省 厚生統計要覧(2017年度)第1-5表
- Area (Total, Habitable): Wikipedia 都道府県の面積一覧 (2015)
Hospital_bed: With the primary data of 厚生労働省 感染症指定医療機関の指定状況(平成31年4月1日現在), 厚生労働省 第二種感染症指定医療機関の指定状況(平成31年4月1日現在), 厚生労働省 医療施設動態調査(令和2年1月末概数), 厚生労働省 感染症指定医療機関について, and secondary data of COVID-19 Japan 都道府県別 感染症病床数.
Clinic_bed: With the primary data of 医療施設動態調査(令和2年1月末概数).
Location: Data is from LinkData 都道府県庁所在地 (Public Domain) (secondary data).
Admin
To create this dataset, edited and transformed data of the following sites was used.
厚生労働省 Ministry of Health, Labour and Welfare, Japan:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)
厚生労働省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)
国土交通省 Ministry of Land, Infrastructure, Transport and Tourism, Japan: 国土交通省 HP (in Japanese) 国土交通省 HP (in English) 国土交通省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)
Code for Japan / COVID-19 Japan: Code for Japan COVID-19 Japan Dashboard (CC BY 4.0) COVID-19 Japan 都道府県別 感染症病床数 (CC BY)
Wikipedia: Wikipedia
LinkData: LinkData (Public Domain)
Kindly cite this dataset under CC BY-4.0 license as follows. - Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan, or - Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, Kaggle Dataset, https://www.kaggle.com/lisphilar/covid19-dataset-in-japan
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Justcorners is a dataset for object detection tasks - it contains Money annotations for 901 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://www.usa.gov/government-works
The formula-driven calculation projects investment data at 3, 6, and 9 year intervals from the investment award. The formula is based on a study done by Rutgers University, which compiled and analyzed the performance of EDA construction investments after 9 years. This approach was reviewed and validated by third-party analysis conducted by Grant Thornton in 2008. Based on this formula and a review of EDA's historical results, EDA estimates that 40% of the 9-year projection would be realized after 3 years, 75% after 6 years, and 100% after 9 years.
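The phase-in formula above (40% of the 9-year projection realized after 3 years, 75% after 6, 100% after 9) can be expressed directly; the function name is illustrative.

```python
def projected_realization(nine_year_projection):
    """Apply EDA's phase-in shares to a 9-year investment projection."""
    shares = {3: 0.40, 6: 0.75, 9: 1.00}
    return {years: nine_year_projection * share
            for years, share in shares.items()}

# Example: a $1,000,000 nine-year projection
print(projected_realization(1_000_000))
```

So a $1,000,000 nine-year projection implies roughly $400,000 realized at year 3 and $750,000 at year 6.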