47 datasets found
  1. Corporate_work_hours_productivity

    • kaggle.com
    Updated Feb 13, 2025
    Cite
    SuryaDeepthi (2025). Corporate_work_hours_productivity [Dataset]. https://www.kaggle.com/datasets/suryadeepthi/corporate-work-hours-productivity
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Kaggle
    Authors
    SuryaDeepthi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains 10,000 records of corporate employees across various departments, focusing on work hours, job satisfaction, and productivity performance. The dataset is designed for exploratory data analysis (EDA), performance benchmarking, and predictive modeling of productivity trends.

    You can conduct EDA and investigate correlations between work hours, remote work, job satisfaction, and productivity, or create new metrics such as efficiency per hour or the impact of meetings on productivity. For a predictive task, you can use "Productivity_Score" as a regression target (predicting continuous performance scores), or frame a classification problem instead (e.g., categorizing employees into high, medium, or low productivity).
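
    As a minimal sketch of the derived metric and the classification framing mentioned above: only "Productivity_Score" is named in the description, so the hours column name, the sample values, and the band cut-offs below are hypothetical placeholders rather than details of the dataset.

```python
def efficiency_per_hour(productivity_score, hours_worked):
    """Productivity points per hour worked (a hypothetical derived metric)."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return productivity_score / hours_worked


def productivity_band(score, low_cut=40.0, high_cut=70.0):
    """Map a continuous score to low/medium/high; the cut-offs are assumptions."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Hypothetical records; "Monthly_Hours_Worked" is an assumed column name.
records = [
    {"Monthly_Hours_Worked": 160, "Productivity_Score": 80.0},
    {"Monthly_Hours_Worked": 200, "Productivity_Score": 50.0},
    {"Monthly_Hours_Worked": 150, "Productivity_Score": 30.0},
]

for row in records:
    # Add the derived metric and the classification label to each record.
    row["Efficiency_Per_Hour"] = efficiency_per_hour(
        row["Productivity_Score"], row["Monthly_Hours_Worked"]
    )
    row["Productivity_Band"] = productivity_band(row["Productivity_Score"])
```

    In practice the cut-offs would more likely come from the score distribution (for example tertiles) than from fixed constants.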

  2. Exploratory Data Analysis | EDA - use case

    • kaggle.com
    Updated Feb 11, 2022
    Cite
    Mohinur Abdurahimova (2022). Exploratory Data Analysis | EDA - use case [Dataset]. https://www.kaggle.com/mohinurabdurahimova/exploratory-data-analysis-eda-use-case/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohinur Abdurahimova
    Description

    Dataset

    This dataset was created by Mohinur Abdurahimova

    Released under Data files © Original Authors


  3. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    xlsx (available download format)
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full, robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
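
    The decision tree settings above name gain ratio as the split criterion. The following is a minimal pure-Python sketch of that criterion for discrete feature values, under the usual definition (information gain divided by split information). It is an illustration only, not Orange's implementation; the numeric features in this dataset would first need to be thresholded, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A constant feature carries no split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0
```

    Dividing by the split information penalizes features with many distinct values, which plain information gain tends to favor.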

  4. Eda_all Dataset

    • universe.roboflow.com
    zip
    Updated May 24, 2024
    Cite
    cropperyash (2024). Eda_all Dataset [Dataset]. https://universe.roboflow.com/cropperyash/eda_all/model/1
    Explore at:
    zip (available download format)
    Dataset updated
    May 24, 2024
    Dataset authored and provided by
    cropperyash
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    All Polygons
    Description

    Eda_all

    ## Overview
    
    Eda_all is a dataset for instance segmentation tasks - it contains All annotations for 1,314 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  5. Detailed characterization of the dataset.

    • figshare.com
    xls
    Updated Sep 26, 2024
    + more versions
    Cite
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Detailed characterization of the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t006
    Explore at:
    xls (available download format)
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
    Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.

  6. Synthetic HR Burnout Dataset

    • kaggle.com
    Updated Jun 3, 2025
    Cite
    Anvar Kamaleyev (2025). Synthetic HR Burnout Dataset [Dataset]. https://www.kaggle.com/datasets/ankam6010/synthetic-hr-burnout-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anvar Kamaleyev
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset simulates employee-level data for burnout prediction and classification tasks. It can be used for binary classification, exploratory data analysis (EDA), and feature importance exploration.

    📄 Columns

    Name — Synthetic employee name (for realism, not for ML use).

    Age — Age of the employee.

    Gender — Male or Female.

    JobRole — Job type (Engineer, HR, Manager, etc.).

    Experience — Years of work experience.

    WorkHoursPerWeek — Average number of working hours per week.

    RemoteRatio — % of time spent working remotely (0–100).

    SatisfactionLevel — Self-reported satisfaction (1.0 to 5.0).

    StressLevel — Self-reported stress level (1 to 10).

    Burnout — Target variable. 1 if signs of burnout exist (high stress + low satisfaction + long hours), otherwise 0.
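
    The Burnout rule above (high stress + low satisfaction + long hours) can be sketched as a simple conjunction. The description does not state the thresholds used to generate the labels, so the default cut-offs below are hypothetical and only illustrate the shape of the rule.

```python
def burnout_label(
    stress_level,
    satisfaction_level,
    work_hours_per_week,
    *,
    stress_min=8,          # assumed cut-off on the 1 to 10 stress scale
    satisfaction_max=2.5,  # assumed cut-off on the 1.0 to 5.0 satisfaction scale
    hours_min=50,          # assumed cut-off for "long hours"
):
    """Return 1 when all three burnout signals are present, otherwise 0."""
    is_burned_out = (
        stress_level >= stress_min
        and satisfaction_level <= satisfaction_max
        and work_hours_per_week >= hours_min
    )
    return int(is_burned_out)
```

    Because the target is derived from three of the listed columns, a model trained on StressLevel, SatisfactionLevel, and WorkHoursPerWeek may largely rediscover a rule of this form.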

  7. Solar Panel Eda Dataset

    • universe.roboflow.com
    zip
    Updated Aug 29, 2024
    Cite
    Ramkumar (2024). Solar Panel Eda Dataset [Dataset]. https://universe.roboflow.com/ramkumar/solar-panel-eda
    Explore at:
    zip (available download format)
    Dataset updated
    Aug 29, 2024
    Dataset authored and provided by
    Ramkumar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Solar Panel Bounding Boxes
    Description

    Solar Panel EDA

    ## Overview
    
    Solar Panel EDA is a dataset for object detection tasks - it contains Solar Panel annotations for 721 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. Employee Turnover Analytics Dataset

    • kaggle.com
    Updated Jun 8, 2023
    Cite
    Akshay Hedau (2023). Employee Turnover Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/akshayhedau/employee-turnover-analytics-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Akshay Hedau
    Description

    Portobello Tech is an app innovator that has devised an intelligent way of predicting employee turnover within the company. It periodically evaluates employees' work details including the number of projects they worked upon, average monthly working hours, time spent in the company, promotions in the last 5 years, and salary level. Data from prior evaluations show the employee’s satisfaction at the workplace. The data could be used to identify patterns in work style and their interest to continue to work in the company. The HR Department owns the data and uses it to predict employee turnover. Employee turnover refers to the total number of workers who leave a company over a certain time period.

  9. Dataset of authors, books and publication dates of book series where authors...

    • workwithdata.com
    Updated Nov 25, 2024
    + more versions
    Cite
    Work With Data (2024). Dataset of authors, books and publication dates of book series where authors equals Eda Kranakis [Dataset]. https://www.workwithdata.com/datasets/book-series?col=book_series%2Cj0-author%2Cj0-book%2Cj0-publication_date&f=1&fcol0=j0-author&fop0=%3D&fval0=Eda+Kranakis&j=1&j0=books
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 1 row and is filtered to rows where the author is Eda Kranakis. It features 4 columns: book series, authors, books, and publication dates.

  10. Data from: Assessing predictive performance of supervised machine learning...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated May 23, 2023
    Cite
    Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
    Explore at:
    zip (available download format)
    Dataset updated
    May 23, 2023
    Dataset provided by
    Strathmore University
    Authors
    Evans Omondi
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work, including Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics and analysis, eXtreme Gradient Boosting was found to be the best-performing algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.

    Methods

    Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.

  11. EDA Faces Challenges in Effectively Monitoring Its Revolving Loan Funds

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Nov 12, 2020
    + more versions
    Cite
    Office of Inspector General (2020). EDA Faces Challenges in Effectively Monitoring Its Revolving Loan Funds [Dataset]. https://catalog.data.gov/dataset/eda-faces-challenges-in-effectively-monitoring-its-revolving-loan-funds
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    Office of Inspector General
    Description

    An audit with findings and recommendations for improving the management of EDA's Revolving Loan Fund program. This program is designed to provide grants to state and local governments, political subdivisions, and nonprofit organizations to operate lending programs for businesses that cannot get traditional bank financing.

  12. V2-Balloon-Detection-Dataset

    • huggingface.co
    Updated Sep 5, 2024
    + more versions
    Cite
    Globose Technology Solutions (2024). V2-Balloon-Detection-Dataset [Dataset]. https://huggingface.co/datasets/gtsaidata/V2-Balloon-Detection-Dataset
    Dataset updated
    Sep 5, 2024
    Authors
    Globose Technology Solutions
    Description

    This dataset was created to serve as an easy-to-use image dataset, perfect for experimenting with object detection algorithms. The main goal was to provide a simplified dataset that allows for quick setup and minimal effort in exploratory data analysis (EDA). This dataset is ideal for users who want to test and compare object detection models without spending too much time navigating complex data structures. Unlike datasets like chest x-rays, which… See the full description on the dataset page: https://huggingface.co/datasets/gtsaidata/V2-Balloon-Detection-Dataset.

  13. model_dataset

    • huggingface.co
    Updated Jun 14, 2025
    Cite
    Prajapati Jignesh Ganeshbhai (2025). model_dataset [Dataset]. https://huggingface.co/datasets/JigneshPrajapati18/model_dataset
    Dataset updated
    Jun 14, 2025
    Authors
    Prajapati Jignesh Ganeshbhai
    Description

    🎵 Music Feature Dataset Analysis

    This repository contains a comprehensive exploratory data analysis (EDA) on a music features dataset. The primary objective is to understand the patterns in audio features and analyze how they relate to user preferences, providing insights for music recommendation systems and user profiling.

    📥 Dataset Overview

    The dataset (data.csv) contains audio features extracted from music tracks along with user preference scores. This rich… See the full description on the dataset page: https://huggingface.co/datasets/JigneshPrajapati18/model_dataset.

  14. SuperherosabilitiesDataset

    • cubig.ai
    Updated Jun 30, 2025
    Cite
    CUBIG (2025). SuperherosabilitiesDataset [Dataset]. https://cubig.ai/store/products/532/superherosabilitiesdataset
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Superheros_abilities_dataset contains the abilities and attributes of 200 superheroes and villains from Marvel and the DC Universe. It consists of 10 columns, including each character's name, moral orientation, strength, speed, intelligence, combat power, major weapons/capabilities, overall power, and popularity.

    2) Data Utilization
    (1) Characteristics of Superheros_abilities_dataset:
    • It is a small, refined dataset that reflects real-world situations with some missing values and is designed to be easy for beginners to use.
    (2) Superheros_abilities_dataset can be used for:
    • Classification model practice: developing classification models that predict moral tendencies, such as Hero/Villain/Antihero, based on a character's ability values and attributes.
    • Clustering and visualization: clustering groups of similar characters based on various abilities and attributes, or practicing EDA and data visualization.

  15. The Global EDA Market size was USD 14.9 billion in 2023!

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Apr 30, 2025
    Cite
    Cognitive Market Research (2025). The Global EDA Market size was USD 14.9 billion in 2023! [Dataset]. https://www.cognitivemarketresearch.com/eda-market-report
    Explore at:
    pdf, excel, csv, ppt (available download formats)
    Dataset updated
    Apr 30, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global EDA market size was USD 14.9 billion in 2023 and will grow at a compound annual growth rate (CAGR) of 10.50% from 2023 to 2030.
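
    As a quick arithmetic check on the growth claim, the sketch below compounds the stated 2023 base at the stated CAGR through 2030. The function name is hypothetical, and the resulting figure is an implication of the two quoted numbers rather than a value stated in the report.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward: base * (1 + rate) ** years."""
    return base_value * (1.0 + cagr) ** years


# USD 14.9 billion in 2023, growing 10.50% per year for 7 years (2023 to 2030).
projected_2030 = project_market_size(14.9, 0.105, 2030 - 2023)
print(f"Implied 2030 market size: USD {projected_2030:.1f} billion")
```

    Under these inputs the market roughly doubles over the seven-year period.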

    The demand for the EDA Market is rising due to the rise in outdoor and adventure activities.
    Changing consumer lifestyle trends are higher in the EDA market.
    The cat segment held the highest EDA Market revenue share in 2023.
    North American EDA will continue to lead, whereas the European EDA Market will experience the most substantial growth until 2030.
    

    Supply Chain and Risk Analysis to Provide Viable Market Output

    The industry is facing supply chain and logistics disruptions. EDA tools have been instrumental in analyzing supply chain data, identifying vulnerabilities, predicting risks, and developing disruption mitigation strategies. Consumer behavior has undergone drastic changes due to blockages and restrictions. EDA helps companies analyze changing trends in buying behavior, online shopping preferences, and demand patterns, enabling organizations to adjust their marketing and sales strategies accordingly.

    Health and Pharmaceutical Research to Propel Market Growth.
    

    EDA tools have played a key role in analyzing large amounts of data related to vaccine development, drug trials, patient records and epidemiological studies. These tools have helped researchers process and interpret complex medical data, leading to advances in the development of treatments and vaccines. The pandemic has created challenges in data collection, especially in sectors affected by lockdowns or blackouts. Rapidly changing conditions and incomplete data sets make effective EDA difficult due to data quality issues. The economic uncertainty caused by the pandemic has led to budget cuts in some sectors, impacting investment in new technologies. Some organizations have limited budgets that limit their ability to adopt or update EDA tools.

    Market Dynamics of the EDA

    Privacy and Data Security Issues to Restrict Market Growth.
    

    With the focus on data privacy regulations such as GDPR, CCPA, etc., organizations need to ensure compliance when handling sensitive data. These compliance requirements may limit the scope of the EDA by limiting the availability and use of certain data sets for information analysis. EDA often requires data analysts or data scientists who are skilled in statistical analysis and data visualization tools. A lack of professionals with these specialized skills can hinder an organization's ability to use EDA tools effectively, limiting adoption. Advanced EDA techniques can involve complex algorithms and statistical techniques that are difficult for non-technical users to understand. Interpreting results and deriving actionable insights from EDA results pose challenges that affect applicability to a wider audience.

    Key Opportunity of market.

    Growing miniaturization in various industries can be an opportunity.
    

    With the age of highly advanced electronics, miniaturization has become a trend that enabled organizations across diverse sectors such as healthcare, consumer electronics, aerospace and defense, automotive and others to design miniature electronic devices. The devices incorporate miniaturized semiconductor components, e.g., surgical instruments and blood glucose meters in healthcare, fitness bands in wearable devices, automotive modules in the automotive sector, and intelligent baggage labels. Miniaturization has a number of advantages such as freeing space for other features and better batteries. The increased consciousness among consumers towards fitness is fueling the demand for smaller fitness devices such as smartwatches and fitness trackers. This is motivating companies to come up with innovative products with improved features, while researchers are concentrating on cost-effective and efficient product development through electronic design tools. Besides, use of portable equipment has gained immense popularity among media professionals because of the increasing demand for live reporting of different events like riots, accidents, sports, and political rallies. As a result of the inconvenience in the use of cumbersome TV production vans to access such events, demand for portable handheld equipment has risen. Such devices are simply portable and can be quickly moved to the event venue if carried in backpacks. Therefore, the need for compact devices across various indust...

  16. GLARE: Google Apps Arabic Reviews Dataset

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jul 16, 2024
    + more versions
    Cite
    Alowisheq, Areeb (2024). GLARE: Google Apps Arabic Reviews Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6457823
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Al-Khalifa, Hend
    Mohammed, Reem
    AlGhamdi, Fatima
    Alowisheq, Areeb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper introduces GLARE, an Arabic app reviews dataset collected from the Saudi Google Play Store. It consists of 76M reviews, 69M of which are Arabic reviews of 9,980 Android applications. We present the data collection methodology, along with a detailed Exploratory Data Analysis (EDA) and Feature Engineering on the gathered reviews. We also highlight possible use cases and benefits of the dataset.

  17. ‘Transactional Retail Dataset of Electronics Store’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Transactional Retail Dataset of Electronics Store’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-transactional-retail-dataset-of-electronics-store-e86c/6f6d91df/?iid=000-353&v=presentation
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Transactional Retail Dataset of Electronics Store’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/muhammadshahrayar/transactional-retail-dataset-of-electronics-store on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains information about an online electronic store. The store has three warehouses from which goods are delivered to customers.

    Columns Description

    • order_id: A unique id for each order
    • customer_id: A unique id for each customer
    • date: The date the order was made, given in YYYY-MM-DD format
    • nearest_warehouse: A string denoting the name of the nearest warehouse to the customer
    • shopping_cart: A list of tuples representing the order items: the first element of the tuple is the item ordered, and the second element is the quantity ordered for such item.
    • order_price: A float denoting the order price in USD. The order price is the price of items before any discounts and/or delivery charges are applied.
    • delivery_charges: A float representing the delivery charges of the order
    • customer_lat: Latitude of the customer’s location
    • customer_long: Longitude of the customer’s location
    • coupon_discount: An integer denoting the percentage discount to be applied to the order_price.
    • order_total: A float denoting the total of the order in USD after all discounts and/or delivery charges are applied.
    • season: A string denoting the season in which the order was placed.
    • is_expedited_delivery: A boolean denoting whether the customer has requested an expedited delivery
    • distance_to_nearest_warehouse: A float representing the arc distance, in kilometres, between the customer and the nearest warehouse to him/her.
    • latest_customer_review: A string representing the latest customer review on his/her most recent order
    • is_happy_customer: A boolean denoting whether the customer is a happy customer or had an issue with his/her last order.
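
    The column descriptions above imply two relationships that are useful when hunting for dirty rows: order_total follows from order_price, coupon_discount, and delivery_charges, and distance_to_nearest_warehouse is an arc distance between two coordinates. The sketch below shows one plausible reading of each. The exact formula and the Earth radius used by the dataset authors are not stated, so both are assumptions, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the dataset may use another value


def order_total(order_price, coupon_discount, delivery_charges):
    """Apply the percentage discount to the price, then add delivery charges."""
    discounted = order_price * (1 - coupon_discount / 100)
    return round(discounted + delivery_charges, 2)


def arc_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres between two lat/long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

    Rows whose stored order_total or distance disagrees with a recomputed value by more than a small tolerance are natural candidates for the error-detection task described in the Inspiration section.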

    Inspiration

    Use this dataset to perform graphical and/or non-graphical EDA methods to understand the data first, and then find and fix the data problems:
    - Detect and fix errors in dirty_data.csv
    - Impute the missing values in missing_data.csv
    - Detect and remove anomalies
    - Check whether a customer is happy with their last order

    All the Best

    --- Original source retains full ownership of the source dataset ---

  18. ‘COVID-19 dataset in Japan’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19 dataset in Japan’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-dataset-in-japan-2665/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Japan
    Description

    Analysis of ‘COVID-19 dataset in Japan’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lisphilar/covid19-dataset-in-japan on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    1. Context

    This is a COVID-19 dataset for Japan. It does not include the cases on the Diamond Princess cruise ship (Yokohama city, Kanagawa prefecture) or the Costa Atlantica cruise ship (Nagasaki city, Nagasaki prefecture). It contains:
    • Total number of cases in Japan
    • The number of vaccinated people (New/experimental)
    • The number of cases at prefecture level
    • Metadata of each prefecture

    Note: Lisphilar (author) uploads the same files to https://github.com/lisphilar/covid19-sir/tree/master/data

    This dataset can be retrieved with CovsirPhy (Python library).

    pip install covsirphy --upgrade
    
    import covsirphy as cs
    data_loader = cs.DataLoader()
    japan_data = data_loader.japan()
    # The number of cases (Total/each province)
    clean_df = japan_data.cleaned()
    # Metadata
    meta_df = japan_data.meta()
    

    Please refer to CovsirPhy Documentation: Japan-specific dataset.

    Note: Before analysing the data, please refer to Kaggle notebook: EDA of Japan dataset and COVID-19: Government/JHU data in Japan. The detailed explanation of the build process is discussed in Steps to build the dataset in Japan. If you find errors or have any questions, feel free to create a discussion topic.

    1.1 Total number of cases in Japan

    covid_jpn_total.csv
    Cumulative number of cases:
    • PCR-tested / PCR-tested and positive
    • with symptoms (to 08May2020) / without symptoms (to 08May2020) / unknown (to 08May2020)
    • discharged
    • fatal

    The number of cases:
    • requiring hospitalization (from 09May2020)
    • hospitalized with mild symptoms (to 08May2020) / severe symptoms / unknown (to 08May2020)
    • requiring hospitalization, but waiting in hotels or at home (to 08May2020)

    In the primary source, some variables were removed on 09May2020; their values are NA in this dataset from that date onward.

    The data was collected manually from the Ministry of Health, Labour and Welfare HP:
    厚生労働省 HP (in Japanese)
    Ministry of Health, Labour and Welfare HP (in English)

    The number of vaccinated people:
    • Vaccinated_1st: the number of persons vaccinated for the first time on the date
    • Vaccinated_2nd: the number of persons vaccinated with the second dose on the date
    • Vaccinated_3rd: the number of persons vaccinated with the third dose on the date

    Data sources for vaccination:
    • To 09Apr2021: 厚生労働省 HP 新型コロナワクチンの接種実績 (in Japanese)
    • 首相官邸 新型コロナワクチンについて
    • From 10Apr2021: Twitter: 首相官邸(新型コロナワクチン情報)

    1.2 The number of cases at prefecture level

    covid_jpn_prefecture.csv
    Cumulative number of cases:
    • PCR-tested / PCR-tested and positive
    • discharged
    • fatal

    The number of cases:
    • requiring hospitalization (from 09May2020)
    • hospitalized with severe symptoms (from 09May2020)

    Using a PDF-to-Excel converter, the data was collected manually from the Ministry of Health, Labour and Welfare HP:
    厚生労働省 HP (in Japanese)
    Ministry of Health, Labour and Welfare HP (in English)

    Note: covid_jpn_prefecture.groupby("Date").sum() does not match covid_jpn_total. When you analyse total data in Japan, please use covid_jpn_total data.
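    The note above can be checked by summing the prefecture rows per date and comparing the result with the national figures. The following is a minimal plain-Python sketch of that comparison, equivalent in spirit to the groupby("Date").sum() call. Only the "Date" column name comes from the note; the "Prefecture" and "Positive" column names and all numbers are assumptions made for illustration.

```python
from collections import defaultdict


def sum_by_date(prefecture_rows, column):
    """Sum `column` over all prefectures for each date."""
    totals = defaultdict(int)
    for row in prefecture_rows:
        totals[row["Date"]] += row[column]
    return dict(totals)


def mismatched_dates(prefecture_rows, total_rows, column):
    """Return {date: (prefecture_sum, national_total)} where the two differ."""
    summed = sum_by_date(prefecture_rows, column)
    national = {row["Date"]: row[column] for row in total_rows}
    return {
        date: (summed[date], national[date])
        for date in sorted(summed.keys() & national.keys())
        if summed[date] != national[date]
    }


# Made-up numbers for illustration only; not taken from the CSV files.
prefecture_rows = [
    {"Date": "2020-05-09", "Prefecture": "Tokyo", "Positive": 100},
    {"Date": "2020-05-09", "Prefecture": "Osaka", "Positive": 40},
    {"Date": "2020-05-10", "Prefecture": "Tokyo", "Positive": 110},
    {"Date": "2020-05-10", "Prefecture": "Osaka", "Positive": 45},
]
total_rows = [
    {"Date": "2020-05-09", "Positive": 140},
    {"Date": "2020-05-10", "Positive": 158},
]
gaps = mismatched_dates(prefecture_rows, total_rows, "Positive")
print(gaps)
```

    Dates that appear in only one of the two inputs are skipped here; a stricter check could report them separately.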

    1.3 Metadata of each prefecture

    covid_jpn_metadata.csv
    • Population (Total, Male, Female): 厚生労働省 厚生統計要覧(2017年度)第1-5表
    • Area (Total, Habitable): Wikipedia 都道府県の面積一覧 (2015)

    2. Acknowledgements

    This dataset was created by editing and transforming data from the following sites.

    厚生労働省 Ministry of Health, Labour and Welfare, Japan:
    厚生労働省 HP (in Japanese)
    Ministry of Health, Labour and Welfare HP (in English)
    厚生労働省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)

    国土交通省 Ministry of Land, Infrastructure, Transport and Tourism, Japan:
    国土交通省 HP (in Japanese)
    国土交通省 HP (in English)
    国土交通省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)

    Code for Japan / COVID-19 Japan:
    Code for Japan COVID-19 Japan Dashboard (CC BY 4.0)
    COVID-19 Japan 都道府県別 感染症病床数 (CC BY)

    Wikipedia: Wikipedia

    LinkData: LinkData (Public Domain)

    Inspiration

    1. Changes in number of cases over time
    2. Percentage of patients without symptoms / mild or severe symptoms
    3. What to do next to prevent outbreak

    License and how to cite

    Kindly cite this dataset under the CC BY 4.0 license as follows:
    • Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan, or
    • Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, Kaggle Dataset, https://www.kaggle.com/lisphilar/covid19-dataset-in-japan

    --- Original source retains full ownership of the source dataset ---

  19. Justcorners Dataset

    • universe.roboflow.com
    zip
    Updated Jan 15, 2024
    Cite
    eda (2024). Justcorners Dataset [Dataset]. https://universe.roboflow.com/eda-u71yj/justcorners
    Explore at:
    Available download formats
    zip
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    eda
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Money Bounding Boxes
    Description

    Justcorners

    ## Overview
    
    Justcorners is a dataset for object detection tasks - it contains Money annotations for 901 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. EDA - Jobs created/retained - 3 year totals

    • performance.commerce.gov
    application/rdfxml +5
    Updated Mar 6, 2025
    Cite
    Economic Development Administration (2025). EDA - Jobs created/retained - 3 year totals [Dataset]. https://performance.commerce.gov/KPI-EDA/EDA-Jobs-created-retained-3-year-totals/t2zs-iubw
    Explore at:
    Available download formats
    xml, csv, json, tsv, application/rdfxml, application/rssxml
    Dataset updated
    Mar 6, 2025
    Dataset authored and provided by
    Economic Development Administration http://www.eda.gov/
    License

    https://www.usa.gov/government-works

    Description

    The formula-driven calculation projects investment data at 3, 6, and 9 year intervals from the investment award. The formula is based on a study done by Rutgers University, which compiled and analyzed the performance of EDA construction investments after 9 years. This approach was reviewed and validated by third-party analysis conducted by Grant Thornton in 2008. Based on this formula and a review of EDA's historical results, EDA estimates that 40% of the 9-year projection would be realized after 3 years, 75% after 6 years, and 100% after 9 years.
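    The realization percentages quoted above (40% after 3 years, 75% after 6 years, 100% after 9 years) can be turned into a small worked calculation. The sketch below only applies those three shares to a given 9-year projection; it does not reproduce the underlying Rutgers-based formula, which the description does not give. The function name and the example figure of 1,000 jobs are hypothetical.

```python
# Share of the 9-year projection assumed to be realized at each interval,
# as stated in the dataset description.
REALIZATION_SHARE = {3: 0.40, 6: 0.75, 9: 1.00}


def realized_estimate(nine_year_projection, years):
    """Estimate the amount realized after 3, 6, or 9 years."""
    if years not in REALIZATION_SHARE:
        raise ValueError("years must be 3, 6, or 9")
    return nine_year_projection * REALIZATION_SHARE[years]


# Hypothetical 9-year projection of 1,000 jobs created/retained.
for years in (3, 6, 9):
    print(years, realized_estimate(1000, years))
```

    Intervals other than 3, 6, and 9 years raise an error rather than being interpolated, since the description gives no basis for intermediate values.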

Corporate_work_hours_productivity

Analyzing Work Hours, Job Satisfaction & Productivity Trends

