16 datasets found
  1. Human Resources Data Set

    Dataset used for learning data visualization and basic regression

    • kaggle.com
    zip
    Updated Oct 19, 2020
    Cite
    Dr. Rich (2020). Human Resources Data Set [Dataset]. https://www.kaggle.com/datasets/rhuebner/human-resources-data-set/discussion
    Explore at:
    Available download formats: zip (17041 bytes)
    Dataset updated
    Oct 19, 2020
    Authors
    Dr. Rich
    Description

    Updated 30 January 2023

    Version 14 of Dataset

    License Update:

    There has been some confusion around licensing for this data set. Dr. Carla Patalano and Dr. Rich Huebner are the original authors of this dataset.

    We provide a license to anyone who wishes to use this dataset for learning or teaching. For the purposes of sharing, please follow this license:

    CC-BY-NC-ND This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Codebook

    https://rpubs.com/rhuebner/hrd_cb_v14

    PLEASE NOTE -- I recently updated the codebook - please use the above link. A few minor discrepancies were identified between the codebook and the dataset. Please feel free to contact me through LinkedIn (www.linkedin.com/in/RichHuebner) to report discrepancies and make requests.

    Context

    HR data can be hard to come by, and HR professionals generally lag behind with respect to analytics and data visualization competency. Thus, Dr. Carla Patalano and I set out to create our own HR-related dataset, which is used in one of our graduate MSHRM courses called HR Metrics and Analytics, at New England College of Business. We created this data set ourselves. We use the data set to teach HR students how to use and analyze the data in Tableau Desktop - a data visualization tool that's easy to learn.

    This version provides a variety of features that are useful for both data visualization AND creating machine learning / predictive analytics models. We are working on expanding the data set even further by generating even more records and a few additional features. We will be keeping this as one file/one data set for now. There is a possibility of creating a second file perhaps down the road where you can join the files together to practice SQL/joins, etc.

    Note that this dataset isn't perfect. By design, there are some issues that are present. It is primarily designed as a teaching data set - to teach human resources professionals how to work with data and analytics.

    Content

    We have reduced the complexity of the dataset down to a single data file (v14). The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.

    Recent additions to the data include:
    • Absences
    • Most Recent Performance Review Date
    • Employee Engagement Score

    Acknowledgements

    Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.

    Inspiration

    We've included some open-ended questions that you can explore and try to address through creating Tableau visualizations, or R or Python analyses. Good luck and enjoy the learning!

    • Is there any relationship between who a person works for and their performance score?
    • What is the overall diversity profile of the organization?
    • What are our best recruiting sources if we want to ensure a diverse organization?
    • Can we predict who is going to terminate and who isn't? What level of accuracy can we achieve on this? (A starter sketch follows this list.)
    • Are there areas of the company where pay is not equitable?
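
    As a starter for the attrition question above, here is a minimal sketch. It is an assumption-laden illustration rather than part of the dataset: the file and column names (HRDataset_v14.csv, Termd, Salary, EngagementSurvey, Absences, Department, Position) follow the v14 codebook but should be verified against your download.

    ```
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # File and column names assumed from the v14 codebook -- verify first.
    df = pd.read_csv("HRDataset_v14.csv")

    y = df["Termd"]  # 1 = terminated, 0 = still active
    X = pd.get_dummies(
        df[["Department", "Position", "Salary", "EngagementSurvey", "Absences"]]
    )

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("holdout accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    ```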

    There are so many other interesting questions that could be addressed through this interesting data set. Dr. Patalano and I look forward to seeing what we can come up with.

    If you have any questions or comments about the dataset, please do not hesitate to reach out to me on LinkedIn: http://www.linkedin.com/in/RichHuebner

    You can also reach me via email at: Richard.Huebner@go.cambridgecollege.edu

  2. The global data visualization tools market size is USD 5.9 billion in 2024...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Feb 8, 2025
    Cite
    Cognitive Market Research (2025). The global data visualization tools market size is USD 5.9 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 11.6% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/data-visualization-tools-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Feb 8, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global data visualization tools market size is USD 5.9 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 11.6% from 2024 to 2031.

    Key Drivers for the Data Visualization Tools Market

    A growing focus on data-driven decision-making: the need for enterprises to make decisions based on data is a major factor driving global demand for data visualization tools. There is a rising need for tools that can efficiently visualize and interpret the massive volumes of data that companies generate. Data visualization tools let users spot trends, obtain insights, and apply statistical analysis to support decisions. As a result of this growing focus on data-driven decision-making across sectors, the market for data visualization tools is expanding globally. Technological advancement also drives the market.

    Key Restraints for the Data Visualization Tools Market

    Demand for data visualization tools may be negatively impacted by concerns around data protection and collaboration. The high cost of these systems will also hinder the market's expansion.

    Introduction to the Data Visualization Tools Market

    Data visualization is the graphical depiction of details and information. Visualization programs make it simple to view and understand anomalies, patterns, and trends in data through visual elements such as graphical representations and mappings. Data visualization technologies are central to decision-making and operations optimization in the contemporary business environment, and are essential for analyzing data and finding trends and insights. As the importance of data-driven decisions keeps growing, investing in data visualization tools to gain a competitive edge by exploiting information resources to produce business value is becoming increasingly important for enterprises.

  3. Data Analysis.

    • plos.figshare.com
    xls
    Updated Jul 1, 2025
    Cite
    Robert M. X. Wu; Huan Zhang; Jie Liang; Niusha Shafiabady; Hai Yan (Helen) Lu; Ergun Gide; D. W. M. N. C. Dasanayake; Meena Jha; Shaoyang Duan (2025). Data Analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0321077.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Robert M. X. Wu; Huan Zhang; Jie Liang; Niusha Shafiabady; Hai Yan (Helen) Lu; Ergun Gide; D. W. M. N. C. Dasanayake; Meena Jha; Shaoyang Duan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper proposes a dynamic analytical processing (DAP) visualization tool based on the Bubble-Wall Plot. It can be handily used to develop visual warning systems for visualizing the dynamic analytical processes of hazard data. Comparative analysis and case study methods are used in this research. Based on a literature review of Q1 publications since 2017, 23 types of data visualization approaches/tools are identified, including seven anomaly data visualization tools. This study presents three significant findings by comparing existing data visualization approaches. The primary finding is that no single visualization tool can fully satisfy industry requirements. This finding motivates academics to develop new DAP visualization tools. The second finding is that there are different views of Line Charts and various perspectives on Scatter Plots. The other one is that different researchers may perceive an existing data visualization tool differently, such as arguments between Scatter Plots and Line Charts and diverse opinions about Parallel Coordinate Plots and Scatter Plots. Users’ awareness rises when they choose data visualization tools that satisfy their requirements. By conducting a comparative analysis based on five categories (Style, Value, Change, Correlation, and Others) with 26 subcategories of metric features, results show that this new tool can effectively solve the limitations of existing visualization tools as it appears to have three remarkable characteristics: the simplest cartographic tool, the most straightforward visual result, and the most intuitive tool. Furthermore, this paper illustrates how the Bubble-Wall Plot can be effectively applied to develop a warning system for presenting dynamic analytical processes of hazard data in the coal mine. Lastly, this paper provides two recommendations, one implication, six research limitations, and eleven further study topics.

  4. Synthetic Process Execution Trace

    • kaggle.com
    zip
    Updated May 22, 2022
    Cite
    Asjad K (2022). Synthetic Process Execution Trace [Dataset]. https://www.kaggle.com/datasets/asjad99/process-trace
    Explore at:
    Available download formats: zip (55873943 bytes)
    Dataset updated
    May 22, 2022
    Authors
    Asjad K
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Background

    Any set of related activities that are executed in a repeatable manner and with a defined goal can be seen as a process.

    Process analytic approaches allow organizations to support the practice of Business Process Management and continuous improvement by leveraging all process-related data to extract knowledge, improve process performance and support managerial-decision making across the organization.

    For organisations interested in continuous improvement, such datasets enable a data-driven approach to identifying performance bottlenecks, reducing costs, extracting insights and optimizing the utilization of available resources. Understanding the properties of the currently deployed process (whose execution trace is available) is critical to knowing whether it is worth investing in improvements, where performance problems exist, how much variation there is in the process across instances, and what the root causes are.

    What is Process Mining (PM)?

    → the process of extracting valuable information from event logs/databases that are generated by processes.

    Two topics are important: i) process discovery, where a process model describing the control flow is inferred from the data, and ii) conformance checking, which deals with verifying that the behavior in the event log adheres to a set of business rules, e.g., defined as a process model. These two use cases focus on the control-flow perspective. A sketch of both follows.
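
    As a concrete illustration of these two use cases, here is a hedged sketch using the open-source pm4py library; the XES file name is a placeholder (this dataset ships its traces in its own files, so the loading step will need adapting).

    ```
    import pm4py

    # Load an event log (placeholder path; adapt to this dataset's files).
    log = pm4py.read_xes("event_log.xes")

    # (i) Process discovery: infer a control-flow model from the traces.
    net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

    # (ii) Conformance checking: replay the log on the model to surface
    # traces that deviate from the discovered behavior.
    diagnostics = pm4py.conformance_diagnostics_token_based_replay(
        log, net, initial_marking, final_marking
    )
    print(diagnostics[:3])
    ```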

    Why Process Mining ?

    → identifying hidden nodes and bottlenecks in business processes.

    About the Dataset

    A synthetic event log with 100,000 traces and 900,000 events that was generated by simulating a simple artificial process model. There are three data attributes in the event log: Priority, Nurse, and Type. Some paths in the model are recorded infrequently based on the value of these attributes.

    Noise is added by randomly adding one additional event to an increasing number of traces. CPN Tools (http://cpntools.org) was used to generate the event log and inject the noise. The amount of noise can be controlled with the constant 'noise'.

    Smaller dataset:

    The files test0 to test5 represent process traces and may be used for debugging and sanity-check purposes.

  5. Management Cockpit Solution Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Sep 16, 2025
    Cite
    Data Insights Market (2025). Management Cockpit Solution Report [Dataset]. https://www.datainsightsmarket.com/reports/management-cockpit-solution-1963848
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Sep 16, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Management Cockpit Solution market is poised for robust growth, projected to reach a substantial market size of approximately $2,500 million by 2025, with an anticipated Compound Annual Growth Rate (CAGR) of 18% through 2033. This expansion is fundamentally driven by the increasing need for sophisticated data visualization and real-time performance monitoring across enterprises of all sizes. Large enterprises, in particular, are actively adopting these solutions to gain deeper insights into complex operational data, enabling them to optimize project execution and proactively identify and rectify potential issues. This trend is further amplified by the growing adoption of digital transformation initiatives, where centralized dashboards and intuitive interfaces become critical for informed decision-making and strategic planning. The surge in data generation across various business functions necessitates powerful tools to distill this information into actionable intelligence, thereby fueling demand for management cockpit solutions.

    Several key trends are shaping the management cockpit solution landscape. The increasing integration of artificial intelligence (AI) and machine learning (ML) capabilities is a significant differentiator, allowing for predictive analytics, anomaly detection, and automated recommendations. This evolution moves beyond simple data aggregation to proactive management. Furthermore, the demand for customizable and user-friendly interfaces is paramount, catering to diverse user needs within organizations, from C-suite executives to operational managers. Cloud-based solutions are also gaining traction due to their scalability, accessibility, and cost-effectiveness, reducing the burden of on-premise infrastructure. While the market is characterized by strong growth, potential restraints include the high initial implementation costs and the complexity of integrating these solutions with existing legacy systems. However, the overwhelming benefits of enhanced efficiency, improved decision-making, and streamlined operations are expected to outweigh these challenges, ensuring sustained market expansion.

    This comprehensive report delves into the dynamic Management Cockpit Solution market, analyzing its trajectory from the historical period of 2019-2024 to an estimated 2025, and projecting its evolution through the forecast period of 2025-2033. The study period spans from 2019 to 2033, with a base year set at 2025. We will explore market concentration, key trends, regional dominance, product insights, and the driving forces, challenges, and emerging trends that are shaping this critical technological landscape. The report aims to provide actionable insights for stakeholders, leveraging data and analyses that will be invaluable for strategic decision-making.

  6. Data_Sheet_1_Combining Dendrometer Series and Xylogenesis Imagery—DevX, a...

    • frontiersin.figshare.com
    docx
    Updated Jun 4, 2023
    Cite
    Roberto Cruz-García; Angela Balzano; Katarina Čufar; Tobias Scharnweber; Marko Smiljanić; Martin Wilmking (2023). Data_Sheet_1_Combining Dendrometer Series and Xylogenesis Imagery—DevX, a Simple Visualization Tool to Explore Plant Secondary Growth Phenology.docx [Dataset]. http://doi.org/10.3389/ffgc.2019.00060.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Roberto Cruz-García; Angela Balzano; Katarina Čufar; Tobias Scharnweber; Marko Smiljanić; Martin Wilmking
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Determining the effect of a changing climate on tree growth will ultimately depend on our understanding of wood formation processes and how they can be affected by environmental conditions. In this context, monitoring intra-annual radial growth with high temporal resolution through point dendrometers has often been used. Another widespread approach is the microcoring method to follow xylem and phloem formation at the cellular level. Although both register the same biological process (secondary growth), given the limitations of each method, each delivers specific insights that can be combined to obtain a better picture of the process as a whole. To explore the potential of visualizing combined dendrometer and histological monitoring data and scrutinize intra-annual growth data on both dimensions (dendrometer → continuous; microcoring → discrete), we developed DevX (Dendrometer vs. Xylogenesis), a visualization application using the “Shiny” package in the R programming language. The interactive visualization allows the display of dendrometer curves and the overlay of commonly used growth model fits (Gompertz and Weibull), as well as the calculation of wood phenology estimates based on these fits (growth onset, growth cessation, and duration). Furthermore, the growth curves have interactive points that show the corresponding histological section, where the amount and development stage of the tissues at that particular time point can be observed. This makes it possible to see the agreement between dendrometer-derived phenology and the development status at the cellular level, and thereby helps disentangle shrinkage and swelling due to water uptake from actual radial growth. We present a case study with monitoring data for Acer pseudoplatanus L., Fagus sylvatica L., and Quercus robur L. trees growing in a mixed stand in northeastern Germany. The presented application is an example of the innovative and easily accessible use of programming languages as a basis for data visualization, and it can further be used as a learning tool on the topic of wood formation and its ecology. Combining continuous dendrometer data with the discrete information from histological sections provides a tool to identify active periods of wood formation from dendrometer series (calibrate) and to explore monitoring datasets.
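
    To make the growth-model step concrete, here is a hedged sketch of a Gompertz fit in Python (DevX itself is an R/Shiny application; the parameterisation and the synthetic data below are illustrative assumptions).

    ```
    import numpy as np
    from scipy.optimize import curve_fit

    # One common Gompertz parameterisation: asymptote A, rate k, inflection time t0.
    def gompertz(t, A, k, t0):
        return A * np.exp(-np.exp(-k * (t - t0)))

    # Synthetic example: day of year vs. cumulative stem increment (mm).
    rng = np.random.default_rng(0)
    t = np.arange(1, 366, dtype=float)
    y = gompertz(t, A=2.0, k=0.05, t0=150) + rng.normal(0, 0.02, t.size)

    params, _ = curve_fit(gompertz, t, y, p0=[2.0, 0.05, 150])
    print(dict(zip(["A", "k", "t0"], params)))
    # Growth onset/cessation can then be estimated from the fitted curve,
    # e.g., as the days when 5% and 95% of the asymptote are reached.
    ```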

  7. Philippine Health Indicators

    • kaggle.com
    zip
    Updated Jan 29, 2023
    Cite
    The Devastator (2023). Philippine Health Indicators [Dataset]. https://www.kaggle.com/thedevastator/philippine-health-indicators
    Explore at:
    Available download formats: zip (1250010 bytes)
    Dataset updated
    Jan 29, 2023
    Authors
    The Devastator
    Area covered
    Philippines
    Description

    Philippine Health Indicators

    Essential, Neglected, Quick, and All Health Topics

    By Humanitarian Data Exchange [source]

    About this dataset

    This dataset provides comprehensive data on a variety of indicators related to health, medical equipment, and other social determinants in the Philippines. It contains information from the World Health Organization's data portal, with insights from mortality and global health estimates, Sustainable Development Goals, Millennium Development Goals (MDGs), health systems, malaria, tuberculosis, child health, infectious diseases, and more. This valuable dataset can be used to explore human behavior with regard to public and environmental health, as well as for research into worldwide trends in healthcare access. The data can help inform better policies that protect public safety and improve long-term outcomes, such as reducing risk factors of disease or increasing resilience against natural disasters. This collection is an invaluable resource for understanding the drivers of healthcare disparities between countries while promoting transparency within governments worldwide.


    How to use the dataset

    This dataset provides information on health indicators in the Philippines. It contains data from the World Health Organization's (WHO) data portal covering a broad range of topics such as mortality and global health estimates, Sustainable Development Goals (SDGs), Millennium Development Goals (MDGs), Health Systems, Malaria, Tuberculosis, Child Health, Infectious Diseases, World Health Statistics and other subtopics. This dataset can be used to generate reports and trends on the wellbeing of Filipinos and health services available in the nation.

    To use this dataset:

    • Decide which indicator or indicators you want to analyze. You can select specific indicators by filtering on the GHO code or name, DATASOURCE code or name, PUBLISHSTATE code or name, YEAR codes, REGION codes, COUNTRY codes, etc.
    • Once you have separated out your chosen indicator(s), access the more detailed information about them. Look for the related URL provided in the database: these URLs link to details such as descriptions of the variables used in the surveys/studies from which this data set was compiled. The easiest way is to check the URLs associated with a particular indicator code, because each URL documents the variable definitions, criteria, and methodologies used for that indicator.
    • Determine how you want to visualize the results. Some will prefer tables, while others prefer graphs and charts, which are easier to compare visually using standard export tools (Excel sheets, PDF files, etc.). You can also customize the visualizations depending on which comparisons you want to examine; the text fields included in the records can help as well. The goal is to understand the questions being asked before starting the analysis, so that once the analysis is performed, a plan of action can be formulated from the compared samples. (A filtering sketch in Python follows this list.)
    • Finally, analyze the results from the visualization step. Use caution when interpreting comparisons: correlation between the selected variables does not guarantee a causal effect. Look at the extra detail fields during evaluation as well, since possible correlations are sometimes hidden within particular categories.
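
    As referenced above, here is a minimal filtering sketch in pandas. Everything in it is an assumption modelled on typical WHO GHO exports: the file name is hypothetical and the column names must be checked against the actual CSVs in this dataset.

    ```
    import pandas as pd

    # Hypothetical file name; substitute one of the CSVs from this dataset.
    df = pd.read_csv("philippine-health-indicators.csv")
    print(df.columns.tolist())  # confirm the real column names before filtering

    # Filter to one indicator code and a time range, then summarise by year.
    # 'GHO', 'YEAR' and 'Numeric' are assumed GHO-style column names.
    subset = df[(df["GHO"] == "YOUR_INDICATOR_CODE") & (df["YEAR"] >= 2010)]
    print(subset.groupby("YEAR")["Numeric"].mean())
    ```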

    Research Ideas

    • Analyzing the regional differences in medical equipment usage and health outcomes across the Philippines
    • Tracking changes in medical equipment availability over time in different parts of the country
    • Investigating how access to medical equipment varies between specific types of healthcare providers

    Acknowledgements

    If you use this dataset in your research, please credit the original authors: Humanitarian Data Exchange.

    License

    See the dataset description for more information.

    Columns

    File: injuries-and-violence-indicators-for-philippines-47.c...

  8. Bank Customer Analysis Done Using Power Bi

    • kaggle.com
    zip
    Updated Sep 11, 2023
    Cite
    Srividya Uppalur (2023). Bank Customer Analysis Done Using Power Bi [Dataset]. https://www.kaggle.com/datasets/srividyauppalur/bank-customer-analysis-done-using-power-bi
    Explore at:
    Available download formats: zip (83758 bytes)
    Dataset updated
    Sep 11, 2023
    Authors
    Srividya Uppalur
    Description

    Bank Data Analysis | Real-World Project | Power BI. In this project, I analyzed a bank dataset using Microsoft Power BI. I started by importing the data into Power BI, and then performed data cleaning, transformation, and visualization on the given data to gain insights and create a comprehensive analysis report.

    The result is a set of insightful visualizations and interactive reports that can be used for business intelligence and decision-making purposes.

    Data set: based on a tutorial by Data Visionary.

    YouTube video referenced: https://www.youtube.com/watch?v=GZqBefbNP10&t=1581s

    Analysis and visualizations cover: 1. Balance by age and gender 2. Number of customers by age and gender 3. Number of customers by region 4. Balance by region 5. Number of customers by job type 6. Balance by gender 7. Total customers joined 8. Cards: i) max balance by age, ii) min balance by age, iii) max customers by gender

    Please go through it and share suggestions, guidance on any changes required, and corrections wherever I need to improve.

  9. QGIS Training Tutorials: Using Spatial Data in Geographic Information...

    • catalogue.arctic-sdi.org
    • datasets.ai
    • +1 more
    Updated Oct 28, 2019
    + more versions
    Cite
    (2019). QGIS Training Tutorials: Using Spatial Data in Geographic Information Systems [Dataset]. https://catalogue.arctic-sdi.org/geonetwork/srv/search?format=MOV
    Explore at:
    Dataset updated
    Oct 28, 2019
    Description

    Have you ever wanted to create your own maps, or integrate and visualize spatial datasets to examine changes in trends between locations and over time? Follow along with these training tutorials on QGIS, an open source geographic information system (GIS) and learn key concepts, procedures and skills for performing common GIS tasks – such as creating maps, as well as joining, overlaying and visualizing spatial datasets. These tutorials are geared towards new GIS users. We’ll start with foundational concepts, and build towards more advanced topics throughout – demonstrating how with a few relatively easy steps you can get quite a lot out of GIS. You can then extend these skills to datasets of thematic relevance to you in addressing tasks faced in your day-to-day work.

  10. Ontario Data Catalogue (Ontario Data Catalogue)

    • catalog.civicdataecosystem.org
    Updated Nov 24, 2025
    Cite
    (2025). Ontario Data Catalogue (Ontario Data Catalogue) [Dataset]. https://catalog.civicdataecosystem.org/dataset/ontario-data-catalogue-ontario-data-catalogue
    Explore at:
    Dataset updated
    Nov 24, 2025
    Area covered
    Ontario
    Description

    AI Generated Summary: The Ontario Data Catalogue is a data portal providing access to open datasets generated and maintained by the Ontario government. It allows users to search, access, visualize, and download data in various machine-readable formats, often through APIs, while also indicating licensing terms and data update frequencies. The catalogue also provides tools for data visualization and notifications for dataset updates.

    About: The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Ontario’s Digital and Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario’s Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open are labelled accordingly.

    If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. Most of the data available in the catalogue is released under Ontario’s Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn’t have a licence, you don’t have the right to use the data. If you have questions about how you can use a specific dataset, please contact us.

    The Ontario Data Catalogue endeavors to publish open data in a machine-readable format. For machine-readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has features you can use when building applications. APIs (application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: all Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields; you can also search its custom fields, and you can use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API, where you can search and access the machine-readable open data in the catalogue. Read the complete documentation for CKAN's Datastore API.

    The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk. You can search for a dataset with a word that might describe a dataset or topic. Use words like “taxes” or “hospital locations” to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue.

    On the dataset search page, you can also filter your search results. You can select filters on the left-hand side of the page to limit your search to datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular ministry, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier. You can also do a quick search by selecting one of the catalogue’s categories on the homepage. These categories can help you see the types of data we have on key topic areas.

    When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available and, if so, tell you about the data available. An open dataset might contain several data files. These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. A non-open dataset may not be available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests.

    The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization’s datasets or the full catalogue. You don’t have to provide any personal information – just subscribe to our feeds using any feed reader you like, via the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated.

    The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc.). Learn about each format and how you can access and use the data each file contains.

    A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: this format is considered machine-readable; it can be easily processed and used by a computer. Files that have visual formatting (e.g. bolded headers and colour-coded rows) can be hard for machines to understand; these elements make a file more human-readable and less machine-readable.

    A file that provides information without formatted text or extra visual features, and that may not follow a pattern of separated values like a CSV. How to access the data: open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad).

    A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features.

    A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbf) and might include corresponding files (e.g., .prj). How to access the data: open with a geographic information system (GIS) software program (e.g., QGIS).

    A package of files and folders. The package can contain any number of different file types. How to access the data: open with an unzipping software application (e.g., WinZIP, 7Zip). Note: if a ZIP file contains .shp, .shx, and .dbf file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program).

    A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011, etc.) and its geospatial location (i.e., points/lines). How to access the data: open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information. Note: this format is machine-readable, and it can be easily processed and used by a computer.

    A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: open with any text editor (e.g., Notepad) or access through a browser.

    A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: open with any text editor (e.g., Notepad).

    A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011, etc.) and its geospatial location (i.e., points/lines). How to access the data: open with a geospatial software application that supports the KML format (e.g., Google Earth).

    This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: open with the Beyond 20/20 application.

    A database which links and combines data from different files or
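
    Since the catalogue is built on CKAN, its action API can be queried in a few lines. A minimal sketch, assuming the catalogue is hosted at data.ontario.ca (verify the host and fields against the catalogue's own API documentation):

    ```
    import requests

    # Assumed catalogue host; the endpoint follows CKAN's documented action API.
    BASE = "https://data.ontario.ca/api/3/action"

    resp = requests.get(
        f"{BASE}/package_search",
        params={"q": "hospital locations", "rows": 5},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()["result"]

    print(result["count"], "matching datasets")
    for pkg in result["results"]:
        print("-", pkg["title"], "|", pkg.get("license_title", "no licence"))
    ```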

  11. Third PowerBI Dashboard Sales

    • kaggle.com
    zip
    Updated Jan 20, 2023
    Cite
    FOJLA RABBY (2023). Third PowerBI Dashboard Sales [Dataset]. https://www.kaggle.com/datasets/fojlarabby/third-powerbi-dashboard-sales/data
    Explore at:
    Available download formats: zip (772471 bytes)
    Dataset updated
    Jan 20, 2023
    Authors
    FOJLA RABBY
    Description

    Hey guys, this is something I created in my free Power BI workshop: a simple yet powerful dashboard report for sales analysis.

    Topics covered in the dashboard: 1) Cleaning the data 2) Analyzing the data and thinking like the client 3) Working with simple visualizations 4) Working with advanced visualizations 5) Creating visuals from all data parameters

    I will keep doing this kind of workshop, where you will learn new and advanced techniques.

    Project link on GitHub: https://lnkd.in/gQDphxtH

    #powerbi #visualization #event #data #tableau #dataanalysis #dashboard #datanalytics #dataanalyst #businessintelligence #powerbideveloper #analytics

  12. Pine nut syndrome (PNS) data

    • kaggle.com
    zip
    Updated Feb 26, 2022
    Cite
    Hafiz Umair M. Awan (2022). Pine nut syndrome (PNS) data [Dataset]. https://www.kaggle.com/datasets/umaireek/pine-nut-syndrome-data/code
    Explore at:
    Available download formats: zip (2617 bytes)
    Dataset updated
    Feb 26, 2022
    Authors
    Hafiz Umair M. Awan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    # Context
    Pine nuts are non-wood forest products (NWFP) with a constantly growing market, notwithstanding a series of phytosanitary issues and related trade problems. Eating pine nuts can occasionally cause some people to experience a bitter or metallic taste lasting from a few days up to 2 weeks. This taste disturbance has been referred to as 'pine mouth' or 'pine nut syndrome' (PNS). Most of the studies mentioned in the literature review concern PNS occurrence associated with the nuts of Pinus armandii. China has become a leading exporter of pine nuts, but its exports are affected by this symptom caused by the nuts of some pine species. The dataset contains information on consumers who bought pine nuts from different markets around the world in 2017 and suffered from PNS.

    # Content
    • countries: names of the countries where PNS cases were reported
    • markets: names of the markets where consumers bought the pine nuts
    • PNS reported cases: the number of PNS cases reported

    # Inspiration
    Since the file is small, the main objective is to visualize the data in different ways and get hands-on experience with simple data visualization. One idea: attach the data as a layer on a world map and render it as an image (see the sketch below).
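
    A hedged sketch of the world-map idea using plotly; the file name is a placeholder and the column names follow the Content section above, so verify them against the CSV header.

    ```
    import pandas as pd
    import plotly.express as px

    df = pd.read_csv("pine_nut_syndrome.csv")  # placeholder file name

    # Column names assumed from the Content section; adjust if the header differs.
    fig = px.choropleth(
        df,
        locations="countries",
        locationmode="country names",
        color="PNS reported cases",
        title="Reported pine nut syndrome (PNS) cases by country",
    )
    fig.write_html("pns_map.html")
    ```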

    ## Reference
    The dataset was collected from an interesting review paper published in 2017 by the dataset creator himself.

  13. World Development Indicators

    • kaggle.com
    • datacatalog.hshsl.umaryland.edu
    • +1 more
    Updated May 11, 2024
    Cite
    Guillem SD (2024). World Development Indicators [Dataset]. https://www.kaggle.com/datasets/guillemservera/world-development-indicators
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 11, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Guillem SD
    License

    https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets

    Description

    Kaggle Dataset Description

    Overview

    This dataset is an Updated and Curated Version of the renowned World Development Indicators dataset by the World Bank. Unlike other Kaggle datasets, this one is up-to-date and comprehensive.

    About the Original Dataset

    The original World Development Indicators dataset is a public resource under the Creative Commons Attribution 4.0 license. It covers a wide array of topics such as Agriculture, Climate Change, Economic Growth, Education, and more. Source

    Included Files

    • WDIdatabase.sqlite: A SQLite database for easier data manipulation (see the query sketch after this list).
    • footnotes.csv: Footnotes for data series.
    • series.csv: Metadata for each data series.
    • indicators.csv: Main File. All Development indicators.
    • series_notes.csv: Additional notes for series.
    • country.csv: Country information.
    • country_notes.csv: Country-specific notes.
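
    A minimal sketch of querying the SQLite file with pandas. The table and column names below are assumptions (the schema presumably mirrors the CSV files), so list the tables first and adjust.

    ```
    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("WDIdatabase.sqlite")

    # Inspect the schema before querying; the table name used below is an assumption.
    print(pd.read_sql_query(
        "SELECT name FROM sqlite_master WHERE type = 'table'", conn))

    # Hypothetical query against an 'indicators' table mirroring indicators.csv:
    # rows for total population (World Bank indicator code SP.POP.TOTL).
    df = pd.read_sql_query(
        "SELECT * FROM indicators WHERE indicator_code = 'SP.POP.TOTL' LIMIT 10",
        conn)
    print(df)

    conn.close()
    ```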

    Use Cases

    • Global or country-specific economic and social development analysis.
    • Academic research in economics, public health, and social sciences.
    • Data visualization to understand global trends.

    Highlights

    • Up-to-date: Contains the latest available data.
    • Curated: Edited for ease of use, including column name adjustments and data type conversions.

    Photo by Porapak Apichodilok.

  14. Klib library python

    • kaggle.com
    zip
    Updated Jan 11, 2021
    Cite
    Sripaad Srinivasan (2021). Klib library python [Dataset]. https://www.kaggle.com/sripaadsrinivasan/klib-library-python
    Explore at:
    Available download formats: zip (89892446 bytes)
    Dataset updated
    Jan 11, 2021
    Authors
    Sripaad Srinivasan
    Description

    The klib library enables us to quickly visualize missing data, perform data cleaning, and visualize data distributions, correlations and categorical column values. klib is a Python library for importing, cleaning, analyzing and preprocessing data. Explanations of key functionalities can be found on Medium / TowardsDataScience in the examples section or on YouTube (Data Professor).

    Original GitHub repo

    [image: https://raw.githubusercontent.com/akanz1/klib/main/examples/images/header.png]

    Usage

    !pip install klib

    import klib
    import pandas as pd

    # Load any tabular dataset into a DataFrame (the file name is an example).
    df = pd.read_csv("your_data.csv")

    # klib functions for visualizing datasets
    klib.cat_plot(df)         # visualizes the number and frequency of categorical features
    klib.corr_mat(df)         # returns a color-encoded correlation matrix
    klib.corr_plot(df)        # returns a color-encoded heatmap, ideal for correlations
    klib.dist_plot(df)        # returns a distribution plot for every numeric feature
    klib.missingval_plot(df)  # returns a figure containing information about missing values

    Examples

    Take a look at this starter notebook.

    Further examples, as well as applications of the functions can be found here.

    Contributing

    Pull requests and ideas, especially for further functions, are welcome. For major changes or feedback, please open an issue first to discuss what you would like to change. Take a look at this GitHub repo.

    License

    MIT

  15. Wikipedia SQLITE Portable DB, Huge 5M+ Rows

    • kaggle.com
    zip
    Updated Jun 29, 2024
    Cite
    christernyc (2024). Wikipedia SQLITE Portable DB, Huge 5M+ Rows [Dataset]. https://www.kaggle.com/datasets/christernyc/wikipedia-sqlite-portable-db-huge-5m-rows/code
    Explore at:
    Available download formats: zip (6064169983 bytes)
    Dataset updated
    Jun 29, 2024
    Authors
    christernyc
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The "Wikipedia SQLite Portable DB" is a compact and efficient database derived from the Kensho Derived Wikimedia Dataset (KDWD). This dataset provides a condensed subset of raw Wikimedia data in a format optimized for natural language processing (NLP) research and applications.

    I am not affiliated or partnered with Kensho in any way; I just really like this dataset because it gives my agents something easy to query.

    Key Features:

    • Contains over 5 million rows of data from English Wikipedia and Wikidata
    • Stored in a portable SQLite database format for easy integration and querying
    • Includes a link-annotated corpus of English Wikipedia pages and a compact sample of the Wikidata knowledge base
    • Ideal for NLP tasks, machine learning, data analysis, and research projects

    The database consists of four main tables:

    • items: Contains information about Wikipedia items, including labels and descriptions
    • properties: Stores details about Wikidata properties, such as labels and descriptions
    • pages: Provides metadata for Wikipedia pages, including page IDs, item IDs, titles, and view counts
    • link_annotated_text: Contains the link-annotated text of Wikipedia pages, divided into sections

    This dataset is derived from the Kensho Derived Wikimedia Dataset (KDWD), which is built from the English Wikipedia snapshot from December 1, 2019, and the Wikidata snapshot from December 2, 2019. The KDWD is a condensed subset of the raw Wikimedia data in a form that is helpful for NLP work, and it is released under the CC BY-SA 3.0 license.

    Credits: The "Wikipedia SQLite Portable DB" is derived from the Kensho Derived Wikimedia Dataset (KDWD), created by the Kensho R&D group. The KDWD is based on data from Wikipedia and Wikidata, which are crowd-sourced projects supported by the Wikimedia Foundation. We would like to acknowledge and thank the Kensho R&D group for their efforts in creating the KDWD and making it available for research and development purposes.

    By providing this portable SQLite database, we aim to make Wikipedia data more accessible and easier to use for researchers, data scientists, and developers working on NLP tasks, machine learning projects, and other data-driven applications. We hope that this dataset will contribute to the advancement of NLP research and the development of innovative applications utilizing Wikipedia data.

    https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data/data

    Tags: encyclopedia, wikipedia, sqlite, database, reference, knowledge-base, articles, information-retrieval, natural-language-processing, nlp, text-data, large-dataset, multi-table, data-science, machine-learning, research, data-analysis, data-mining, content-analysis, information-extraction, text-mining, text-classification, topic-modeling, language-modeling, question-answering, fact-checking, entity-recognition, named-entity-recognition, link-prediction, graph-analysis, network-analysis, knowledge-graph, ontology, semantic-web, structured-data, unstructured-data, data-integration, data-processing, data-cleaning, data-wrangling, data-visualization, exploratory-data-analysis, eda, corpus, document-collection, open-source, crowdsourced, collaborative, online-encyclopedia, web-data, hyperlinks, categories, page-views, page-links, embeddings

    Usage with LIKE queries:

    ```
    import aiosqlite
    import asyncio

    class KenshoDatasetQuery:
        def __init__(self, db_file):
            self.db_file = db_file

        # Open the SQLite connection when entering an `async with` block.
        async def __aenter__(self):
            self.conn = await aiosqlite.connect(self.db_file)
            return self

        # Close the connection on exit.
        async def __aexit__(self, exc_type, exc_val, exc_tb):
            await self.conn.close()

        # Pages whose title contains `title`, joined with their item
        # metadata and link-annotated text.
        async def search_pages_by_title(self, title):
            query = """
            SELECT pages.page_id, pages.item_id, pages.title, pages.views,
                items.labels AS item_labels, items.description AS item_description,
                link_annotated_text.sections
            FROM pages
            JOIN items ON pages.item_id = items.id
            JOIN link_annotated_text ON pages.page_id = link_annotated_text.page_id
            WHERE pages.title LIKE ?
            """
            async with self.conn.execute(query, (f"%{title}%",)) as cursor:
                return await cursor.fetchall()

        # Items whose label or description contains `keyword`.
        async def search_items_by_label_or_description(self, keyword):
            query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ? OR description LIKE ?
            """
            async with self.conn.execute(query, (f"%{keyword}%", f"%{keyword}%")) as cursor:
                return await cursor.fetchall()

        # Items whose label contains `label`.
        async def search_items_by_label(self, label):
            query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ?
            """
            async with self.conn.execute(query, (f"%{label}%",)) as cursor:
                return await cursor.fetchall()

        # async def search_properties_by_label_or_desc...
    ```
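
    A hypothetical driver for the class above (the database file name is a placeholder):

    ```
    # Example usage: open the DB, run a title search, print a few results.
    async def main():
        async with KenshoDatasetQuery("wikipedia.db") as db:  # placeholder path
            rows = await db.search_pages_by_title("Python")
            for page_id, item_id, title, views, *_ in rows[:5]:
                print(page_id, title, views)

    asyncio.run(main())
    ```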
    
  16. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to provide customers with itemset suggestions, so that we can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association Rules are most often used when you are planning to build associations between different objects in a set, and they work well for finding frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08 / 0.10 = 0.80
    • lift = confidence / P(mat) = 0.80 / 0.09 ≈ 8.9
    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
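
    These numbers can be reproduced programmatically. The sketch below uses Python's mlxtend package as an alternative illustration to the R arules workflow described later; the package choice is an assumption, and the exact association_rules signature may vary between mlxtend versions.

    ```
    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    # Toy one-hot basket matrix for the example above: 100 customers,
    # 8 bought both items, 2 only the mouse, 1 only the mat, 89 neither.
    rows = ([(True, True)] * 8 + [(True, False)] * 2
            + [(False, True)] * 1 + [(False, False)] * 89)
    baskets = pd.DataFrame(rows, columns=["mouse", "mat"])

    frequent = apriori(baskets, min_support=0.05, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
    # Expected for mouse -> mat: support 0.08, confidence 0.80, lift ~8.9
    ```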

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    [image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

    Libraries in R

    First, we need to load the required libraries. Below is a short description of each.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets, including several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; it makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel files in R.
    • plyr - Tools for splitting, applying and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression, with flexible support for the type of right-hand-side expressions.
    • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

    [image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

    Data Pre-processing

    Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

    [image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
    [image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

    After that, we clean the data frame by removing missing values.

    [image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

    To apply Association Rule mining, we need to convert the dataframe into transaction data, so that all items that are bought together in one invoice will be in ...
