100+ datasets found
  1. Top 2500 Kaggle Datasets

    • kaggle.com
    Updated Feb 16, 2024
    Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saket Kumar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

    Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

    Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

    Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

    Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

    Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

    Column Definitions:

    • Dataset Name: Name of the dataset.
    • Created By: Creator(s) of the dataset.
    • Last Updated in number of days: Time elapsed since last update.
    • Usability Score: Score indicating the ease of use.
    • Number of File: Quantity of files included.
    • Type of file: Format of files (e.g., CSV, JSON).
    • Size: Size of the dataset.
    • Total Votes: Number of votes received.
    • Category: Categorization of the dataset's subject matter.
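
    As a rough illustration of how these columns might be used, the Python sketch below computes the mean usability score per category. The header names are copied from the column definitions, but the exact headers in the published CSV, the sample rows, and the helper name `mean_usability_by_category` are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file's headers and values may differ.
SAMPLE_CSV = """Dataset Name,Created By,Usability Score,Total Votes,Category
Example A,alice,10.0,120,Health
Example B,bob,8.0,45,Health
Example C,carol,9.0,300,Finance
"""


def mean_usability_by_category(csv_text):
    """Return {category: mean usability score} for the given CSV text."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        category = row["Category"]
        totals[category] += float(row["Usability Score"])
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


print(mean_usability_by_category(SAMPLE_CSV))
```

    The same grouping pattern would apply to Total Votes or any other numeric column, once the real header names are confirmed against the downloaded file.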

  2. Top uses of data and analytics within companies worldwide 2018

    • statista.com
    Updated Aug 8, 2018
    Statista (2018). Top uses of data and analytics within companies worldwide 2018 [Dataset]. https://www.statista.com/statistics/893798/worldwide-data-analytics-top-uses-companies/
    Dataset updated
    Aug 8, 2018
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2018
    Area covered
    Worldwide
    Description

    This statistic shows the ways that companies are using data and analytics worldwide as of 2018. Around ** percent of respondents stated that one of the top uses of data and analytics in their company was as a driver of strategy and change.

  3. Social Media Analysis

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Han Aksoy (2024). Social Media Analysis [Dataset]. https://www.kaggle.com/datasets/hanaksoy/social-media-analysis
    Available download formats: zip (726 bytes)
    Dataset updated
    Aug 20, 2024
    Authors
    Han Aksoy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Hello everyone. I wanted to share my dataset here. It is entirely a product of my own imagination and has no connection to real data. I wrote it for my own data analysis training and for analysts practicing data analysis. Have fun :)

    UserId: Unique identifier for each user in the data set

    UsageDuration: Total time spent by the user on social media in hours

    Age: Age of the user in years

    Country: Country of residence of the user

    TotalLikes: Total number of likes given by the user in a day

  4. Data Analyst Job Postings

    • kaggle.com
    zip
    Updated Jun 20, 2024
    asaniczka (2024). Data Analyst Job Postings [Dataset]. https://www.kaggle.com/datasets/asaniczka/data-analyst-job-postings
    Available download formats: zip (17347829 bytes)
    Dataset updated
    Jun 20, 2024
    Authors
    asaniczka
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Data science is a rapidly growing field in the tech industry, and LinkedIn is a popular platform for finding job opportunities in this domain.

    This dataset provides valuable insights into data analyst job postings, including the required skills and software proficiency sought by employers.

    If you find this dataset useful, don't forget to hit the upvote button! 😊💝

    Checkout my top datasets

    Interesting Task Ideas:

    1. Analyze the most in-demand skills and software for data analyst positions.
    2. Find common job titles in the data science field.
    3. Explore the geographical distribution of data engineering job opportunities.
    4. Identify the most sought-after skills


  5. Pump.Fun Data Analytics

    • dune.com
    Updated Jan 2, 2025
    dataexamples (2025). Pump.Fun Data Analytics [Dataset]. https://dune.com/discover/content/relevant?q=author:dataexamples&resource-type=dashboards
    Dataset updated
    Jan 2, 2025
    Authors
    dataexamples
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Blockchain data dashboard: Pump.Fun Data Analytics

  6. Powerful Data for Power BI

    • kaggle.com
    zip
    Updated Aug 28, 2023
    Shiv_D24Coder (2023). Powerful Data for Power BI [Dataset]. https://www.kaggle.com/datasets/shivd24coder/powerful-data-for-power-bi
    Available download formats: zip (907404 bytes)
    Dataset updated
    Aug 28, 2023
    Authors
    Shiv_D24Coder
    Description

    Explore the world of data visualization with this Power BI dataset containing HR Analytics and Sales Analytics datasets. Gain insights, create impactful reports, and craft engaging dashboards using real-world data from HR and sales domains. Sharpen your Power BI skills and uncover valuable data-driven insights with this powerful dataset. Happy analyzing!

  7. Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 28, 2020
    Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Jamie Monogan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  8. Data Lens (Visualizations Of Data) Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 6, 2025
    Archive Market Research (2025). Data Lens (Visualizations Of Data) Report [Dataset]. https://www.archivemarketresearch.com/reports/data-lens-visualizations-of-data-48718
    Available download formats: ppt, pdf, doc
    Dataset updated
    Mar 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the explosive growth of the data visualization market, projected at a 15% CAGR to reach $153 billion by 2033. This in-depth analysis reveals key trends, leading companies like Tableau and Sisense, and regional market breakdowns. Learn how data visualization is transforming business intelligence.

  9. Could a Neuroscientist Understand a Microprocessor?

    • plos.figshare.com
    mp4
    Updated Jun 1, 2023
    Eric Jonas; Konrad Paul Kording (2023). Could a Neuroscientist Understand a Microprocessor? [Dataset]. http://doi.org/10.1371/journal.pcbi.1005268
    Available download formats: mp4
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Eric Jonas; Konrad Paul Kording
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There is a popular belief in neuroscience that we are primarily data limited, and that producing large, multimodal, and complex datasets will, with the help of advanced data analysis algorithms, lead to fundamental insights into the way the brain processes information. These datasets do not yet exist, and if they did we would have no way of evaluating whether or not the algorithmically-generated insights were sufficient or even correct. To address this, here we take a classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information. Microprocessors are among those artificial information processing systems that are both complex and that we understand at all levels, from the overall logical flow, via logical gates, to the dynamics of transistors. We show that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the microprocessor. This suggests current analytic approaches in neuroscience may fall short of producing meaningful understanding of neural systems, regardless of the amount of data. Additionally, we argue for scientists using complex non-linear dynamical systems with known ground truth, such as the microprocessor as a validation platform for time-series and structure discovery methods.

  10. Additional file 3 of CC-PROMISE effectively integrates two forms of...

    • springernature.figshare.com
    • figshare.com
    xlsx
    Updated Jun 3, 2023
    Xueyuan Cao; Kristine Crews; James Downing; Jatinder Lamba; Stanley Pounds (2023). Additional file 3 of CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints [Dataset]. http://doi.org/10.6084/m9.figshare.c.3621308_D3.v1
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Xueyuan Cao; Kristine Crews; James Downing; Jatinder Lamba; Stanley Pounds
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each row of this table provides power estimates for each method in one simulated setting. The columns labeled setting, n, beta.y, beta.x, and beta.m provide the setting index number, the sample size n, and the values of the parameters β_y, β_x, and β_m, respectively. (XLSX 41 kb)

  11. Visual Analytics Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Oct 23, 2025
    Pro Market Reports (2025). Visual Analytics Market Report [Dataset]. https://www.promarketreports.com/reports/visual-analytics-market-8876
    Available download formats: doc, pdf, ppt
    Dataset updated
    Oct 23, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Key Types of Visual Analytics Tools and Capabilities:

    • Interactive Dashboards and Reporting: These are foundational tools that provide real-time visibility into key performance indicators (KPIs), operational metrics, and business performance. They allow users to filter, drill down, and interact with data to uncover trends and anomalies.
    • Geospatial Analytics: This category enables the analysis and visualization of data with a geographical component. It is crucial for applications like site selection, market analysis, logistics optimization, urban planning, and demographic studies, allowing users to understand spatial patterns and relationships.
    • Predictive and Prescriptive Analytics Tools: These advanced tools leverage statistical algorithms and machine learning to forecast future trends, identify potential risks, and recommend optimal courses of action. They help organizations move from understanding "what happened" to predicting "what will happen" and prescribing "what should be done."
    • Network Analysis: Visualizations that reveal relationships and connections between entities, crucial for understanding social networks, supply chains, fraud rings, and cybersecurity threats.
    • Natural Language Processing (NLP) and Generation (NLG) Integration: Tools that allow users to query data using natural language and automatically generate narratives or summaries from data visualizations, making analytics more accessible.
    • Augmented Analytics: AI-powered features that guide users through the analytics process, automatically identify interesting insights, and suggest relevant visualizations, democratizing data analysis.

    Recent developments include:

    • February 2021: RapidMiner Inc. partnered with Hivecell to allow users to rapidly generate and operate models with streaming data from the edge.
    • December 2021: Microsoft Corporation announced a collaboration with Consumer Value Store (CVS) Health to develop innovative solutions to help consumers improve their health.

    Key drivers for this market are:

    • Need for real-time and interactive data visualization
    • Increasing complexity and volume of data
    • Demand for actionable insights and decision-making support
    • Growing adoption of AI and ML in analytics

    Potential restraints include:

    • Data integration and security concerns
    • Limited availability of skilled professionals
    • High cost of deployment and maintenance

    Notable trends are:

    • Self-service analytics for non-technical users
    • Integration with collaboration platforms for data sharing
    • Adoption of augmented reality (AR) and virtual reality (VR) for immersive data visualization

  12. Considerations for analyzing EMA data (Oleson et al., 2021)

    • asha.figshare.com
    pdf
    Updated May 30, 2023
    Jacob J. Oleson; Michelle A. Jones; Erik J. Jorgensen; Yu-Hsiang Wu (2023). Considerations for analyzing EMA data (Oleson et al., 2021) [Dataset]. http://doi.org/10.23641/asha.17155961.v1
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    American Speech–Language–Hearing Association (https://www.asha.org/)
    Authors
    Jacob J. Oleson; Michelle A. Jones; Erik J. Jorgensen; Yu-Hsiang Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: The analysis of Ecological Momentary Assessment (EMA) data can be difficult to conceptualize due to the complexity of how the data are collected. The goal of this tutorial is to provide an overview of statistical considerations for analyzing observational data arising from EMA studies.

    Method: EMA data are collected in a variety of ways, complicating the statistical analysis. We focus on fundamental statistical characteristics of the data and general purpose statistical approaches to analyzing EMA data. We implement those statistical approaches using a recent study involving EMA.

    Results: The linear or generalized linear mixed-model statistical approach can adequately capture the challenges resulting from EMA collected data if properly set up. Additionally, while sample size depends on both the number of participants and the number of survey responses per participant, having more participants is more important than the number of responses per participant.

    Conclusion: Using modern statistical methods when analyzing EMA data and adequately considering all of the statistical assumptions being used can lead to interesting and important findings when using EMA.

    Supplemental Material S1. Power for given effect sizes, number of participants, and number of surveys per individual for a two independent groups comparison.

    Supplemental Material S2. Power for given effect sizes, number of participants, and number of surveys per individual for a paired groups comparison.

    Oleson, J. J., Jones, M. A., Jorgensen, E. J., & Wu, Y.-H. (2021). Statistical considerations for analyzing Ecological Momentary Assessment data. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2021_JSLHR-21-00081

  13. The Minimum Size Seed Round to Maximize Series A Follow On Investment - Data...

    • tomtunguz.com
    Updated Feb 6, 2014
    Tomasz Tunguz (2014). The Minimum Size Seed Round to Maximize Series A Follow On Investment - Data Analysis [Dataset]. https://tomtunguz.com/seed-followon-rates/
    Dataset updated
    Feb 6, 2014
    Dataset provided by
    Theory Ventures
    Authors
    Tomasz Tunguz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data analysis reveals optimal seed round size: $600-900k increases Series A success rate by 33%. Key insights for startup founders on maximizing follow-on funding.

  14. Data from: Debate: Subgroup analyses in clinical trials: fun to look at -...

    • catalog.data.gov
    • healthdata.gov
    • +2 more
    Updated Sep 7, 2025
    National Institutes of Health (2025). Debate: Subgroup analyses in clinical trials: fun to look at - but don't believe them! [Dataset]. https://catalog.data.gov/dataset/debate-subgroup-analyses-in-clinical-trials-fun-to-look-at-but-dont-believe-them
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Analysis of subgroup results in a clinical trial is surprisingly unreliable, even in a large trial. This is the result of a combination of reduced statistical power, increased variance and the play of chance. Reliance on such analyses is likely to be more erroneous, and hence harmful, than application of the overall proportional (or relative) result in the whole trial to the estimate of absolute risk in that subgroup. Plausible explanations can usually be found for effects that are, in reality, simply due to the play of chance. When clinicians believe such subgroup analyses, there is a real danger of harm to the individual patient.

  15. Data from: Causal Interpretations of Black-Box Models

    • tandf.figshare.com
    zip
    Updated May 31, 2023
    Qingyuan Zhao; Trevor Hastie (2023). Causal Interpretations of Black-Box Models [Dataset]. http://doi.org/10.6084/m9.figshare.8218763.v3
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Qingyuan Zhao; Trevor Hastie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The fields of machine learning and causal inference have developed many concepts, tools, and theory that are potentially useful for each other. Through exploring the possibility of extracting causal interpretations from black-box machine-trained models, we briefly review the languages and concepts in causal inference that may be interesting to machine learning researchers. We start with the curious observation that Friedman’s partial dependence plot has exactly the same formula as Pearl’s back-door adjustment and discuss three requirements to make causal interpretations: a model with good predictive performance, some domain knowledge in the form of a causal diagram and suitable visualization tools. We provide several illustrative examples and find some interesting and potentially causal relations using visualization tools for black-box models.

  16. Empirical overall power of the CTOT, MC, and CO methods with analysis on...

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Wei Zhuang; Luísa Camacho; Camila S. Silva; Michael Thomson; Kevin Snyder (2023). Empirical overall power of the CTOT, MC, and CO methods with analysis on benchmark data. [Dataset]. http://doi.org/10.1371/journal.pone.0263070.t006
    Available download formats: xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wei Zhuang; Luísa Camacho; Camila S. Silva; Michael Thomson; Kevin Snyder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BFD stands for the benchmark with full data analyzed with the current standard method, which includes t-tests for two-group comparisons. CTOT stands for the cycle-to-threshold method, while CO denotes the complete-observation method and MC denotes the method that sets uncertain observations equal to the assay-specific maximum cycle threshold C1.
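
    To make the difference between the CO and MC strategies concrete, here is a minimal Python sketch. It assumes uncertain observations are represented as `None`, that CO simply drops them, and that the maximum cycle threshold is 40; the function names and sample readings are hypothetical and not taken from the paper. CTOT is not sketched because this description does not define it.

```python
def complete_observation(ct_values):
    """CO sketch: keep only the observations that were actually measured."""
    return [ct for ct in ct_values if ct is not None]


def max_cycle_imputation(ct_values, max_cycle_threshold):
    """MC sketch: replace uncertain observations with the maximum cycle threshold."""
    return [max_cycle_threshold if ct is None else ct for ct in ct_values]


# Hypothetical cycle-threshold readings; None marks an uncertain observation.
readings = [24.5, None, 26.1, None, 25.3]

print(complete_observation(readings))        # [24.5, 26.1, 25.3]
print(max_cycle_imputation(readings, 40.0))  # [24.5, 40.0, 26.1, 40.0, 25.3]
```

    CO shrinks the sample, while MC keeps the sample size but pulls the group mean toward the threshold, which is one reason the two approaches can differ in power.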

  17. Data from: Permutation-validated principal components analysis of microarray...

    • catalog.data.gov
    • healthdata.gov
    • +1 more
    Updated Sep 7, 2025
    National Institutes of Health (2025). Permutation-validated principal components analysis of microarray data [Dataset]. https://catalog.data.gov/dataset/permutation-validated-principal-components-analysis-of-microarray-data
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.

    Results: We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes.

    Conclusions: Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.

  18. Human Resources Data Set

    • kaggle.com
    zip
    Updated Oct 19, 2020
    Dr. Rich (2020). Human Resources Data Set [Dataset]. https://www.kaggle.com/datasets/rhuebner/human-resources-data-set/discussion
    Available download formats: zip (17041 bytes)
    Dataset updated
    Oct 19, 2020
    Authors
    Dr. Rich
    Description

    Updated 30 January 2023

    Version 14 of Dataset

    License Update:

    There has been some confusion around licensing for this data set. Dr. Carla Patalano and Dr. Rich Huebner are the original authors of this dataset.

    We provide a license to anyone who wishes to use this dataset for learning or teaching. For the purposes of sharing, please follow this license:

    CC-BY-NC-ND This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Codebook

    https://rpubs.com/rhuebner/hrd_cb_v14

    PLEASE NOTE -- I recently updated the codebook - please use the above link. A few minor discrepancies were identified between the codebook and the dataset. Please feel free to contact me through LinkedIn (www.linkedin.com/in/RichHuebner) to report discrepancies and make requests.

    Context

    HR data can be hard to come by, and HR professionals generally lag behind with respect to analytics and data visualization competency. Thus, Dr. Carla Patalano and I set out to create our own HR-related dataset, which is used in one of our graduate MSHRM courses called HR Metrics and Analytics, at New England College of Business. We created this data set ourselves. We use the data set to teach HR students how to use and analyze the data in Tableau Desktop - a data visualization tool that's easy to learn.

    This version provides a variety of features that are useful for both data visualization AND creating machine learning / predictive analytics models. We are working on expanding the data set even further by generating even more records and a few additional features. We will be keeping this as one file/one data set for now. There is a possibility of creating a second file perhaps down the road where you can join the files together to practice SQL/joins, etc.

    Note that this dataset isn't perfect. By design, there are some issues that are present. It is primarily designed as a teaching data set - to teach human resources professionals how to work with data and analytics.

    Content

    We have reduced the complexity of the dataset down to a single data file (v14). The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.

    Recent additions to the data include:

    • Absences
    • Most Recent Performance Review Date
    • Employee Engagement Score

    Acknowledgements

    Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.

    Inspiration

    We've included some open-ended questions that you can explore and try to address through creating Tableau visualizations, or R or Python analyses. Good luck and enjoy the learning!

    • Is there any relationship between who a person works for and their performance score?
    • What is the overall diversity profile of the organization?
    • What are our best recruiting sources if we want to ensure a diverse organization?
    • Can we predict who is going to terminate and who isn't? What level of accuracy can we achieve on this?
    • Are there areas of the company where pay is not equitable?

    There are so many other interesting questions that could be addressed through this interesting data set. Dr. Patalano and I look forward to seeing what we can come up with.

    If you have any questions or comments about the dataset, please do not hesitate to reach out to me on LinkedIn: http://www.linkedin.com/in/RichHuebner

    You can also reach me via email at: Richard.Huebner@go.cambridgecollege.edu

  19. A tabular summary of power analysis of two-sided t-test for two independent...

    • plos.figshare.com
    txt
    Updated Jun 16, 2023
    Wei Zhuang; Luísa Camacho; Camila S. Silva; Michael Thomson; Kevin Snyder (2023). A tabular summary of power analysis of two-sided t-test for two independent groups. [Dataset]. http://doi.org/10.1371/journal.pone.0263070.s004
    Available download formats: txt
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wei Zhuang; Luísa Camacho; Camila S. Silva; Michael Thomson; Kevin Snyder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The sample size is 5 in each group. The population mean mu1 varies from 20 to 30 in Group 1, while the population mean mu2 varies from 32 to 40 in Group 2. For simplicity, the population standard deviation is fixed to be 1 in each group. The significance levels (alpha) are set to be 0.001, 0.005, 0.05, or 0.2 for analytical illustration. (CSV)
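
    One way to sanity-check a power value for a single setting in this range is simulation. The Python sketch below estimates the power of a two-sided two-sample t-test by Monte Carlo for mu1 = 30, mu2 = 32, standard deviation 1, and 5 observations per group at alpha = 0.05. It assumes the pooled-variance (equal-variance) form of the test, which the description does not specify. The critical value 2.306 (t distribution, 8 degrees of freedom) is hard-coded, and the function name, seed, and trial count are choices made for this example rather than the authors' procedure.

```python
import math
import random
import statistics

N_PER_GROUP = 5              # sample size per group, as stated in the description
T_CRIT_DF8_ALPHA05 = 2.306   # two-sided critical value, t with 8 df, alpha = 0.05


def simulated_power(mu1, mu2, sd=1.0, trials=5000, seed=0):
    """Estimate power of a pooled-variance two-sample t-test by simulation."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        group1 = [rng.gauss(mu1, sd) for _ in range(N_PER_GROUP)]
        group2 = [rng.gauss(mu2, sd) for _ in range(N_PER_GROUP)]
        # With equal group sizes, the pooled variance is the mean of the two sample variances.
        pooled_var = (statistics.variance(group1) + statistics.variance(group2)) / 2
        std_err = math.sqrt(pooled_var * 2 / N_PER_GROUP)
        t_stat = (statistics.mean(group1) - statistics.mean(group2)) / std_err
        if abs(t_stat) > T_CRIT_DF8_ALPHA05:
            rejections += 1
    return rejections / trials


print(simulated_power(30, 32))
```

    Other significance levels listed above would need their own critical values, and larger gaps between mu1 and mu2 push the estimate toward 1.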

  20. Methods to analyze qPCR data with uncertain observations.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Wei Zhuang; Luísa Camacho; Camila S. Silva; Michael Thomson; Kevin Snyder (2023). Methods to analyze qPCR data with uncertain observations. [Dataset]. http://doi.org/10.1371/journal.pone.0263070.t007
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wei Zhuang; Luísa Camacho; Camila S. Silva; Michael Thomson; Kevin Snyder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Methods to analyze qPCR data with uncertain observations.
