100+ datasets found
  1. Data Analysis in R

    • kaggle.com
    zip
    Updated May 16, 2022
    Cite
    Rajdeep Kaur Bajwa (2022). Data Analysis in R [Dataset]. https://www.kaggle.com/datasets/rajdeepkaurbajwa/data-analysis-r
    Available download formats: zip (5321 bytes)
    Dataset updated
    May 16, 2022
    Authors
    Rajdeep Kaur Bajwa
    Description

    Dataset

    This dataset was created by Rajdeep Kaur Bajwa

    Contents

  2. Cyclistic_bike _share_analysis_case_study

    • kaggle.com
    zip
    Updated Oct 16, 2025
    Cite
    Ranjith@0073 (2025). Cyclistic_bike _share_analysis_case_study [Dataset]. https://www.kaggle.com/datasets/ranjith0073/cyclistic-bike-share-analysis-case-study
    Available download formats: zip (585776 bytes)
    Dataset updated
    Oct 16, 2025
    Authors
    Ranjith@0073
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    📊 Full Dataset
    The complete cleaned dataset used in this analysis is available for download (123 MB). A smaller sample is included in this repository for quick testing.

    📂 Project Overview
    This project analyzes Cyclistic bike-share data to uncover ride patterns, user behavior, and station popularity.
    It includes data cleaning, exploratory data analysis (EDA), and visualizations using R (tidyverse, ggplot2, lubridate).

    📈 Key Visualizations
    - Rides by User Type
    - Rides per Day of the Week
    - Ride Duration Distribution
    - Rides by Bike Type
    - Top 10 Start Stations
    (All visualizations are stored in the plots/ folder.)

    🧠 Key Insights
    - Subscribers ride more frequently than casual users.
    - Weekdays show higher ride volumes.
    - Most trips last under 30 minutes.
    - Top stations are concentrated in central business and tourist areas.
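The kind of aggregation behind these insights can be sketched in a few lines. This is a minimal Python illustration, not the project's R code: the row layout, user-type labels, and timestamps are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (user_type, start_time, end_time).
# Field layout and values are assumptions made for illustration.
rides = [
    ("subscriber", "2024-06-03 08:05:00", "2024-06-03 08:22:00"),
    ("casual", "2024-06-08 14:10:00", "2024-06-08 15:01:00"),
    ("subscriber", "2024-06-04 17:40:00", "2024-06-04 17:58:00"),
    ("subscriber", "2024-06-05 07:55:00", "2024-06-05 08:09:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def summarize(rows):
    """Count rides by user type and weekday, and the share under 30 minutes."""
    by_user = Counter()
    by_weekday = Counter()
    short_rides = 0
    for user_type, start, end in rows:
        started = datetime.strptime(start, FMT)
        ended = datetime.strptime(end, FMT)
        minutes = (ended - started).total_seconds() / 60
        by_user[user_type] += 1
        by_weekday[DAY_NAMES[started.weekday()]] += 1
        if minutes < 30:
            short_rides += 1
    return by_user, by_weekday, short_rides / len(rows)


by_user, by_weekday, share_under_30 = summarize(rides)
print(by_user)
print(by_weekday)
print(f"{share_under_30:.0%} of rides last under 30 minutes")
```

With real data, the same three summaries would correspond to the "Rides by User Type", "Rides per Day of the Week", and "Ride Duration Distribution" plots listed above.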

    🛠️ Tools Used
    - R
    - tidyverse
    - ggplot2
    - lubridate

    📈 Project by: Ranjithkumar R.K

  3. R-script to Analyse Data

    • uvaauas.figshare.com
    txt
    Updated Apr 4, 2022
    Cite
    T. Blanke (2022). R-script to Analyse Data [Dataset]. http://doi.org/10.21942/uva.14346842.v1
    Available download formats: txt
    Dataset updated
    Apr 4, 2022
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    T. Blanke
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Exploratory data analysis and visualisation of datasets

  4. Additional file 1 of Simple but powerful interactive data analysis in R with...

    • springernature.figshare.com
    zip
    Updated Nov 26, 2024
    Cite
    Svetlana Ovchinnikova; Simon Anders (2024). Additional file 1 of Simple but powerful interactive data analysis in R with R/LinkedCharts [Dataset]. http://doi.org/10.6084/m9.figshare.26677037.v2
    Available download formats: zip
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Svetlana Ovchinnikova; Simon Anders
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. Zip file containing the interactive supplement.

  5. Exploratory Data Analysis (EDA) Tools Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54369
    Available download formats: doc, ppt, pdf
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming Exploratory Data Analysis (EDA) tools market! Our in-depth analysis reveals key trends, growth drivers, and top players shaping this $3 billion industry, projected for 15% CAGR through 2033. Learn about market segmentation, regional insights, and future opportunities.

  6. Analysis of small businesses in Michigan

    • kaggle.com
    zip
    Updated Oct 12, 2024
    Cite
    Maooz Abdullah (2024). Analysis of small businesses in Michigan [Dataset]. https://www.kaggle.com/datasets/maoozabdullah/analysis-of-small-businesses-in-michigan
    Available download formats: zip (334456 bytes)
    Dataset updated
    Oct 12, 2024
    Authors
    Maooz Abdullah
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Michigan
    Description

    The objective of this report is to analyze the role of small businesses in the Michigan job market using the provided dataset. We aim to understand the impact of small businesses on employment, sales, and other economic factors. This analysis will help in identifying trends and patterns that can inform policy decisions and support for small businesses.

  7. Physical Properties of Lakes: Exploratory Data Analysis

    • hydroshare.org
    • search.dataone.org
    zip
    Updated Jan 29, 2021
    + more versions
    Cite
    Gabriela Garcia; Kateri Salk (2021). Physical Properties of Lakes: Exploratory Data Analysis [Dataset]. https://www.hydroshare.org/resource/42052357655f4ad39f8ec7d0bef351c7
    Available download formats: zip (2.0 MB)
    Dataset updated
    Jan 29, 2021
    Dataset provided by
    HydroShare
    Authors
    Gabriela Garcia; Kateri Salk
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 27, 1984 - Aug 17, 2016
    Area covered
    Description

    Exploratory Data Analysis for the Physical Properties of Lakes

    This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on the physical properties of lakes.

    Introduction

    Lakes are dynamic, nonuniform bodies of water in which the physical, biological, and chemical properties interact. Lakes also contain the majority of Earth's fresh water supply. This lesson introduces exploratory data analysis using R statistical software in the context of the physical properties of lakes.

    Learning Objectives

    After successfully completing this exercise, you will be able to:

    1. Apply exploratory data analytics skills to applied questions about physical properties of lakes
    2. Communicate findings with peers through oral, visual, and written modes

  8. Data accompanying the seuFLViz R package for interactive exploratory data...

    • zenodo.org
    bin
    Updated Jun 5, 2025
    Cite
    Dominic Shayler; Kevin Stachelek; David Cobrinik (2025). Data accompanying the seuFLViz R package for interactive exploratory data analysis of single cell datasets as seurat objects [Dataset]. http://doi.org/10.5281/zenodo.15596099
    Available download formats: bin
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Dominic Shayler; Kevin Stachelek; David Cobrinik
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data accompanying the seuFLViz R package for interactive exploratory data analysis of single cell datasets as seurat objects.

    Data collected by Dominic Shayler and described in:

    1. Shayler DW, Stachelek K, Cambier L, Lee S, Bai J, Reid MW, Weisenberger DJ, Bhat B, Aparicio JG, Kim Y, Singh M, Bay M, Thornton ME, Doyle EK, Fouladian Z, Erberich SG, Grubbs BH, Bonaguidi MA, Craft CM, Singh HP, Cobrinik D. Identification and characterization of early human photoreceptor states and cell-state-specific retinoblastoma-related features. eLife [Internet]. eLife Sciences Publications Limited; 2024 Nov 22 [cited 2024 Dec 20];13.
    Some raw data available in GEO: GSE207802

  9. Data from: Superheat: An R Package for Creating Beautiful and Extendable...

    • tandf.figshare.com
    bin
    Updated Mar 4, 2024
    Cite
    Rebecca L. Barter; Bin Yu (2024). Superheat: An R Package for Creating Beautiful and Extendable Heatmaps for Visualizing Complex Data [Dataset]. http://doi.org/10.6084/m9.figshare.6287693.v1
    Available download formats: bin
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Rebecca L. Barter; Bin Yu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.

  10. HadISD: Global sub-daily, surface meteorological station data, 1931-2023,...

    • data-search.nerc.ac.uk
    Updated Jul 24, 2021
    + more versions
    Cite
    (2021). HadISD: Global sub-daily, surface meteorological station data, 1931-2023, v3.4.0.2023f [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=dewpoint
    Dataset updated
    Jul 24, 2021
    Description

    This is version v3.4.0.2023f of Met Office Hadley Centre's Integrated Surface Database, HadISD. These data are global sub-daily surface meteorological data. This update (v3.4.0.2023f) to HadISD corrects a long-standing bug, discovered in autumn 2023, whereby the neighbour checks (and associated [un]flagging for some other tests) were not being implemented. For more details see the posts on the HadISD blog: https://hadisd.blogspot.com/2023/10/bug-in-buddy-checks.html and https://hadisd.blogspot.com/2024/01/hadisd-v3402023f-future-look.html

    The quality controlled variables in this dataset are: temperature, dewpoint temperature, sea-level pressure, wind speed and direction, cloud data (total, low, mid and high level). Past significant weather and precipitation data are also included, but have not been quality controlled, so their quality and completeness cannot be guaranteed. Quality control flags and data values which have been removed during the quality control process are provided in the qc_flags and flagged_values fields, and ancillary data files show the station listing with IDs, names and location information.

    The data are provided as one NetCDF file per station. Files in the station_data folder have the format "station_code"_HadISD_HadOBS_19310101-20240101_v3.4.1.2023f.nc. The station codes can be found under the docs tab. The station codes file has five columns as follows: 1) station code, 2) station name, 3) station latitude, 4) station longitude, 5) station height.

    To keep informed about updates, news and announcements, follow the HadOBS team on Twitter @metofficeHadOBS. For more detailed information, e.g. bug fixes, routine updates and other exploratory analysis, see the HadISD blog: http://hadisd.blogspot.co.uk/

    References: When using the dataset in a paper you must cite the following papers (see Docs for link to the publications) and this dataset (using the "citable as" reference):

    - Dunn, R. J. H., (2019), HadISD version 3: monthly updates, Hadley Centre Technical Note.
    - Dunn, R. J. H., Willett, K. M., Parker, D. E., and Mitchell, L.: Expanding HadISD: quality-controlled, sub-daily station data from 1931, Geosci. Instrum. Method. Data Syst., 5, 473-491, doi:10.5194/gi-5-473-2016, 2016.
    - Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Clim. Past, 8, 1649-1679, 2012, doi:10.5194/cp-8-1649-2012
    - Smith, A., N. Lott, and R. Vose, 2011: The Integrated Surface Database: Recent Developments and Partnerships. Bulletin of the American Meteorological Society, 92, 704–708, doi:10.1175/2011BAMS3015.1

    For a homogeneity assessment of HadISD please see the following reference:

    - Dunn, R. J. H., K. M. Willett, C. P. Morice, and D. E. Parker. "Pairwise homogeneity assessment of HadISD." Climate of the Past 10, no. 4 (2014): 1501-1522. doi:10.5194/cp-10-1501-2014, 2014.
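The per-station file naming pattern and the five-column station list described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the station list is assumed to be whitespace-separated (the description does not say), and the sample line is invented rather than taken from the real station list.

```python
def hadisd_filename(station_code,
                    date_range="19310101-20240101",
                    version="v3.4.1.2023f"):
    """Build a per-station NetCDF file name following the quoted pattern."""
    return f"{station_code}_HadISD_HadOBS_{date_range}_{version}.nc"


def parse_station_line(line):
    """Split one station-list line into the five documented columns.

    Assumption: fields are whitespace-separated and the station name may
    contain spaces, so the code is the first token and the last three
    tokens are latitude, longitude and height.
    """
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}")
    latitude, longitude, height = (float(p) for p in parts[-3:])
    return {
        "code": parts[0],
        "name": " ".join(parts[1:-3]),
        "latitude": latitude,
        "longitude": longitude,
        "height": height,
    }


# Hypothetical station line, made up for illustration only.
sample = "123456-99999 EXAMPLE STATION 51.50 -0.12 25.0"
station = parse_station_line(sample)
print(station)
print(hadisd_filename(station["code"]))
```

The default date range and version string are copied from the file name quoted in the description; other HadISD releases use different values, so both are left as parameters.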

  11. Data from: ftmsRanalysis: An R package for exploratory data analysis and...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 16, 2020
    Cite
    Bramer, Lisa M.; Claborne, Daniel; Stratton, Kelly G.; Hofmockel, Kirsten; Thompson, Allison M.; McCue, Lee Ann; White, Amanda M. (2020). ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000452686
    Dataset updated
    Mar 16, 2020
    Authors
    Bramer, Lisa M.; Claborne, Daniel; Stratton, Kelly G.; Hofmockel, Kirsten; Thompson, Allison M.; McCue, Lee Ann; White, Amanda M.
    Description

    The high-resolution and mass accuracy of Fourier transform mass spectrometry (FT-MS) has made it an increasingly popular technique for discerning the composition of soil, plant and aquatic samples containing complex mixtures of proteins, carbohydrates, lipids, lignins, hydrocarbons, phytochemicals and other compounds. Thus, there is a growing demand for informatics tools to analyze FT-MS data that will aid investigators seeking to understand the availability of carbon compounds to biotic and abiotic oxidation and to compare fundamental chemical properties of complex samples across groups. We present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities. The package provides a suite of plotting methods and enables expedient, flexible and interactive visualization of complex datasets through functions which link to a powerful and interactive visualization user interface, Trelliscope. Example analysis using FT-MS data from a soil microbiology study demonstrates the core functionality of the package and highlights the capabilities for producing interactive visualizations.

  12. HadISD: Global sub-daily, surface meteorological station data, 1931-2017,...

    • data-search.nerc.ac.uk
    • catalogue.ceda.ac.uk
    Updated Jul 24, 2021
    + more versions
    Cite
    (2021). HadISD: Global sub-daily, surface meteorological station data, 1931-2017, v2.0.2.2017f [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=dewpoint
    Dataset updated
    Jul 24, 2021
    Description

    This is version 2.0.2.2017f of Met Office Hadley Centre's Integrated Surface Database, HadISD. These data are global sub-daily surface meteorological data that extend HadISD v2.0.1.2016p to include 2017, and so span 1931-2017. This version replaces the preliminary version (v2.0.2.2017p), as the ISD data for 2017 are now finalised.

    The quality controlled variables in this dataset are: temperature, dewpoint temperature, sea-level pressure, wind speed and direction, cloud data (total, low, mid and high level). Past significant weather and precipitation data are also included, but have not been quality controlled, so their quality and completeness cannot be guaranteed. Quality control flags and data values which have been removed during the quality control process are provided in the qc_flags and flagged_values fields, and ancillary data files show the station listing with IDs, names and location information.

    The data are provided as one NetCDF file per station. Files in the station_data folder have the format "station_code"_HadISD_HadOBS_19310101-20171231_v2-0-2-2017f.nc. The station codes can be found under the docs tab or on the archive beside the station_data folder. The station codes file has five columns as follows: 1) station code, 2) station name, 3) station latitude, 4) station longitude, 5) station height.

    To keep informed about updates, news and announcements, follow the HadOBS team on Twitter @metofficeHadOBS. For more detailed information, e.g. bug fixes, routine updates and other exploratory analysis, see the HadISD blog: http://hadisd.blogspot.co.uk/ For a more detailed description of precipitation see: http://hadisd.blogspot.co.uk/2018/03/precipitation-in-hadisd.html

    References: When using the dataset in a paper you must cite the following papers (see Docs for link to the publications) and this dataset (using the "citable as" reference):

    - Dunn, R. J. H., Willett, K. M., Parker, D. E., and Mitchell, L.: Expanding HadISD: quality-controlled, sub-daily station data from 1931, Geosci. Instrum. Method. Data Syst., 5, 473-491, doi:10.5194/gi-5-473-2016, 2016.
    - Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Clim. Past, 8, 1649-1679, 2012, doi:10.5194/cp-8-1649-2012
    - Smith, A., N. Lott, and R. Vose, 2011: The Integrated Surface Database: Recent Developments and Partnerships. Bulletin of the American Meteorological Society, 92, 704–708, doi:10.1175/2011BAMS3015.1

    For a homogeneity assessment of HadISD please see the following reference:

    - Dunn, R. J. H., K. M. Willett, C. P. Morice, and D. E. Parker. "Pairwise homogeneity assessment of HadISD." Climate of the Past 10, no. 4 (2014): 1501-1522. doi:10.5194/cp-10-1501-2014, 2014.

  13. 1.76 million r/AmItheAsshole submissions

    • kaggle.com
    zip
    Updated Apr 2, 2023
    Cite
    Noah Persaud (2023). 1.76 million r/AmItheAsshole submissions [Dataset]. https://www.kaggle.com/datasets/noahpersaud/176-million-ramitheasshole-submissions
    Available download formats: zip (386075268 bytes)
    Dataset updated
    Apr 2, 2023
    Authors
    Noah Persaud
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Noah Persaud

    Released under Attribution 4.0 International (CC BY 4.0)

    Contents

  14. Exploratory data analysis of infrared spectra from 3D-printing polymers

    • researchdata.edu.au
    Updated Oct 31, 2025
    Cite
    Simon Lewis; Michael V. Adamos; Kari Pitts; Georgina Sauzier (2025). Exploratory data analysis of infrared spectra from 3D-printing polymers [Dataset]. http://doi.org/10.25917/FN6A-AZ80
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    Curtin University
    Authors
    Simon Lewis; Michael V. Adamos; Kari Pitts; Georgina Sauzier
    Description

    Data description: This dataset consists of spectroscopic data files and associated R-scripts for exploratory data analysis. Attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectra were collected from 67 samples of polymer filaments potentially used to produce illicit 3D-printed items. Principal component analysis (PCA) was used to determine if any individual filaments gave distinctive spectral signatures, potentially allowing traceability of 3D-printed items for forensic purposes. The project also investigated potential chemical variations induced by the filament manufacturing or 3D-printing process. Data was collected and analysed by Michael Adamos at Curtin University (Perth, Western Australia), under the supervision of Dr Georgina Sauzier and Prof. Simon Lewis and with specialist input from Dr Kari Pitts.

    Data collection time details: 2024
    Number of files/types: 3 .R files, 702 .JDX files
    Geographic information (if relevant): Australia
    Keywords: 3D printing, polymers, infrared spectroscopy, forensic science

  15. Data from: Penguins Go Parallel: A Grammar of Graphics Framework for...

    • tandf.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Susan VanderPlas; Yawei Ge; Antony Unwin; Heike Hofmann (2023). Penguins Go Parallel: A Grammar of Graphics Framework for Generalized Parallel Coordinate Plots [Dataset]. http://doi.org/10.6084/m9.figshare.22467369.v2
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Susan VanderPlas; Yawei Ge; Antony Unwin; Heike Hofmann
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parallel Coordinate Plots (PCP) are a valuable tool for exploratory data analysis of high-dimensional numerical data. The use of PCPs is limited when working with categorical variables or a mix of categorical and continuous variables. In this article, we propose Generalized Parallel Coordinate Plots (GPCP) to extend the ability of PCPs from just numeric variables to dealing seamlessly with a mix of categorical and numeric variables in a single plot. In this process we find that existing solutions for categorical values only, such as hammock plots or parsets become edge cases in the new framework. By focusing on individual observations rather than a marginal frequency we gain additional flexibility. The resulting approach is implemented in the R package ggpcp. Supplementary materials for this article are available online.

  16. HadISD: Global sub-daily, surface meteorological station data, 1931-2022,...

    • data-search.nerc.ac.uk
    • catalogue.ceda.ac.uk
    Updated Jul 24, 2021
    + more versions
    Cite
    (2021). HadISD: Global sub-daily, surface meteorological station data, 1931-2022, v3.3.0.2022f [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=dewpoint
    Dataset updated
    Jul 24, 2021
    Description

    This is version v3.3.0.2022f of Met Office Hadley Centre's Integrated Surface Database, HadISD. These data are global sub-daily surface meteorological data.

    The quality controlled variables in this dataset are: temperature, dewpoint temperature, sea-level pressure, wind speed and direction, cloud data (total, low, mid and high level). Past significant weather and precipitation data are also included, but have not been quality controlled, so their quality and completeness cannot be guaranteed. Quality control flags and data values which have been removed during the quality control process are provided in the qc_flags and flagged_values fields, and ancillary data files show the station listing with IDs, names and location information.

    The data are provided as one NetCDF file per station. Files in the station_data folder have the format "station_code"_HadISD_HadOBS_19310101-20230101_v3.3.1.2022f.nc. The station codes can be found under the docs tab. The station codes file has five columns as follows: 1) station code, 2) station name, 3) station latitude, 4) station longitude, 5) station height.

    To keep informed about updates, news and announcements, follow the HadOBS team on Twitter @metofficeHadOBS. For more detailed information, e.g. bug fixes, routine updates and other exploratory analysis, see the HadISD blog: http://hadisd.blogspot.co.uk/

    References: When using the dataset in a paper you must cite the following papers (see Docs for link to the publications) and this dataset (using the "citable as" reference):

    - Dunn, R. J. H., (2019), HadISD version 3: monthly updates, Hadley Centre Technical Note.
    - Dunn, R. J. H., Willett, K. M., Parker, D. E., and Mitchell, L.: Expanding HadISD: quality-controlled, sub-daily station data from 1931, Geosci. Instrum. Method. Data Syst., 5, 473-491, doi:10.5194/gi-5-473-2016, 2016.
    - Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Clim. Past, 8, 1649-1679, 2012, doi:10.5194/cp-8-1649-2012
    - Smith, A., N. Lott, and R. Vose, 2011: The Integrated Surface Database: Recent Developments and Partnerships. Bulletin of the American Meteorological Society, 92, 704–708, doi:10.1175/2011BAMS3015.1

    For a homogeneity assessment of HadISD please see the following reference:

    - Dunn, R. J. H., K. M. Willett, C. P. Morice, and D. E. Parker. "Pairwise homogeneity assessment of HadISD." Climate of the Past 10, no. 4 (2014): 1501-1522. doi:10.5194/cp-10-1501-2014, 2014.

  17. climwin: An R Toolbox for Climate Window Analysis

    • plos.figshare.com
    txt
    Updated Jun 3, 2023
    Cite
    Liam D. Bailey; Martijn van de Pol (2023). climwin: An R Toolbox for Climate Window Analysis [Dataset]. http://doi.org/10.1371/journal.pone.0167980
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Liam D. Bailey; Martijn van de Pol
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When studying the impacts of climate change, there is a tendency to select climate data from a small set of arbitrary time periods or climate windows (e.g., spring temperature). However, these arbitrary windows may not encompass the strongest periods of climatic sensitivity and may lead to erroneous biological interpretations. Therefore, there is a need to consider a wider range of climate windows to better predict the impacts of future climate change. We introduce the R package climwin that provides a number of methods to test the effect of different climate windows on a chosen response variable and compare these windows to identify potential climate signals. climwin extracts the relevant data for each possible climate window and uses this data to fit a statistical model, the structure of which is chosen by the user. Models are then compared using an information criteria approach. This allows users to determine how well each window explains variation in the response variable and compare model support between windows. climwin also contains methods to detect type I and II errors, which are often a problem with this type of exploratory analysis. This article presents the statistical framework and technical details behind the climwin package and demonstrates the applicability of the method with a number of worked examples.

  18. BREAST-CANCER-EDA

    • kaggle.com
    zip
    Updated Nov 26, 2025
    Cite
    Dr. Nagendra (2025). BREAST-CANCER-EDA [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/breast-cancer-eda
    Available download formats: zip (50651 bytes)
    Dataset updated
    Nov 26, 2025
    Authors
    Dr. Nagendra
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Comprehensive dataset for Exploratory Data Analysis (EDA) of breast cancer. Features include clinical measurements, demographic information, and diagnosis. A cleaned and structured resource suitable for machine learning preparation. Focuses on understanding feature distributions, correlations, and patient outcomes. Ideal for students and practitioners studying predictive modeling in healthcare.

  19. Healthcare Device Data Analysis with R

    • kaggle.com
    zip
    Updated Oct 7, 2021
    Cite
    stanley888cy (2021). Healthcare Device Data Analysis with R [Dataset]. https://www.kaggle.com/stanley888cy/google-project-02
    Available download formats: zip (353177 bytes)
    Dataset updated
    Oct 7, 2021
    Authors
    stanley888cy
    Description

    Context

    Hi. This is my data analysis project, in which I also try using R in my work. It is the capstone project for the Google Data Analytics Certificate course offered on Coursera (https://www.coursera.org/professional-certificates/google-data-analytics). It is an operational analysis of data from health monitoring devices. For the detailed background story, please check the PDF file (Case 02.pdf).

    In this case study, I use personal health tracker data from Fitbit to evaluate how health tracker devices are used, and then determine whether there are any trends or patterns.

    My analysis focuses on two areas: exercise activity and sleeping habits. The exercise activity part studies the relationship between activity type and calories consumed, while the sleeping habits part identifies patterns in users' sleep. I also try some linear regression models, so that the data can be explained quantitatively and prediction becomes easier.

    I understand that I am new to data analysis and that my skills and code are at a beginner level, but I am working hard to learn more about both R and data science. If you have any ideas or feedback, please feel free to comment.

    Stanley Cheng 2021-10-07

  20. Comparisons of predictive power.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    + more versions
    Cite
    Alina Bazarova; Marko Raseta (2023). Comparisons of predictive power. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t005
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present an R package for predictive modelling, CARRoT (Cross-validation, Accuracy, Regression, Rule of Ten). CARRoT is a tool for initial exploratory analysis of the data, which performs an exhaustive search for a regression model yielding the best predictive power, with heuristic ‘rules of thumb’ and expert knowledge as regularization parameters. It uses multiple hold-outs in order to internally validate the model. The package allows multiple factors to be taken into account during model selection, such as collinearity of the predictors, event-per-variable rules (EPVs) and R-squared statistics. In addition, other constraints, such as forcing specific terms and restricting the complexity of the predictive models, can be used. The package can also take pairwise and three-way interactions between variables into account. These candidate models are then ranked by predictive power, which is assessed via multiple hold-out procedures and can be parallelised in order to reduce the computational time. Models which exhibit the highest average predictive power over all hold-outs are returned. Predictive power is quantified as absolute and relative error for continuous outcomes, and as accuracy and AUROC values for categorical outcomes. In this paper we briefly present the statistical framework of the package and discuss the complexity of the underlying algorithm. Moreover, using CARRoT and a number of datasets available in R, we compare different model selection techniques: based on EPVs alone, on EPVs and R-squared statistics, on lasso regression, on including only statistically significant predictors, and on stepwise forward selection.
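An events-per-variable constraint of the kind mentioned above can be illustrated with a short sketch. This shows one common reading of the "rule of ten" heuristic for a binary outcome (at least ten events per candidate predictor, counting the rarer class as events); it is written in Python for illustration, the function name is hypothetical, and it is not CARRoT's own implementation.

```python
def max_predictors(outcomes, events_per_variable=10):
    """Upper bound on predictors allowed by an events-per-variable rule.

    Assumption: outcomes are coded 0/1 and the number of "events" is the
    size of the less frequent class.
    """
    positives = sum(1 for y in outcomes if y == 1)
    negatives = len(outcomes) - positives
    events = min(positives, negatives)
    return events // events_per_variable


# Made-up example: 200 observations, 35 of which are events.
outcomes = [1] * 35 + [0] * 165
limit = max_predictors(outcomes)
print(f"At most {limit} predictors under a rule of ten")
```

Under this reading, an exhaustive model search would only consider candidate models whose number of terms does not exceed the returned bound.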


Data Analysis in R

Bellabeat Case Study
