92 datasets found
  1. Exploratory Data Analysis (EDA) Tools Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54369
    Explore at:
    doc, ppt, pdf (available download formats)
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming Exploratory Data Analysis (EDA) tools market! Our in-depth analysis reveals key trends, growth drivers, and top players shaping this $3 billion industry, projected for 15% CAGR through 2033. Learn about market segmentation, regional insights, and future opportunities.
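
As a rough illustration of what a 15% CAGR implies, the sketch below compounds a starting market size forward year by year. It assumes the $3 billion figure refers to 2025, the first year of the covered period; the listing does not state the base year, so the projected values are illustrative only. The function name `project_market_size` is hypothetical.

```python
def project_market_size(base_size, cagr, years):
    """Compound base_size forward by `years` at annual growth rate `cagr`."""
    return base_size * (1.0 + cagr) ** years

# Assumption: $3 billion is the 2025 value; the 15% CAGR is taken from the listing.
BASE_YEAR = 2025
for year in range(BASE_YEAR, 2034):
    size = project_market_size(3.0, 0.15, year - BASE_YEAR)
    print(f"{year}: ${size:.2f} billion")
```

Under that assumption the market would roughly triple by 2033, to a little over $9 billion.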

  2. ftmsRanalysis: An R package for exploratory data analysis and interactive...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 1, 2023
    Cite
    Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue (2023). ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007654
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The high-resolution and mass accuracy of Fourier transform mass spectrometry (FT-MS) has made it an increasingly popular technique for discerning the composition of soil, plant and aquatic samples containing complex mixtures of proteins, carbohydrates, lipids, lignins, hydrocarbons, phytochemicals and other compounds. Thus, there is a growing demand for informatics tools to analyze FT-MS data that will aid investigators seeking to understand the availability of carbon compounds to biotic and abiotic oxidation and to compare fundamental chemical properties of complex samples across groups. We present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities. The package provides a suite of plotting methods and enables expedient, flexible and interactive visualization of complex datasets through functions which link to a powerful and interactive visualization user interface, Trelliscope. Example analysis using FT-MS data from a soil microbiology study demonstrates the core functionality of the package and highlights the capabilities for producing interactive visualizations.

  3. Data accompanying the seuFLViz R package for interactive exploratory data...

    • zenodo.org
    bin
    Updated Jun 5, 2025
    Cite
    Dominic Shayler; Kevin Stachelek; David Cobrinik (2025). Data accompanying the seuFLViz R package for interactive exploratory data analysis of single cell datasets as seurat objects [Dataset]. http://doi.org/10.5281/zenodo.15596099
    Explore at:
    bin (available download formats)
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dominic Shayler; Kevin Stachelek; David Cobrinik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data accompanying the seuFLViz R package for interactive exploratory data analysis of single cell datasets as seurat objects.

    Data collected by Dominic Shayler and described in:

    1. Shayler DW, Stachelek K, Cambier L, Lee S, Bai J, Reid MW, Weisenberger DJ, Bhat B, Aparicio JG, Kim Y, Singh M, Bay M, Thornton ME, Doyle EK, Fouladian Z, Erberich SG, Grubbs BH, Bonaguidi MA, Craft CM, Singh HP, Cobrinik D. Identification and characterization of early human photoreceptor states and cell-state-specific retinoblastoma-related features. eLife [Internet]. eLife Sciences Publications Limited; 2024 Nov 22 [cited 2024 Dec 20];13.
    Some raw data available in GEO: GSE207802
  4. Data Analysis in R

    • kaggle.com
    zip
    Updated May 16, 2022
    Cite
    Rajdeep Kaur Bajwa (2022). Data Analysis in R [Dataset]. https://www.kaggle.com/datasets/rajdeepkaurbajwa/data-analysis-r
    Explore at:
    zip, 5321 bytes (available download formats)
    Dataset updated
    May 16, 2022
    Authors
    Rajdeep Kaur Bajwa
    Description

    Dataset

    This dataset was created by Rajdeep Kaur Bajwa

    Contents

  5. R-script to Analyse Data

    • uvaauas.figshare.com
    txt
    Updated Apr 4, 2022
    Cite
    T. Blanke (2022). R-script to Analyse Data [Dataset]. http://doi.org/10.21942/uva.14346842.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Apr 4, 2022
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    T. Blanke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Exploratory data analysis and visualisation of datasets

  6. Stack Overflow tags

    • kaggle.com
    zip
    Updated Jan 6, 2021
    Cite
    Abid Ali Awan (2021). Stack Overflow tags [Dataset]. https://www.kaggle.com/datasets/kingabzpro/stack-overflow-tags/code
    Explore at:
    zip, 273306 bytes (available download formats)
    Dataset updated
    Jan 6, 2021
    Authors
    Abid Ali Awan
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    How can we tell what programming languages and technologies are used by the most people? How about what languages are growing and which are shrinking, so that we can tell which are most worth investing time in?

    One excellent source of data is Stack Overflow, a programming question and answer site with more than 16 million questions on programming topics. By measuring the number of questions about each technology, we can get an approximate sense of how many people are using it. We're going to use open data from the Stack Exchange Data Explorer to examine how the relative popularity of languages like R, Python, Java and JavaScript has changed over time.

    Content

    Each Stack Overflow question has a tag, which marks a question to describe its topic or technology. For instance, there's a tag for languages like R or Python, and for packages like ggplot2 or pandas.

    We'll be working with a dataset with one observation for each tag in each year. The dataset includes both the number of questions asked in that tag in that year, and the total number of questions asked in that year.
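
The yearly share of questions per tag follows directly from those two counts. A minimal sketch, assuming column names `tag`, `year`, `number`, and `year_total` (the listing does not name the columns) and using made-up rows rather than real Stack Overflow counts:

```python
# Hypothetical rows: tag, year, questions with that tag, all questions that year.
rows = [
    {"tag": "r", "year": 2019, "number": 50, "year_total": 1000},
    {"tag": "r", "year": 2020, "number": 90, "year_total": 1200},
    {"tag": "python", "year": 2019, "number": 200, "year_total": 1000},
    {"tag": "python", "year": 2020, "number": 300, "year_total": 1200},
]

def tag_fractions(rows, tag):
    """Return {year: fraction of that year's questions carrying `tag`}."""
    return {
        row["year"]: row["number"] / row["year_total"]
        for row in rows
        if row["tag"] == tag
    }

for tag in ("r", "python"):
    for year, fraction in sorted(tag_fractions(rows, tag).items()):
        print(f"{tag} {year}: {fraction:.1%}")
```

Dividing by the yearly total matters because the overall number of questions changes from year to year, so raw counts alone would conflate a tag's popularity with the site's growth.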

    Acknowledgements

    DataCamp

  7. Data from: Superheat: An R Package for Creating Beautiful and Extendable...

    • tandf.figshare.com
    bin
    Updated Mar 4, 2024
    Cite
    Rebecca L. Barter; Bin Yu (2024). Superheat: An R Package for Creating Beautiful and Extendable Heatmaps for Visualizing Complex Data [Dataset]. http://doi.org/10.6084/m9.figshare.6287693.v1
    Explore at:
    bin (available download formats)
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Rebecca L. Barter; Bin Yu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.

  8. BREAST-CANCER-EDA

    • kaggle.com
    zip
    Updated Nov 26, 2025
    Cite
    Dr. Nagendra (2025). BREAST-CANCER-EDA [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/breast-cancer-eda
    Explore at:
    zip, 50651 bytes (available download formats)
    Dataset updated
    Nov 26, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Comprehensive dataset for Exploratory Data Analysis (EDA) of breast cancer. Features include clinical measurements, demographic information, and diagnosis. A cleaned and structured resource suitable for machine learning preparation. Focuses on understanding feature distributions, correlations, and patient outcomes. Ideal for students and practitioners studying predictive modeling in healthcare.

  9. Physical Properties of Lakes: Exploratory Data Analysis

    • search.dataone.org
    • hydroshare.org
    Updated Apr 15, 2022
    Cite
    Gabriela Garcia; Kateri Salk (2022). Physical Properties of Lakes: Exploratory Data Analysis [Dataset]. https://search.dataone.org/view/sha256%3A82a3bd46ad259724cad21b7a344728253ea4e6d929f6134e946c379585f903f6
    Explore at:
    Dataset updated
    Apr 15, 2022
    Dataset provided by
    Hydroshare
    Authors
    Gabriela Garcia; Kateri Salk
    Time period covered
    May 27, 1984 - Aug 17, 2016
    Area covered
    Description

    Exploratory Data Analysis for the Physical Properties of Lakes

    This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on the physical properties of lakes.

    Introduction

    Lakes are dynamic, nonuniform bodies of water in which the physical, biological, and chemical properties interact. Lakes also contain the majority of Earth's fresh water supply. This lesson introduces exploratory data analysis using R statistical software in the context of the physical properties of lakes.

    Learning Objectives

    After successfully completing this exercise, you will be able to:

    1. Apply exploratory data analytics skills to applied questions about physical properties of lakes
    2. Communicate findings with peers through oral, visual, and written modes
  10. The ten most co-endorsed locations of the CBM (of 5,402 possible) using data...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Eric Cramer; Maisa Ziadni; Kristen Hymel Scherrer; Sean Mackey; Ming-Chih Kao (2023). The ten most co-endorsed locations of the CBM (of 5,402 possible) using data collected during the validation study. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010496.t004
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Eric Cramer; Maisa Ziadni; Kristen Hymel Scherrer; Sean Mackey; Ming-Chih Kao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ten most co-endorsed locations of the CBM (of 5,402 possible) using data collected during the validation study.

  11. Wetlands Ecological Integrity Depth To Water Data - Great Sand Dunes...

    • catalog.data.gov
    Updated Nov 25, 2025
    Cite
    National Park Service (2025). Wetlands Ecological Integrity Depth To Water Data - Great Sand Dunes National Park 2009-2019 [Dataset]. https://catalog.data.gov/dataset/wetlands-ecological-integrity-depth-to-water-data-great-sand-dunes-national-park-2009-2019
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    National Park Service (http://www.nps.gov/)
    Description

    Wetlands Ecological Integrity Depth to Water Logger data from 2009-2019 at Great Sand Dunes National Park. This includes Raw dataset (primarily hourly), daily summaries, weekly summaries, and monthly summaries. Included in the data package are exploratory data analysis figures at the daily, weekly and monthly time steps. Lastly included is the R code used to extract the depth to water logger data from the National Park Service Aquarius data system, and to create the exploratory data analysis figures.

  12. Wetlands Ecological Integrity Depth To Water Data - Florissant Fossil Beds...

    • catalog.data.gov
    Updated Nov 25, 2025
    Cite
    National Park Service (2025). Wetlands Ecological Integrity Depth To Water Data - Florissant Fossil Beds National Monument 2009-2019 [Dataset]. https://catalog.data.gov/dataset/wetlands-ecological-integrity-depth-to-water-data-florissant-fossil-beds-national-mon-2009
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    National Park Service (http://www.nps.gov/)
    Area covered
    Florissant
    Description

    Wetlands Ecological Integrity Depth to Water Logger data from 2009-2019 at Florissant Fossil Beds National Monument. This includes Raw dataset (primarily hourly), daily summaries, weekly summaries, and monthly summaries. Included in the data package are exploratory data analysis figures at the daily, weekly and monthly time steps. Lastly included is the R code used to extract the depth to water logger data from the National Park Service Aquarius data system, and to create the exploratory data analysis figures.

  13. Lahman Baseball Database

    • kaggle.com
    zip
    Updated Jul 20, 2025
    Cite
    Dalya S (2025). Lahman Baseball Database [Dataset]. https://www.kaggle.com/datasets/dalyas/lahman-baseball-database
    Explore at:
    zip, 9971692 bytes (available download formats)
    Dataset updated
    Jul 20, 2025
    Authors
    Dalya S
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    The Lahman Baseball Database is a comprehensive, open-source compilation of statistics and player data for Major League Baseball (MLB). It contains relational data from the 19th century through the most recent complete season, including batting, pitching, and fielding statistics, player demographics, awards, team performance, and managerial records.

    This dataset is widely used for exploratory data analysis, statistical modeling, predictive analysis, machine learning, and sports performance forecasting.

    This dataset is the latest CSV release of the Lahman Baseball Database, downloaded directly from https://sabr.org/lahman-database/. It includes historical MLB data spanning from 1871 to 2024, organized across 27 structured tables such as:
    - Batting: Player-level batting stats per year
    - Pitching: Season-level metrics
    - People: Biographical data (birth/death, handedness, debut/finalGame)
    - Teams, Managers: Team records
    - BattingPost, PitchingPost, FieldingPost: Post-season stats
    - AllstarFull: all star game
    - statsHallOfFame: Historical awards and recognitions

    Items to explore:
    - Track league-wide trends in home runs, strikeouts, or batting averages over time
    - Compare player performance by era, position, or righty/lefty
    - Create a timeline showing changes in a team's win-loss records
    - Map birthplace distributions of MLB players over time
    - Estimate the impact of rule changes on player stats (pitch clock, DH)
    - Model factors that influence MVP or Cy Young award wins
    - Predict a player's future performance based on historical stats

    📘 License

    This dataset is released under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. Attribution is required. Derivative works must be shared under the same license.

    📝 Official source: https://sabr.org/lahman-database/
    📥 Direct data page: https://www.seanlahman.com/baseball-archive/statistics/
    🖊️ R-Package Documentation: https://cran.r-project.org/web/packages/Lahman/Lahman.pdf

    0.1 Copyright Notice & Limited Use License
    This database is copyright 1996-2025 by SABR, via generous donation from Sean Lahman. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/
    For licensing information or further information, contact Scott Bush at: sbush@sabr.org

    0.2 Contact Information
    Web site: https://sabr.org/lahman-database/
    E-Mail: jpomrenke@sabr.org

  14. Wetlands Ecological Integrity Depth To Water Data - Rocky Mountain National...

    • gimi9.com
    Updated Mar 2, 2021
    Cite
    (2021). Wetlands Ecological Integrity Depth To Water Data - Rocky Mountain National Park 2007-2019 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_wetlands-ecological-integrity-depth-to-water-data-rocky-mountain-national-park-2007-2019
    Explore at:
    Dataset updated
    Mar 2, 2021
    Area covered
    Rocky Mountains
    Description

    Wetlands Ecological Integrity Depth to Water Logger data from 2007-2019 at Rocky Mountain National Park. This includes Raw dataset (primarily hourly), daily summaries, weekly summaries, and monthly summaries. Included in the data package are exploratory data analysis figures at the daily, weekly and monthly time steps. Lastly included is the R code used to extract the depth to water logger data from the National Park Service Aquarius data system, and to create the exploratory data analysis figures.

  15. Google Case Study: Bellabeat

    • kaggle.com
    zip
    Updated Mar 17, 2022
    Cite
    Kunal Chauhan (2022). Google Case Study: Bellabeat [Dataset]. https://www.kaggle.com/datasets/kunal0chauhan/fitabase-data
    Explore at:
    zip, 25277889 bytes (available download formats)
    Dataset updated
    Mar 17, 2022
    Authors
    Kunal Chauhan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This Kaggle data set contains a personal fitness tracker from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

    Content

    The data set was done by Bellabeat and collected data for 33 users of their physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits. It includes narrow data and wide data, as well as daily, minute, and second data organized in the Month-day-year time format.

    Acknowledgements

    A big thanks to Möbius (https://www.kaggle.com/arashnic) for giving me access to this data source for my capstone project for my Google Data Analytics Certificate.

  16. Exploratory data analysis of infrared spectra from 3D-printing polymers

    • researchdata.edu.au
    Updated Oct 31, 2025
    Cite
    Simon Lewis; Michael V. Adamos; Kari Pitts; Georgina Sauzier (2025). Exploratory data analysis of infrared spectra from 3D-printing polymers [Dataset]. http://doi.org/10.25917/FN6A-AZ80
    Explore at:
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    Curtin University
    Authors
    Simon Lewis; Michael V. Adamos; Kari Pitts; Georgina Sauzier
    Description

    Data description: This dataset consists of spectroscopic data files and associated R-scripts for exploratory data analysis. Attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectra were collected from 67 samples of polymer filaments potentially used to produce illicit 3D-printed items. Principal component analysis (PCA) was used to determine if any individual filaments gave distinctive spectral signatures, potentially allowing traceability of 3D-printed items for forensic purposes. The project also investigated potential chemical variations induced by the filament manufacturing or 3D-printing process. Data was collected and analysed by Michael Adamos at Curtin University (Perth, Western Australia), under the supervision of Dr Georgina Sauzier and Prof. Simon Lewis and with specialist input from Dr Kari Pitts.

    Data collection time details: 2024
    Number of files/types: 3 .R files, 702 .JDX files
    Geographic information (if relevant): Australia
    Keywords: 3D printing, polymers, infrared spectroscopy, forensic science

  17. Cyclistic_bike _share_analysis_case_study

    • kaggle.com
    zip
    Updated Oct 16, 2025
    Cite
    Ranjith@0073 (2025). Cyclistic_bike _share_analysis_case_study [Dataset]. https://www.kaggle.com/datasets/ranjith0073/cyclistic-bike-share-analysis-case-study
    Explore at:
    zip, 585776 bytes (available download formats)
    Dataset updated
    Oct 16, 2025
    Authors
    Ranjith@0073
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    📊 Full Dataset
    The complete cleaned dataset used in this analysis is available for download (123 MB). A smaller sample is included in this repository for quick testing.

    📂 Project Overview
    This project analyzes Cyclistic bike-share data to uncover ride patterns, user behavior, and station popularity.
    It includes data cleaning, exploratory data analysis (EDA), and visualizations using R (tidyverse, ggplot2, lubridate).

    📈 Key Visualizations
    - Rides by User Type
    - Rides per Day of the Week
    - Ride Duration Distribution
    - Rides by Bike Type
    - Top 10 Start Stations
    (All visualizations are stored in the plots/ folder.)

    🧠 Key Insights
    - Subscribers ride more frequently than casual users.
    - Weekdays show higher ride volumes.
    - Most trips last under 30 minutes.
    - Top stations are concentrated in central business and tourist areas.

    🛠️ Tools Used
    - R
    - tidyverse
    - ggplot2
    - lubridate

    📈 Project by: Ranjithkumar R.K

  18. Data from: Penguins Go Parallel: A Grammar of Graphics Framework for...

    • tandf.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Susan VanderPlas; Yawei Ge; Antony Unwin; Heike Hofmann (2023). Penguins Go Parallel: A Grammar of Graphics Framework for Generalized Parallel Coordinate Plots [Dataset]. http://doi.org/10.6084/m9.figshare.22467369.v2
    Explore at:
    txt (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Susan VanderPlas; Yawei Ge; Antony Unwin; Heike Hofmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parallel Coordinate Plots (PCP) are a valuable tool for exploratory data analysis of high-dimensional numerical data. The use of PCPs is limited when working with categorical variables or a mix of categorical and continuous variables. In this article, we propose Generalized Parallel Coordinate Plots (GPCP) to extend the ability of PCPs from just numeric variables to dealing seamlessly with a mix of categorical and numeric variables in a single plot. In this process we find that existing solutions for categorical values only, such as hammock plots or parsets become edge cases in the new framework. By focusing on individual observations rather than a marginal frequency we gain additional flexibility. The resulting approach is implemented in the R package ggpcp. Supplementary materials for this article are available online.

  19. Reddit r/AskScience Flair Dataset

    • data.mendeley.com
    Updated May 23, 2022
    Cite
    Sumit Mishra (2022). Reddit r/AskScience Flair Dataset [Dataset]. http://doi.org/10.17632/k9r2d9z999.3
    Explore at:
    Dataset updated
    May 23, 2022
    Authors
    Sumit Mishra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reddit is a social news, content rating and discussion website. It's one of the most popular sites on the internet. Reddit has 52 million daily active users and approximately 430 million users who use it once a month. Reddit has many subreddits; here we use the r/AskScience subreddit.

    The dataset is extracted from the subreddit /r/AskScience from Reddit. The data was collected between 01-01-2016 and 20-05-2022. It contains 612,668 data points and 25 columns, with information about the questions asked on the subreddit, the description of the submission, the flair of the question, NSFW or SFW status, the year of the submission, and more. The data was extracted using Python and Pushshift's API, with light cleaning done using NumPy and pandas (see the descriptions of individual columns below).

    The dataset contains the following columns and descriptions:
    - author - Redditor name.
    - author_fullname - Redditor full name.
    - contest_mode - Contest mode [implement obscured scores and randomized sorting].
    - created_utc - Time the submission was created, represented in Unix time.
    - domain - Domain of submission.
    - edited - Whether or not the post has been edited.
    - full_link - Link of the post on the subreddit.
    - id - ID of the submission.
    - is_self - Whether or not the submission is a self post (text-only).
    - link_flair_css_class - CSS class used to identify the flair.
    - link_flair_text - Flair on the post, i.e. the link flair's text content.
    - locked - Whether or not the submission has been locked.
    - num_comments - The number of comments on the submission.
    - over_18 - Whether or not the submission has been marked as NSFW.
    - permalink - A permalink for the submission.
    - retrieved_on - Time ingested.
    - score - The number of upvotes for the submission.
    - description - Description of the submission.
    - spoiler - Whether or not the submission has been marked as a spoiler.
    - stickied - Whether or not the submission is stickied.
    - thumbnail - Thumbnail of the submission.
    - question - Question asked in the submission.
    - url - The URL the submission links to, or the permalink if a self post.
    - year - Year of the submission.
    - banned - Whether or not the submission was banned by the moderator.
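
Because `created_utc` is stored as Unix time, it usually needs converting before any trend-over-time analysis. A small sketch using only the Python standard library; the timestamp value is made up for illustration and is not taken from the dataset, and the helper name `created_to_datetime` is hypothetical:

```python
from datetime import datetime, timezone

def created_to_datetime(created_utc):
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to an aware datetime."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc)

# Hypothetical value for a `created_utc` field.
example = 1451606400
moment = created_to_datetime(example)
print(moment.isoformat(), moment.year)
```

Passing `tz=timezone.utc` keeps the result independent of the machine's local time zone, which matters when grouping submissions by day or year.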

    This dataset can be used for Flair Prediction, NSFW Classification, and different Text Mining/NLP tasks. Exploratory Data Analysis can also be done to get the insights and see the trend and patterns over the years.

  20. HadISD: Global sub-daily, surface meteorological station data, 1931-2023,...

    • data-search.nerc.ac.uk
    Updated Jul 24, 2021
    Cite
    (2021). HadISD: Global sub-daily, surface meteorological station data, 1931-2023, v3.4.0.2023f [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=dewpoint
    Explore at:
    Dataset updated
    Jul 24, 2021
    Description

    This is version v3.4.0.2023f of Met Office Hadley Centre's Integrated Surface Database, HadISD. These data are global sub-daily surface meteorological data. This update (v3.4.0.2023f) to HadISD corrects a long-standing bug which was discovered in autumn 2023 whereby the neighbour checks (and associated [un]flagging for some other tests) were not being implemented. For more details see the posts on the HadISD blog: https://hadisd.blogspot.com/2023/10/bug-in-buddy-checks.html & https://hadisd.blogspot.com/2024/01/hadisd-v3402023f-future-look.html

    The quality controlled variables in this dataset are: temperature, dewpoint temperature, sea-level pressure, wind speed and direction, cloud data (total, low, mid and high level). Past significant weather and precipitation data are also included, but have not been quality controlled, so their quality and completeness cannot be guaranteed. Quality control flags and data values which have been removed during the quality control process are provided in the qc_flags and flagged_values fields, and ancillary data files show the station listing with IDs, names and location information.

    The data are provided as one NetCDF file per station. Files in the station_data folder have the format "station_code"_HadISD_HadOBS_19310101-20240101_v3.4.1.2023f.nc. The station codes can be found under the docs tab. The station codes file has five columns as follows: 1) station code, 2) station name, 3) station latitude, 4) station longitude, 5) station height.

    To keep informed about updates, news and announcements follow the HadOBS team on Twitter @metofficeHadOBS. For more detailed information, e.g. bug fixes, routine updates and other exploratory analysis, see the HadISD blog: http://hadisd.blogspot.co.uk/

    References: When using the dataset in a paper you must cite the following papers (see Docs for link to the publications) and this dataset (using the "citable as" reference):

    Dunn, R. J. H., (2019), HadISD version 3: monthly updates, Hadley Centre Technical Note.
    Dunn, R. J. H., Willett, K. M., Parker, D. E., and Mitchell, L.: Expanding HadISD: quality-controlled, sub-daily station data from 1931, Geosci. Instrum. Method. Data Syst., 5, 473-491, doi:10.5194/gi-5-473-2016, 2016.
    Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Clim. Past, 8, 1649-1679, 2012, doi:10.5194/cp-8-1649-2012
    Smith, A., N. Lott, and R. Vose, 2011: The Integrated Surface Database: Recent Developments and Partnerships. Bulletin of the American Meteorological Society, 92, 704–708, doi:10.1175/2011BAMS3015.1

    For a homogeneity assessment of HadISD please see the following reference:

    Dunn, R. J. H., K. M. Willett, C. P. Morice, and D. E. Parker. "Pairwise homogeneity assessment of HadISD." Climate of the Past 10, no. 4 (2014): 1501-1522. doi:10.5194/cp-10-1501-2014, 2014.
