42 datasets found
  1. Exploratory Data Analysis (EDA) Tools Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54369
    Available download formats: doc, ppt, pdf
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming Exploratory Data Analysis (EDA) tools market! Our in-depth analysis reveals key trends, growth drivers, and top players shaping this $3 billion industry, projected to grow at a 15% CAGR through 2033. Learn about market segmentation, regional insights, and future opportunities.
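
The description above quotes a $3 billion market and a 15% CAGR through 2033. As a rough worked example of what that compound growth implies, the sketch below projects the market size year by year. It assumes the $3 billion figure refers to 2025 (the start of the listed time period), which the listing does not state; the function and constant names are illustrative.

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a starting market size forward by `years` at a constant annual rate."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return base_size * (1.0 + cagr) ** years


# Assumption (not stated in the listing): the $3 billion figure refers to 2025.
BASE_YEAR = 2025
BASE_SIZE_BILLION_USD = 3.0
CAGR = 0.15

for year in range(BASE_YEAR, 2034):
    size = project_market_size(BASE_SIZE_BILLION_USD, CAGR, year - BASE_YEAR)
    print(f"{year}: ${size:.2f} billion")
```

If the base year were different, only the exponent would change; the formula itself is the standard definition of a compound annual growth rate.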

  2. Top Software Companies: Market Cap,Sales & HQ Data

    • kaggle.com
    zip
    Updated Oct 27, 2024
    Cite
    Muhammad Asif (2024). Top Software Companies: Market Cap,Sales & HQ Data [Dataset]. https://www.kaggle.com/datasets/muhammadasif786/top-software-companies-market-capsales-and-hq-data
    Available download formats: zip (1574 bytes)
    Dataset updated
    Oct 27, 2024
    Authors
    Muhammad Asif
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dive into the dynamic world of the software industry with this comprehensive dataset featuring key metrics from top software companies for the years 2022 to 2023.

    This dataset provides valuable insights into:

    • Organizations: A list of leading software companies shaping the tech landscape.
    • Sales: Annual sales figures, showcasing the revenue generated by each company.
    • Market Cap: Important market capitalization data reflecting the companies' financial health and investor confidence.
    • Headquarters: Geographical information about where these companies are headquartered, highlighting regional influence.

    Harness this rich dataset to conduct exploratory data analysis (EDA), visualize trends, and uncover valuable business insights. Whether you're an analyst, researcher, or data enthusiast, this dataset is perfect for understanding the performance and positioning of key players in the software sector.

    Benefits:

    • Comprehensive: Data covering essential metrics for informed analysis.
    • Recent: Insights from the latest two years (2022-2023) for current market trends.
    • User-Friendly: Organized structure for easy integration with data manipulation tools like Pandas.

    Take your data analysis to the next level and explore the competitive landscape of the software industry!

  3. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full and robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
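
The decision tree settings above name gain ratio as the split criterion. The following is a minimal sketch of that criterion only, written from its textbook definition (information gain divided by the split information of the feature). It is not Orange's implementation, it treats the feature as categorical rather than thresholding standardized numeric values, and the toy values at the bottom are made up for illustration rather than taken from the dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the split on `feature_values`, normalised by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalises features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Illustrative toy data (hypothetical, not from the dataset).
dose = ["low", "low", "high", "high"]
label = ["control", "control", "treated", "treated"]
print(gain_ratio(dose, label))
```

A feature that takes a single value carries no split information, so the sketch returns 0.0 for it instead of dividing by zero.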

  4. Kaggle Top Datasets🚀📊

    • kaggle.com
    zip
    Updated Apr 10, 2024
    Cite
    Aaron Frias (2024). Kaggle Top Datasets🚀📊 [Dataset]. https://www.kaggle.com/datasets/aaronfriasr/kaggle-top-datasets
    Available download formats: zip (1572305 bytes)
    Dataset updated
    Apr 10, 2024
    Authors
    Aaron Frias
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or what datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning.

    Column description

    • Dataset_name - Name of the dataset
    • Author_name - Name of the author
    • Author_id - Kaggle id of the author
    • No_of_files - Number of files the author has uploaded
    • size - Size of all the files
    • Type_of_file - Type of the files such as csv, json etc.
    • Upvotes - Total upvotes of the dataset
    • Medals - Medal of the dataset
    • Usability - Usability of the dataset
    • Date - Date in which the dataset is uploaded
    • Day - Day in which the dataset is uploaded
    • Time - Time in which the dataset is uploaded
    • Dataset_link - Kaggle link of the dataset

    Acknowledgements

    The data has been scraped from the official Kaggle website and is available under the Creative Commons license.

    Enjoy & Keep Learning !!!

  5. Top 1000 Video Games

    • kaggle.com
    Updated Jan 7, 2023
    Cite
    Muhammad Faisal Ali (2023). Top 1000 Video Games [Dataset]. https://www.kaggle.com/datasets/faisaljanjua0555/top-1000-video-games/code
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jan 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad Faisal Ali
    Description

    About

    This dataset contains the top 1000 games of all time based on their votes. It contains the ranking, name, year, genre and number of votes.

    Methodology

    This dataset was acquired by scraping the IMDb page of top video games of all time with the web scraping tool Beautiful Soup.

  6. Replication Data for "Best Practices for Your Exploratory Factor Analysis: a...

    • dataverse.harvard.edu
    Updated Nov 9, 2021
    + more versions
    Cite
    Pablo Rogers (2021). Replication Data for "Best Practices for Your Exploratory Factor Analysis: a Factor Tutorial" published by RAC-Revista de Administração Contemporânea [Dataset]. http://doi.org/10.7910/DVN/RCX8FF
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Nov 9, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Pablo Rogers
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This repository contains material related to the analysis performed in the article "Best Practices for Your Exploratory Factor Analysis: a Factor Tutorial". The material includes the data used in the analyses in .dat format, the labels (.txt) of the variables used in the Factor software, the outputs (.txt) evaluated in the article, and videos (.mp4 with English subtitles) recorded for the purpose of explaining the article. The videos can also be accessed in the following playlist: https://youtube.com/playlist?list=PLln41V0OsLHbSlYcDszn2PoTSiAwV5Oda.

    Below is a summary of the article: "Exploratory Factor Analysis (EFA) is one of the statistical methods most widely used in Administration, however, its current practice coexists with rules of thumb and heuristics given half a century ago. The purpose of this article is to present the best practices and recent recommendations for a typical EFA in Administration through a practical solution accessible to researchers. In this sense, in addition to discussing current practices versus recommended practices, a tutorial with real data on Factor is illustrated, a software that is still little known in the Administration area, but freeware, easy to use (point and click) and powerful. The step-by-step illustrated in the article, in addition to the discussions raised and an additional example, is also available in the format of tutorial videos. Through the proposed didactic methodology (article-tutorial + video-tutorial), we encourage researchers/methodologists who have mastered a particular technique to do the same. Specifically, about EFA, we hope that the presentation of the Factor software, as a first solution, can transcend the current outdated rules of thumb and heuristics, by making best practices accessible to Administration researchers".

    STEPS TO REPRODUCE

    This repository is composed of four types of files:

    1) three video files in .mp4 format (with English subtitles), which discuss the article and the extra example mentioned in it;
    2) two databases in .dat format: i) 1047 observations with 24 variables of the WHOQOL instrument discussed in the article; and ii) 918 observations with 10 variables of the FWB scale (extra example);
    3) two labels files (.txt format) to be incorporated into the Factor software; and
    4) five output files in .txt format.

    The steps are:

    1st: Read the article “Best Practices for Your Exploratory Factor Analysis: a Factor Tutorial”. DOI: 10.1590/1982-7849rac2022210085.en; OR 1st: Watch the videos: i) 1_Video_BestPractices.mp4 (https://youtu.be/ITh1w4tFerA); and ii) 2_Video_MultidimensionalExample.mp4 (https://youtu.be/9X77ARoyys0);
    2nd: Insert the database WHOQOL_Data.dat into the Factor software and, optionally, the label file WHOQOL_Labels.txt, as explained in section 4.2 of the article or in the section that begins at the timestamp 6:35 of the video 2_Video_MultidimensionalExample.mp4 (https://youtu.be/9X77ARoyys0?t=395);
    3rd: Configure the analyses as explained in section 4.3 of the article or in the section that begins at the timestamp 10:45 of the video 2_Video_MultidimensionalExample.mp4 (https://youtu.be/9X77ARoyys0?t=645);
    4th: Interpret the first output file (1_Output_WHOQOL_4Factors.txt) as explained in section 4.4 of the article or in the section that begins at the timestamp 20:45 of the video 2_Video_MultidimensionalExample.mp4 (https://youtu.be/9X77ARoyys0?t=1245);
    5th: Interpret the second output file (2_Output_WHOQOL_2Factors.txt) as explained in the section that starts at the timestamp 49:53 of the video 2_Video_MultidimensionalExample.mp4 (https://youtu.be/9X77ARoyys0?t=2993);
    6th: Interpret the third output file (3_Output_WHOQOL_2Factors_Ajusted.txt) as explained in the section that starts at the timestamp 1:05:45 of the video 2_Video_MultidimensionalExample.mp4 (https://youtu.be/9X77ARoyys0?t=3945); and
    7th: Interpret the fourth output file (4_Output_WHOQOL_2Factors_Bifactor.txt) as explained in the section that starts at the timestamp 1:13:14 of the video 2_Video_MultidimensionalExample.mp4 (https://youtu.be/9X77ARoyys0?t=4394);

    OR, optionally, to replicate the extra example mentioned in the article:

    8th: Insert the database FWB_Data.dat into the Factor software and, optionally, the label file FWB_Labels.txt, as explained in the section that starts at the timestamp 4:50 of the video 3_Video_UnidimensionalExample.mp4 (https://youtu.be/wFTGJG8XRRs?t=290);
    9th: Configure the analyses as explained in the section that starts at the timestamp 8:32 of the video 3_Video_UnidimensionalExample.mp4 (https://youtu.be/wFTGJG8XRRs?t=512); and
    10th: Interpret the output file FWB_Output.txt as explained in the section that begins at the timestamp 22:58 of the video 3_Video_UnidimensionalExample.mp4 (https://youtu.be/wFTGJG8XRRs?t=1378).

  7. Data from: A Stability Framework for Parameter Selection in the Minimum...

    • tandf.figshare.com
    bin
    Updated Jun 24, 2025
    Cite
    Qiang Heng; Hui Shen; Kenneth Lange (2025). A Stability Framework for Parameter Selection in the Minimum Covariance Determinant Problem [Dataset]. http://doi.org/10.6084/m9.figshare.29039877.v1
    Available download formats: bin
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Qiang Heng; Hui Shen; Kenneth Lange
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Minimum Covariance Determinant (MCD) method is a widely adopted tool for robust estimation and outlier detection. In this article, we introduce MCD model selection based on the notion of stability. Our best subset method leverages prior best practices such as statistical depths for initialization and concentration steps for subset refinement. Our contribution lies in constructing a bootstrap procedure to estimate the instability of the best subset algorithm. The instability path offers insights into a dataset’s inlier/outlier structure and facilitates suitable choice of the subset size. We rigorously benchmark the proposed framework against existing MCD variants and illustrate its practical utility on several real-world datasets.

  8. Comparisons of predictive power.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    + more versions
    Cite
    Alina Bazarova; Marko Raseta (2023). Comparisons of predictive power. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t005
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present an R-package for predictive modelling, CARRoT (Cross-validation, Accuracy, Regression, Rule of Ten). CARRoT is a tool for initial exploratory analysis of the data, which performs an exhaustive search for a regression model yielding the best predictive power with heuristic ‘rules of thumb’ and expert knowledge as regularization parameters. It uses multiple hold-outs in order to internally validate the model. The package can take into account multiple factors such as collinearity of the predictors, events-per-variable rules (EPVs) and R-squared statistics during the model selection. In addition, other constraints, such as forcing specific terms and restricting complexity of the predictive models, can be used. The package allows taking pairwise and three-way interactions between variables into account as well. These candidate models are then ranked by predictive power, which is assessed via multiple hold-out procedures and can be parallelised in order to reduce the computational time. Models which exhibited the highest average predictive power over all hold-outs are returned. This is quantified as absolute and relative error in case of continuous outcomes, accuracy and AUROC values in case of categorical outcomes. In this paper we briefly present the statistical framework of the package and discuss the complexity of the underlying algorithm. Moreover, using CARRoT and a number of datasets available in R we provide a comparison of different model selection techniques: based on EPVs alone, on EPVs and R-squared statistics, on lasso regression, on including only statistically significant predictors and on stepwise forward selection technique.
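
The "Rule of Ten" in the package name refers to an events-per-variable (EPV) rule of thumb: roughly one predictor per ten outcome events. The sketch below illustrates that general heuristic for a binary outcome, where the number of events is conventionally taken as the size of the less frequent class. It is written in Python for illustration and is not CARRoT's R code; the function names and the 30/70 example outcome vector are hypothetical.

```python
def count_events(outcomes):
    """Number of events for a binary outcome: the size of the less frequent class."""
    positives = sum(1 for y in outcomes if y == 1)
    return min(positives, len(outcomes) - positives)


def max_predictors(n_events, epv=10):
    """Largest number of predictors allowed under an events-per-variable rule."""
    if epv <= 0:
        raise ValueError("epv must be positive")
    return n_events // epv


# Hypothetical outcome vector: 30 events among 100 observations.
outcomes = [1] * 30 + [0] * 70
events = count_events(outcomes)
print(f"{events} events allow at most {max_predictors(events)} predictors at EPV = 10")
```

Raising `epv` makes the constraint stricter, which is one way an exhaustive model search of this kind can be kept from considering overly complex models.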

  9. Top 500 Movies of all time

    • kaggle.com
    zip
    Updated Dec 11, 2022
    Cite
    Muhammad Faisal Ali (2022). Top 500 Movies of all time [Dataset]. https://www.kaggle.com/datasets/faisaljanjua0555/top-500-movies-of-all-time
    Available download formats: zip (16190 bytes)
    Dataset updated
    Dec 11, 2022
    Authors
    Muhammad Faisal Ali
    Description

    This dataset contains the top 500 movies of all time based on their rating. It contains the ranking, number of votes, year, length, genre and rating of the movies, and more.

    Methodology

    This dataset was acquired by scraping the IMDb page of top movies of all time with the web scraping tool Beautiful Soup.

  10. Data_Sheet_1_Bayesian Network Modeling Applied to Feline Calicivirus...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Gilles Kratzer; Fraser I. Lewis; Barbara Willi; Marina L. Meli; Felicitas S. Boretti; Regina Hofmann-Lehmann; Paul Torgerson; Reinhard Furrer; Sonja Hartnack (2023). Data_Sheet_1_Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland.docx [Dataset]. http://doi.org/10.3389/fvets.2020.00073.s001
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Gilles Kratzer; Fraser I. Lewis; Barbara Willi; Marina L. Meli; Felicitas S. Boretti; Regina Hofmann-Lehmann; Paul Torgerson; Reinhard Furrer; Sonja Hartnack
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Switzerland
    Description

    Bayesian network (BN) modeling is a rich and flexible analytical framework capable of elucidating complex veterinary epidemiological data. It is a graphical modeling technique that enables the visual presentation of multi-dimensional results while retaining statistical rigor in population-level inference. Using previously published case study data about feline calicivirus (FCV) and other respiratory pathogens in cats in Switzerland, a full BN modeling analysis is presented. The analysis shows that reducing the group size and vaccinating animals are the two actionable factors directly associated with FCV status and are primary targets to control FCV infection. The presence of gingivostomatitis and Mycoplasma felis is also associated with FCV status, but signs of upper respiratory tract disease (URTD) are not. FCV data is particularly well-suited to a network modeling approach, as both multiple pathogens and multiple clinical signs per pathogen are involved, along with multiple potentially interrelated risk factors. BN modeling is a holistic approach—all variables of interest may be mutually interdependent—which may help to address issues, such as confounding and collinear factors, as well as to disentangle directly vs. indirectly related variables. We introduce the BN methodology as an alternative to the classical uni- and multivariable regression approaches commonly used for risk factor analyses. We advise and guide researchers about how to use BNs as an exploratory data tool and demonstrate the limitations and practical issues. We present a step-by-step case study using FCV data along with all code necessary to reproduce our analyses in the open-source R environment. We compare and contrast the findings of the current case study using BN modeling with previous results that used classical regression techniques, and we highlight new potential insights. 
    Finally, we discuss advanced methods, such as Bayesian model averaging, a common way of accounting for model uncertainty in a Bayesian network context.

  11. Best performance based on linear and quadratic model.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Best performance based on linear and quadratic model. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t011
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Best performance based on linear and quadratic model.

  12. Data from: Link function.

    • figshare.com
    xls
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Link function. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t001
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The table of link functions corresponding to different types of outcomes.

  13. Overfitting comparison.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Overfitting comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t007
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The table reports average absolute difference between overfitting defined in “Results and discussion” for different methods.

  14. Kaggle Dataset

    • kaggle.com
    zip
    Updated Feb 9, 2023
    Cite
    Chidambara Raju G (2023). Kaggle Dataset [Dataset]. https://www.kaggle.com/datasets/rajugc/kaggle-dataset/discussion
    Available download formats: zip (1572305 bytes)
    Dataset updated
    Feb 9, 2023
    Authors
    Chidambara Raju G
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or what datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning.

    Column description

    • Dataset_name - Name of the dataset
    • Author_name - Name of the author
    • Author_id - Kaggle id of the author
    • No_of_files - Number of files the author has uploaded
    • size - Size of all the files
    • Type_of_file - Type of the files such as csv, json etc.
    • Upvotes - Total upvotes of the dataset
    • Medals - Medal of the dataset
    • Usability - Usability of the dataset
    • Date - Date in which the dataset is uploaded
    • Day - Day in which the dataset is uploaded
    • Time - Time in which the dataset is uploaded
    • Dataset_link - Kaggle link of the dataset

    Acknowledgements

    The data has been scraped from the official Kaggle website and is available under the Creative Commons license.

    Keep Learning !!!

  15. Comparison of predictive power.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Comparison of predictive power. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t002
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Significance-based models. The table reports the percentage of times one model yields better/worse predictive power than the other.

  16. Combined overfitting and predictive power comparison.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Combined overfitting and predictive power comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t008
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The table summarises predictive power and overfitting.

  17. Detailed breakdown of overfitting comparison of CARRoT output and the other...

    • plos.figshare.com
    txt
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Detailed breakdown of overfitting comparison of CARRoT output and the other models. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.s002
    Available download formats: txt
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overfitting in terms of absolute/relative error, accuracy/AUROC and accuracy only (for continuous, binary and multinomial outcomes respectively), computed on both training and test sets for different prediction methods on 43 datasets available in R, using the default 90%/10% training/validation split. The methods used are CARRoT with EPV = 10, a model based on significant predictors only, a lasso-based model, and CARRoT with EPV = 10 and an additional R2 constraint. (CSV)

  18. Overfitting comparison for different EPVs.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Overfitting comparison for different EPVs. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t009
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The table reports the average absolute difference between overfitting defined in Results and discussion for different EPV rules.

  19. Predictive power for different EPVs and R2.

    • plos.figshare.com
    xls
    Updated Oct 12, 2023
    Cite
    Alina Bazarova; Marko Raseta (2023). Predictive power for different EPVs and R2. [Dataset]. http://doi.org/10.1371/journal.pone.0292597.t010
    Available download formats: xls
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alina Bazarova; Marko Raseta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The average ratios between predictive power of models with different EPVs as in Table 6.

  20. List of questions and corresponding answers for the 117th NMLE in Japan,...

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jan 23, 2024
    Cite
    Yudai Tanaka; Takuto Nakata; Ko Aiga; Takahide Etani; Ryota Muramatsu; Shun Katagiri; Hiroyuki Kawai; Fumiya Higashino; Masahiro Enomoto; Masao Noda; Mitsuhiro Kometani; Masayuki Takamura; Takashi Yoneda; Hiroaki Kakizaki; Akihiro Nomura (2024). List of questions and corresponding answers for the 117th NMLE in Japan, generated by GPT-4 using a tuned prompt. [Dataset]. http://doi.org/10.1371/journal.pdig.0000433.s001
    Available download formats: xlsx
    Dataset updated
    Jan 23, 2024
    Dataset provided by
    PLOS Digital Health
    Authors
    Yudai Tanaka; Takuto Nakata; Ko Aiga; Takahide Etani; Ryota Muramatsu; Shun Katagiri; Hiroyuki Kawai; Fumiya Higashino; Masahiro Enomoto; Masao Noda; Mitsuhiro Kometani; Masayuki Takamura; Takashi Yoneda; Hiroaki Kakizaki; Akihiro Nomura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Japan
    Description

    List of questions and corresponding answers for the 117th NMLE in Japan, generated by GPT-4 using a tuned prompt.
