100+ datasets found
  1. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers who want to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source, object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supporting libraries provide a wide range of functions for programming and for analyzing data. Unlike many existing statistical software packages, R has the added benefit of letting users write more efficient code through command-line scripting and vectors. It has several built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions specifying how the program should behave while handling the data; these can also be stored in R's simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, and is intended to inform and guide the work of R users and statisticians. It describes the different types of statistical data analysis and methods, and the best scenarios for using each of them in R. It gives a hands-on, step-by-step practical guide to identifying and conducting the different parametric and non-parametric procedures, including a description of the conditions or assumptions necessary for performing the various statistical methods or tests, and guidance on how to interpret their results. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained throughout. It is the first book to provide a comprehensive description and a step-by-step, hands-on practical guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from importing and storing datasets in R as objects, through coding and calling the methods or functions for manipulating those datasets or objects, factorization, and vectorization, to sound reasoning about, interpretation of, and storage of the results for future use, together with their graphical visualization and representation. In short, a congruence of statistics and computer programming for research.

  2. Leading data compilation and analytics presentation/reporting tools in U.S....

    • statista.com
    Updated Apr 30, 2016
    Cite
    Statista (2016). Leading data compilation and analytics presentation/reporting tools in U.S. 2015 [Dataset]. https://www.statista.com/statistics/562654/united-states-data-analytics-data-compilation-and-presentation-tools/
    Explore at:
    Dataset updated
    Apr 30, 2016
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    This statistic depicts the distribution of tools used to compile data and present analytics and/or reports to management, according to a marketing survey of C-level executives, conducted in ************* by Black Ink. As of *************, * percent of respondents used statistical modeling tools, such as IBM's SPSS or the SAS Institute's Statistical Analysis System package, to compile and present their reports.

  3. Market share of leading data analytics tools globally 2023

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). Market share of leading data analytics tools globally 2023 [Dataset]. https://www.statista.com/statistics/982516/most-popular-data-analytics-software/
    Explore at:
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022 - Mar 2023
    Area covered
    Worldwide
    Description

    In 2023, Morningstar Advisor Workstation was by far the most popular data analytics software worldwide. According to a survey carried out between December 2022 and March 2023, the market share of Morningstar Advisor Workstation was ***** percent. It was followed by Riskalyze Elite, with ***** percent, and YCharts, with ***** percent.

  4. Statistical Analysis Software Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 22, 2024
    + more versions
    Cite
    Dataintelo (2024). Statistical Analysis Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/statistical-analysis-software-market
    Explore at:
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 22, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Statistical Analysis Software Market Outlook



    The global market size for statistical analysis software was estimated at USD 11.3 billion in 2023 and is projected to reach USD 21.6 billion by 2032, growing at a compound annual growth rate (CAGR) of 7.5% during the forecast period. This substantial growth can be attributed to the increasing complexity of data in various industries and the rising need for advanced analytical tools to derive actionable insights.
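    As a quick plausibility check, the implied growth rate can be recomputed from the report's endpoint figures (the dollar amounts are the report's; the arithmetic below is ours):

    ```python
    # Recompute the implied CAGR from the report's endpoint figures.
    base, target = 11.3, 21.6           # USD billions: 2023 estimate, 2032 projection
    years = 2032 - 2023                 # 9-year forecast horizon

    cagr = (target / base) ** (1 / years) - 1
    print(f"Implied CAGR: {cagr:.1%}")  # ~7.5%, consistent with the stated rate
    ```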



    One of the primary growth factors for this market is the increasing demand for data-driven decision-making across various sectors. Organizations are increasingly recognizing the value of data analytics in enhancing operational efficiency, reducing costs, and identifying new business opportunities. The proliferation of big data and the advent of technologies such as artificial intelligence and machine learning are further fueling the demand for sophisticated statistical analysis software. Additionally, the growing adoption of cloud computing has significantly reduced the cost and complexity of deploying advanced analytics solutions, making them more accessible to organizations of all sizes.



    Another critical driver for the market is the increasing emphasis on regulatory compliance and risk management. Industries such as finance, healthcare, and manufacturing are subject to stringent regulatory requirements, necessitating the use of advanced analytics tools to ensure compliance and mitigate risks. For instance, in the healthcare sector, statistical analysis software is used for clinical trials, patient data management, and predictive analytics to enhance patient outcomes and ensure regulatory compliance. Similarly, in the financial sector, these tools are used for fraud detection, credit scoring, and risk assessment, thereby driving the demand for statistical analysis software.



    The rising trend of digital transformation across industries is also contributing to market growth. As organizations increasingly adopt digital technologies, the volume of data generated is growing exponentially. This data, when analyzed effectively, can provide valuable insights into customer behavior, market trends, and operational efficiencies. Consequently, there is a growing need for advanced statistical analysis software to analyze this data and derive actionable insights. Furthermore, the increasing integration of statistical analysis tools with other business intelligence and data visualization tools is enhancing their capabilities and driving their adoption across various sectors.



    From a regional perspective, North America currently holds the largest market share, driven by the presence of major technology companies and a high level of adoption of advanced analytics solutions. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, owing to the increasing adoption of digital technologies and the growing emphasis on data-driven decision-making in countries such as China and India. The region's rapidly expanding IT infrastructure and increasing investments in advanced analytics solutions are further contributing to this growth.



    Component Analysis



    The statistical analysis software market can be segmented by component into software and services. The software segment encompasses the core statistical analysis tools and platforms used by organizations to analyze data and derive insights. This segment is expected to hold the largest market share, driven by the increasing adoption of data analytics solutions across various industries. The availability of a wide range of software solutions, from basic statistical tools to advanced analytics platforms, is catering to the diverse needs of organizations, further driving the growth of this segment.



    The services segment includes consulting, implementation, training, and support services provided by vendors to help organizations effectively deploy and utilize statistical analysis software. This segment is expected to witness significant growth during the forecast period, driven by the increasing complexity of data analytics projects and the need for specialized expertise. As organizations seek to maximize the value of their data analytics investments, the demand for professional services to support the implementation and optimization of statistical analysis solutions is growing. Furthermore, the increasing trend of outsourcing data analytics functions to third-party service providers is contributing to the growth of the services segment.



    Within the software segment, the market can be further categori

  5. Data for Example I.

    • plos.figshare.com
    txt
    Updated Jul 3, 2024
    + more versions
    Cite
    Jularat Chumnaul; Mohammad Sepehrifar (2024). Data for Example I. [Dataset]. http://doi.org/10.1371/journal.pone.0297930.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jularat Chumnaul; Mohammad Sepehrifar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data analysis can be accurate and reliable only if the underlying assumptions of the used statistical method are validated. Any violations of these assumptions can change the outcomes and conclusions of the analysis. In this study, we developed Smart Data Analysis V2 (SDA-V2), an interactive and user-friendly web application, to assist users with limited statistical knowledge in data analysis, and it can be freely accessed at https://jularatchumnaul.shinyapps.io/SDA-V2/. SDA-V2 automatically explores and visualizes data, examines the underlying assumptions associated with the parametric test, and selects an appropriate statistical method for the given data. Furthermore, SDA-V2 can assess the quality of research instruments and determine the minimum sample size required for a meaningful study. However, while SDA-V2 is a valuable tool for simplifying statistical analysis, it does not replace the need for a fundamental understanding of statistical principles. Researchers are encouraged to combine their expertise with the software’s capabilities to achieve the most accurate and credible results.
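    SDA-V2 itself is an R/Shiny application, but the core idea it automates (validate a parametric assumption, then choose between a parametric and a non-parametric test) can be sketched in a few lines. The Python sketch below is illustrative only and is not the app's actual decision logic:

    ```python
    # Illustrative only: choose a two-sample test after a normality check,
    # mirroring the "validate assumptions, then pick a method" idea of SDA-V2.
    from scipy import stats

    def compare_groups(a, b, alpha=0.05):
        """Welch t-test if both samples look normal, otherwise Mann-Whitney U."""
        normal = all(stats.shapiro(g).pvalue > alpha for g in (a, b))
        if normal:
            return "Welch t-test", stats.ttest_ind(a, b, equal_var=False)
        return "Mann-Whitney U", stats.mannwhitneyu(a, b)

    name, res = compare_groups([5.1, 4.9, 5.4, 5.0, 5.2],
                               [4.2, 4.5, 4.1, 4.4, 4.6])
    print(name, round(res.pvalue, 4))
    ```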

  6. Replication Data for: Navigating the Range of Statistical Tools for...

    • dataverse.harvard.edu
    Updated Apr 23, 2017
    Cite
    Skyler Cranmer; Philip Leifeld; Scott McClurg; Meredith Rolfe (2017). Replication Data for: Navigating the Range of Statistical Tools for Inferential Network Analysis [Dataset]. http://doi.org/10.7910/DVN/2XP8YF
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 23, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Skyler Cranmer; Philip Leifeld; Scott McClurg; Meredith Rolfe
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/2XP8YF

    Description

    The last decade has seen substantial advances in statistical techniques for the analysis of network data, and a major increase in the frequency with which these tools are used. These techniques are designed to accomplish the same broad goal, statistically valid inference in the presence of highly interdependent relationships, but important differences remain between them. We review three approaches commonly used for inferential network analysis (the Quadratic Assignment Procedure, the Exponential Random Graph Model, and the Latent Space Network Model), highlighting the strengths and weaknesses of the techniques relative to one another. An illustrative example using climate change policy network data shows that all three network models outperform standard logit estimates on multiple criteria. This paper introduces political scientists to a class of network techniques beyond simple descriptive measures of network structure, and helps researchers choose which model to use in their own research.

  7. Data Analysis for the Systematic Literature Review of DL4SE

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    College of William and Mary
    Washington and Lee University
    Authors
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An Exploratory Data Analysis comprises a set of statistical and data mining procedures to describe data. We ran an EDA to provide statistical facts and inform conclusions; the mined facts support arguments that shaped the Systematic Literature Review of DL4SE.

    The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

    Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a structured DL4SE database, which was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:

    Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features or attributes found in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

    Preprocessing. The preprocessing applied was transforming the features into the correct type (nominal), removing outliers (papers that do not belong to the DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalize the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the paper by the data mining tasks or methods.

    Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis to reduce the 35 features to 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance; in other words, it helped us identify the number of clusters to be used when tuning the explainable models.
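    The paper's pipeline was built in RapidMiner, but the PCA step just described is easy to sketch. The Python snippet below is an illustrative stand-in only, with random data in place of the real 35-feature matrix:

    ```python
    # Sketch of the transformation step: project 35 features onto 2 principal
    # components, then scan candidate cluster counts for the drop in variance.
    # Random data stands in for the real paper-feature matrix.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(42)
    X = rng.random((128, 35))            # 128 hypothetical papers x 35 features
    X2 = PCA(n_components=2).fit_transform(X)

    for k in range(2, 7):                # elbow-style scan over k
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X2)
        print(k, round(km.inertia_, 2))  # within-cluster variance for each k
    ```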

    Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncovering hidden relationships among the extracted features (correlations and association rules) and to categorizing the DL4SE papers for a better segmentation of the state-of-the-art (clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

    Interpretation/Evaluation. We used the knowledge discovered to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes, which produced an argument support analysis (see this link).

    We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

    Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

    Support = (number of occurrences in which the statement is true) / (total number of statements)

    Confidence = (support of the statement) / (number of occurrences of the premise)
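    In code, these definitions reduce to simple ratios. The sketch below uses hypothetical per-paper booleans for one premise/conclusion pair (used supervised learning, approach irreproducible) from the rules above:

    ```python
    # Support and confidence for one association rule, per the definitions above.
    # Each row is a hypothetical paper: (premise holds, conclusion holds).
    rows = [(True, True), (True, True), (True, False), (False, True), (False, False)]

    n = len(rows)
    premise_count = sum(1 for p, _ in rows if p)
    both_count = sum(1 for p, c in rows if p and c)

    support = both_count / n                 # P(premise and conclusion)
    confidence = both_count / premise_count  # P(conclusion | premise)
    print(f"support={support:.2f} confidence={confidence:.2f}")
    ```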

  8. Guidelines for Computing Summary Statistics for Data-Sets Containing...

    • datahub.bvcentre.ca
    Updated Jun 3, 2024
    Cite
    (2024). Guidelines for Computing Summary Statistics for Data-Sets Containing Non-Detects - Dataset - BVRC DataHub [Dataset]. https://datahub.bvcentre.ca/dataset/guidelines-for-computing-summary-statistics-for-data-sets-containing-non-detects
    Explore at:
    Dataset updated
    Jun 3, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    INTRODUCTION

    As part of its responsibilities, the BC Ministry of Environment monitors water quality in the province’s streams, rivers, and lakes. Often, it is necessary to compile statistics involving concentrations of contaminants or other compounds. Quite often the instruments used cannot measure concentrations below certain values; these observations are called non-detects or less-thans. Non-detects pose a difficulty when it is necessary to compute statistical measures such as the mean, the median, and the standard deviation for a data set, and the way they are handled can affect the quality of any statistics generated. Non-detects, or censored data, are found in many fields, such as medicine, engineering, biology, and environmetrics, where the measurements of interest often fall below some threshold. Dealing with non-detects is a significant issue, and statistical tools using survival or reliability methods have been developed for it. Broadly, there are three approaches for treating data containing censored values:

    1. substitution, which gives poor results and is therefore not recommended in the literature;
    2. maximum likelihood estimation, which requires an assumption of some distributional form; and
    3. nonparametric methods, which assess the shape of the data based on observed percentiles rather than a strict distributional form.

    This document provides guidance on how to record censored data, and on when and how to use certain analysis methods when the percentage of censored observations is less than 50%. The methods presented in this document are:

    1. substitution;
    2. Kaplan-Meier, as part of nonparametric methods;
    3. a lognormal model based on maximum likelihood estimation; and
    4. robust regression on order statistics, which is a semiparametric method.

    Statistical software suitable for survival or reliability analysis is available for dealing with censored data and has been widely used in medical and engineering environments. In this document, methods are illustrated with both the R and JMP software packages, where possible. JMP often requires some intermediate steps to obtain summary statistics with most of the methods described here; R, with the NADA package, is usually straightforward. The NADA package was developed specifically for computing statistics with non-detects in environmental data, based on Helsel (2005b). The data used to illustrate the methods described for computing summary statistics for non-detects are either simulated or based on information acquired from the B.C. Ministry of Environment. This document draws heavily on the book Nondetects And Data Analysis, written by Dennis R. Helsel in 2005 (Helsel, 2005b).
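    The document's own examples use R (with NADA) and JMP. As a rough Python analogue, and only as an assumption-laden sketch rather than the guideline's workflow, SciPy 1.11+ can fit a lognormal by maximum likelihood to left-censored non-detects; all numbers below are invented:

    ```python
    # Sketch: lognormal MLE with non-detects treated as left-censored values
    # (requires SciPy >= 1.11), contrasted with naive DL/2 substitution,
    # which the guidelines recommend against. All numbers are invented.
    from scipy import stats

    detected = [0.8, 1.2, 2.5, 3.1, 4.0, 5.6]   # measured concentrations
    dl, n_nd = 0.5, 3                           # detection limit; 3 values "<0.5"

    data = stats.CensoredData(uncensored=detected, left=[dl] * n_nd)
    shape, loc, scale = stats.lognorm.fit(data, floc=0)
    print("MLE mean:", stats.lognorm(shape, loc, scale).mean())

    subst = detected + [dl / 2] * n_nd          # substitution estimate
    print("DL/2 substitution mean:", sum(subst) / len(subst))
    ```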

  9. Change over time analysis (CoTA) tool

    • gov.uk
    Updated Mar 31, 2011
    Cite
    Department of Energy & Climate Change (2011). Change over time analysis (CoTA) tool [Dataset]. https://www.gov.uk/government/statistics/change-over-time-analysis-cota-tool
    Explore at:
    Dataset updated
    Mar 31, 2011
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department of Energy & Climate Change
    Description

    Change over Time Analysis (CoTA) Viewer is a visual tool, with accompanying Excel worksheets, which assists in the analysis of change over time for small areas. In this version, electricity and gas data from 2005 to 2009 are used to analyse change at Middle Layer Super Output Area level in England and Wales.

    This tool supports the strategy for analysing change over time for small areas created by Neighbourhood Statistics.

    The tool is available from the National Archives: Analytical tools web page (http://webarchive.nationalarchives.gov.uk/20130109092117/http:/www.decc.gov.uk/en/content/cms/statistics/energy_stats/regional/analytical/analytical.aspx).

    Access the Neighbourhood Statistics Analysis Toolkit (http://www.neighbourhood.statistics.gov.uk/dissemination/Info.do;jessionid=Xb1mQqlJXRcJdnCtQZpzlQJXGpxd7XcsJ3PkXcvpG9dwpDTNVQGM!452292141!1357522281515?m=0&s=1357522281515&enc=1&page=analysisandguidance/analysistoolkit/analysis-toolkit.htm&nsjs=true&nsck=true&nssvg=false&nswid=1680).

  10. Comparison of features in SDA-V2 and well-known statistical analysis...

    • plos.figshare.com
    xls
    Updated Jul 3, 2024
    Cite
    Jularat Chumnaul; Mohammad Sepehrifar (2024). Comparison of features in SDA-V2 and well-known statistical analysis software packages (Minitab and SPSS). [Dataset]. http://doi.org/10.1371/journal.pone.0297930.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jularat Chumnaul; Mohammad Sepehrifar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of features in SDA-V2 and well-known statistical analysis software packages (Minitab and SPSS).

  11. Data from: ODM Data Analysis—A tool for the automatic validation, monitoring...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 22, 2018
    Cite
    Doods, Justin; Ständer, Sonja; Brix, Tobias Johannes; Bruland, Philipp; Ernsting, Jan; Dugas, Martin; Neuhaus, Philipp; Storck, Michael; Sarfraz, Saad (2018). ODM Data Analysis—A tool for the automatic validation, monitoring and generation of generic descriptive statistics of patient data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000711292
    Explore at:
    Dataset updated
    Jun 22, 2018
    Authors
    Doods, Justin; Ständer, Sonja; Brix, Tobias Johannes; Bruland, Philipp; Ernsting, Jan; Dugas, Martin; Neuhaus, Philipp; Storck, Michael; Sarfraz, Saad
    Description

    Introduction: A required step for presenting results of clinical studies is the declaration of participants' demographic and baseline characteristics, as required by FDAAA 801. The common workflow to accomplish this task is to export the clinical data from the electronic data capture system in use and import it into statistical software such as SAS or IBM SPSS. Such software requires trained users, who have to implement the analysis individually for each item. These expenditures may become an obstacle for small studies. The objective of this work is to design, implement, and evaluate an open-source application, called ODM Data Analysis, for the semi-automatic analysis of clinical study data.

    Methods: The system requires clinical data in the CDISC Operational Data Model (ODM) format. After a file is uploaded, its syntax and the data-type conformity of the collected data are validated. The completeness of the study data is determined, and basic statistics, including illustrative charts for each item, are generated. Datasets from four clinical studies have been used to evaluate the application's performance and functionality.

    Results: The system is implemented as an open-source web application (available at https://odmanalysis.uni-muenster.de) and is also provided as a Docker image, which enables easy distribution and installation on local systems. Study data is stored in the application only while the calculations are performed, which is compliant with data-protection requirements. Analysis times are below half an hour, even for larger studies with over 6,000 subjects.

    Discussion: Medical experts have confirmed the usefulness of this application for gaining an overview of their collected study data for monitoring purposes and for generating descriptive statistics without further user interaction. The semi-automatic analysis has its limitations and cannot replace the complex analyses of statisticians, but it can serve as a starting point for their examination and reporting.
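    As a rough illustration of the completeness check described above, the sketch below walks a CDISC ODM 1.3 file and counts filled items. The file name is hypothetical, and the real application performs far more validation than this:

    ```python
    # Count filled ItemData values in a CDISC ODM 1.3 export (illustrative only;
    # "study.odm.xml" is a hypothetical file name).
    import xml.etree.ElementTree as ET
    from collections import Counter

    ODM_NS = "{http://www.cdisc.org/ns/odm/v1.3}"
    root = ET.parse("study.odm.xml").getroot()

    filled = Counter(
        item.get("ItemOID")
        for item in root.iter(f"{ODM_NS}ItemData")
        if item.get("Value")              # skip empty or missing values
    )
    print(filled.most_common(10))         # ten most frequently completed items
    ```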

  12. Data from: Data files used to study change dynamics in software systems

    • figshare.swinburne.edu.au
    pdf
    Updated Jul 22, 2024
    Cite
    Rajesh Vasa (2024). Data files used to study change dynamics in software systems [Dataset]. http://doi.org/10.25916/sut.26288227.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Swinburne
    Authors
    Rajesh Vasa
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place, as well as to inform them of the longer-term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, and provides useful input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. In order to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes, and that classes that changed recently are likely to undergo modifications in the near future. Though this guidance is helpful, developers need more specific guidance for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution-prone parts of a system, as well as support effort estimation activities. The specific research questions that we address in this chapter are:

    (1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project-specific, or general?

    (2) How is modification frequency distributed for classes that change?

    (3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications?

    (4) Does structural complexity make a class susceptible to change?

    (5) Does popularity make a class more change-prone?

    We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55,000 unique classes across all projects under investigation. The analysis methods applied took into consideration the highly skewed nature of the metric data distributions. The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2 MB in total) is provided as comma-separated values (CSV) files, and the first line of each CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
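    For instance, research question (1) reduces to a per-version proportion once the raw data is tabulated. The sketch below is hypothetical: the actual CSV header is not reproduced here, so the column names are stand-ins:

    ```python
    # Estimate the per-version probability that a class changes (RQ1).
    # File and column names (project, version, changed) are hypothetical.
    import pandas as pd

    df = pd.read_csv("change_metrics.csv")
    per_version = df.groupby(["project", "version"])["changed"].mean()

    # Is the likelihood project-specific, or general? Compare distributions.
    print(per_version.groupby("project").describe())
    ```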

  13. Ad-hoc statistical analysis: 2020/21 Quarter 1

    • gov.uk
    • s3.amazonaws.com
    Updated Jun 10, 2020
    Cite
    Department for Digital, Culture, Media & Sport (2020). Ad-hoc statistical analysis: 2020/21 Quarter 1 [Dataset]. https://www.gov.uk/government/statistical-data-sets/ad-hoc-statistical-analysis-202021-quarter-1
    Explore at:
    Dataset updated
    Jun 10, 2020
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Digital, Culture, Media & Sport
    Description

    This page lists ad-hoc statistics released during the period April - June 2020. These are additional analyses not included in any of the Department for Digital, Culture, Media and Sport’s standard publications.

    If you would like any further information please contact evidence@culture.gov.uk.

    April 2020 - DCMS Economic Estimates: Experimental quarterly GVA for time series analysis

    These are experimental estimates of the quarterly GVA in chained volume measures by DCMS sectors and subsectors between 2010 and 2018, which have been produced to help the department estimate the effect of shocks to the economy. Due to substantial revisions to the base data and methodology used to construct the tourism satellite account, estimates for the tourism sector are only available for 2017. For this reason “All DCMS Sectors” excludes tourism. Further, as chained volume measures are not available for Civil Society at present, this sector is also not included.

    The methods used to produce these estimates are experimental. The data here are not comparable to those published previously and users should refer to the annual reports for estimates of GVA by businesses in DCMS sectors.

    GVA generated by businesses in DCMS sectors (excluding Tourism and Civil Society) increased by 31.0% between the fourth quarters of 2010 and 2018. The UK economy grew by 16.7% over the same period.

    All individual DCMS sectors (excluding Tourism and Civil Society) grew faster than the UK average between quarter 4 of 2010 and 2018, apart from the Telecoms sector, which decreased by 10.1%.

    Quarterly estimates of Gross Value Added (GVA, £m) by activities in DCMS sectors and subsectors, 2010 - 2018: https://assets.publishing.service.gov.uk/media/6024fec3e90e07056334314c/2010_2019_GVA_Quarterly_V2.xlsx

    (MS Excel Spreadsheet, 57.8 KB)
    

    April 2020 - Proportion of total DCMS sector turnover generated by businesses in different employment and turnover bands, 2017

    This data shows the proportion of the total turnover in DCMS sectors in 2017 that was generated by businesses according to individual businesses turnover, and by the number of employees.

    In 2017 a larger share of total turnover was generated by DCMS sector businesses with an annual turnover of less than one million pounds (11.4%) than the UK average (8.6%). In general, individual DCMS sectors tended to have a higher proportion of total turnover generated by businesses with individual turnover of less than one million pounds, with the exception of the Gambling (0.2%), Digital (8.2%) and Telecoms (2.0%, wholly within Digital) sectors.

    DCMS sectors tended to have a higher proportion of total turnover generated by large (250 employees or more) businesses (57.8%) than the UK average (51.4%). The exceptions were the Creative Industries (41.7%) and the Cultural sector (42.4%). Of all DCMS sectors, the Gambling sector had the highest proportion of total turnover generated by large businesses (97.5%).


  14. Data from: A Customizable Inquiry-Based Statistics Teaching Application for...

    • qubeshub.org
    Updated Apr 5, 2024
    Cite
    Mikus Abolins-Abols*; Natalie Christian; Jeffery Masters; Rachel Pigg (2024). A Customizable Inquiry-Based Statistics Teaching Application for Introductory Biology Students [Dataset]. https://qubeshub.org/publications/4651/?v=1
    Explore at:
    Dataset updated
    Apr 5, 2024
    Dataset provided by
    QUBES
    Authors
    Mikus Abolins-Abols*; Natalie Christian; Jeffery Masters; Rachel Pigg
    Description

    Building strong quantitative skills prepares undergraduate biology students for successful careers in science and medicine. While math and statistics anxiety can negatively impact student learning within biology classrooms, instructors may reduce this anxiety by steadily building student competency in quantitative reasoning through instructional scaffolding, application-based approaches, and simple computer program interfaces. However, few statistical programs exist that meet all the needs of an inclusive, inquiry-based laboratory course. These needs include an open-source program, a simple interface, little required background knowledge in statistics for student users, and customizability to minimize cognitive load, align with course learning outcomes, and create desirable difficulty. To address these needs, we used the Shiny package in R to develop a custom statistical analysis application. Our “BioStats” app provides students with scaffolded learning experiences in applied statistics that promote student agency, and it is customizable by the instructor. It introduces students to the strengths of the R interface while eliminating the need for complex coding in the R programming language. It also prioritizes practical implementation of statistical analyses over learning statistical theory. To our knowledge, this is the first statistics teaching tool that presents students with basic statistics initially and more complex analyses as they advance, and that includes an option to learn statistical coding in R. The BioStats app interface yields a simplified introduction to applied statistics that is adaptable to many biology laboratory courses.

    Primary Image: Singing Junco. A sketch of a junco singing on a pine tree branch, created by the lead author of this paper.

  15. Data from: Univariate and multivariate statistical tools for in vitro...

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated Mar 24, 2021
    Cite
    Souza, Fernanda Vidigal Duarte; da Silva Ledo, Carlos Alberto; dos Santos Soares Filho, Walter; de Carvalho, Mariane de Jesus da Silva; Santos, Emanuela Barbosa; da Silva Souza, Antônio (2021). Univariate and multivariate statistical tools for in vitro conservation of citrus genotypes [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000803989
    Explore at:
    Dataset updated
    Mar 24, 2021
    Authors
    Souza, Fernanda Vidigal Duarte; da Silva Ledo, Carlos Alberto; dos Santos Soares Filho, Walter; de Carvalho, Mariane de Jesus da Silva; Santos, Emanuela Barbosa; da Silva Souza, Antônio
    Description

    ABSTRACT. This study aimed to evaluate the influence of the growing environment on the in vitro conservation of citrus genotypes obtained from the Active Citrus Germplasm Bank of Embrapa Cassava and Fruit. The study used multivariate statistical tools in order to improve the efficiency of the analysis of the results. Microcuttings approximately 1 cm in length, taken from plantlets of ten genotypes previously cultured in vitro, were inoculated into test tubes containing 20 mL of WPM culture medium supplemented with 25 g L-1 sucrose, solidified with 7 g L-1 agar and adjusted to pH 5.8, and maintained under three environmental conditions for 180 days. The experiment was carried out in a completely randomized design in a split-plot in space arrangement, with 15 replications. The results indicate that principal component analysis is an effective tool for studying the behavior of different genotypes conserved under different in vitro growing conditions. Growing conditions of 22±1°C, a light intensity of 10 μmol m-2 s-1, and a 12-hour photoperiod were the most adequate for reducing the growth of in vitro conserved plants, increasing the subculture interval while keeping the plants healthy.

  16. Largest investment data/analytics tools used by advisory firms worldwide...

    • statista.com
    Cite
    Statista, Largest investment data/analytics tools used by advisory firms worldwide 2025 [Dataset]. https://www.statista.com/statistics/1263648/market-share-top-investment-data-analytics-tools/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2024 - Feb 2025
    Area covered
    Worldwide
    Description

    The leading investment data or analytics tool used by advisory firms worldwide in 2025 was by far Morningstar Advisor Workstation, with over ** percent of the market. YCharts followed, with a market share of nearly ** percent.

  17. An Insight Into What Is Data Analytics?

    • kaggle.com
    zip
    Updated Sep 19, 2022
    Cite
    itcourses (2022). An Insight Into What Is Data Analytics? [Dataset]. https://www.kaggle.com/itcourses/an-insight-into-what-is-data-analytics
    Explore at:
    Available download formats: zip (60771 bytes)
    Dataset updated
    Sep 19, 2022
    Authors
    itcourses
    Description

    What exactly is data analytics? Analytics can be defined as "the science of analysis." A more practical definition, however, would be how an entity, such as a business, arrives at an optimal or realistic decision based on available data. Business managers may choose to make decisions based on past experiences or rules of thumb, or there may be other qualitative aspects to decision-making; still, it will not be an analytical decision-making process unless data is considered.

    Analytics has been used in business since Frederick Winslow Taylor pioneered time management exercises in the late 1800s. Henry Ford revolutionized manufacturing by measuring the pacing of the assembly line. However, analytics gained popularity in the late 1960s, when computers were used in decision support systems. Analytics has evolved since then, with the development of enterprise resource planning (ERP) systems, data warehouses, and a wide range of other hardware and software tools and applications.

    Analytics is now used by businesses of all sizes. For example, if you ask my fruit vendor why he stopped servicing our street, he will tell you that we try to bargain a lot, which causes him to lose money, but on the road next to mine, he has some great customers for whom he provides excellent service. This is the nucleus of analytics: our fruit vendor TESTED servicing my street and realised he was losing money; within a month, he stopped servicing us and will not show up even if we ask him. How many companies today know who their MOST PROFITABLE CUSTOMERS are? And, knowing which customers are the most profitable, how should they direct their efforts to acquire more of them?

    Analytics is used to drive the overall organizational strategy in large corporations. Here are a few examples: • Capital One, a credit card company based in the United States, employs analytics to differentiate customers based on credit risk and to match customer characteristics with appropriate product offerings.

    • Harrah's Casino, another American company, discovered that, contrary to popular belief, their most profitable customers are those who play slots. They have developed a marketing program to attract and retain their MOST PROFITABLE CUSTOMERS in order to capitalise on this insight.

    • Netflix, an online movie service, recommends the most logical movies based on past viewing behavior. This model has increased their sales because the movie choices are based on the customers' preferences, and thus the experience is tailored to each individual.

    Analytics is commonly used to study business data using statistical analysis to discover and understand historical patterns in order to predict and improve future business performance. In addition, some people use the term to refer to the application of mathematics in business. Others believe that the field of analytics includes the use of operations research, statistics, and probability; however, limiting the field of analytics to statistics and mathematics would be incorrect.

    While the concept is simple and intuitive, the widespread use of analytics to drive business is still in its infancy. Stay tuned for the second part of this article to learn more about the Science of Analytics.

  18. Python Script for Simulating, Analyzing, and Evaluating Statistical...

    • data.mendeley.com
    Updated Jun 5, 2025
    Cite
    Kabir Bindawa Abdullahi (2025). Python Script for Simulating, Analyzing, and Evaluating Statistical Mirroring-Based Ordinalysis and Other Estimators [Dataset]. http://doi.org/10.17632/zdhy83cv4p.3
    Explore at:
    Dataset updated
    Jun 5, 2025
    Authors
    Kabir Bindawa Abdullahi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This presentation involves simulation and data generation processes, data analysis, and the evaluation of classical and proposed methods of ordinal data analysis. All the parameters and metrics used are based on the methodology presented in the article titled "Statistical Mirroring-Based Ordinalysis: A Sensitive, Robust, Efficient, and Ordinality-Preserving Descriptive Method for Analyzing Ordinal Assessment Data," authored by Kabir Bindawa Abdullahi in 2024. For further details, see the paper submitted to MethodsX (Elsevier).

    The validation process for the ordinal data analysis methods (estimators) has the following specifications:

    • Simulation process: Monte Carlo simulation.
    • Data generation distributions: categorical, normal, and multivariate model distributions.
    • Data analysis:
      - Classical estimators: sum, average, and median ordinal score.
      - Proposed estimators: Kabirian coefficient of proximity, probability of proximity, probability of deviation.
    • Evaluation metrics:
      - Overall estimates average.
      - Overall estimates median.
      - Efficiency (by the statistical absolute meanic deviation method).
      - Sensitivity (by the entropy method).
      - Normality, Mann-Whitney U test, and others.
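    As a concrete illustration of the classical baseline only (the proposed mirroring-based estimators are defined in the article and not reproduced here), a minimal Monte Carlo sketch in the script's language might look like this; the scale, probabilities, and sample size are invented:

    ```python
    # Monte Carlo generation of 5-point ordinal assessment data plus the
    # classical estimators: sum, average, and median ordinal score.
    import numpy as np

    rng = np.random.default_rng(0)
    levels = np.arange(1, 6)                  # ordinal categories 1..5
    probs = [0.1, 0.2, 0.4, 0.2, 0.1]         # categorical generating model

    for rep in range(3):                      # a few replicates
        sample = rng.choice(levels, size=30, p=probs)
        print(rep, sample.sum(), round(sample.mean(), 2), np.median(sample))
    ```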

  19. Data Insight: Google Analytics Capstone Project

    • kaggle.com
    zip
    Updated Mar 2, 2024
    Cite
    sinderpreet (2024). Data Insight: Google Analytics Capstone Project [Dataset]. https://www.kaggle.com/datasets/sinderpreet/datainsight-google-analytics-capstone-project
    Explore at:
    Available download formats: zip (215409585 bytes)
    Dataset updated
    Mar 2, 2024
    Authors
    sinderpreet
    License

    https://cdla.io/permissive-1-0/

    Description

    Case study: How does a bike-share navigate speedy success?

    Scenario:

    As a data analyst on Cyclistic's marketing team, our focus is on enhancing annual memberships to drive the company's success. We aim to analyze the differing usage patterns between casual riders and annual members to craft a marketing strategy aimed at converting casual riders. Our recommendations, supported by data insights and professional visualizations, await Cyclistic executives' approval to proceed.

    About the company

    In 2016, Cyclistic launched a bike-share program in Chicago, growing to 5,824 bikes and 692 stations. Initially, their marketing aimed at broad segments with flexible pricing plans attracting both casual riders (single-ride or full-day passes) and annual members. However, recognizing that annual members are more profitable, Cyclistic is shifting focus to convert casual riders into annual members. To achieve this, they plan to analyze historical bike trip data to understand the differences and preferences between the two user groups, aiming to tailor marketing strategies that encourage casual riders to purchase annual memberships.

    Project Overview:

    This capstone project is a culmination of the skills and knowledge acquired through the Google Professional Data Analytics Certification. It focuses on Track 1, which is centered around Cyclistic, a fictional bike-share company modeled to reflect real-world data analytics scenarios in the transportation and service industry.

    Dataset Acknowledgment:

    We are grateful to Motivate Inc. for providing the dataset that serves as the foundation of this capstone project. Their contribution has enabled us to apply practical data analytics techniques to a real-world dataset, mirroring the challenges and opportunities present in the bike-sharing sector.

    Objective:

    The primary goal of this project is to analyze the Cyclistic dataset to uncover actionable insights that could help the company optimize its operations, improve customer satisfaction, and increase its market share. Through comprehensive data exploration, cleaning, analysis, and visualization, we aim to identify patterns and trends that inform strategic business decisions.

    Methodology:

    Data Collection: Utilizing the dataset provided by Motivate Inc., which includes detailed information on bike usage, customer behavior, and operational metrics. Data Cleaning and Preparation: Ensuring the dataset is accurate, complete, and ready for analysis by addressing any inconsistencies, missing values, or anomalies. Data Analysis: Applying statistical methods and data analytics techniques to extract meaningful insights from the dataset.
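    A minimal pandas sketch of the cleaning and analysis steps just described; the file and column names follow the public Divvy trip-data convention but are assumptions here, not part of the project deliverables:

    ```python
    # Clean trip durations and compare casual riders with annual members.
    # File and column names (started_at, ended_at, member_casual) are assumed.
    import pandas as pd

    df = pd.read_csv("divvy_trips.csv", parse_dates=["started_at", "ended_at"])
    df["ride_min"] = (df["ended_at"] - df["started_at"]).dt.total_seconds() / 60
    df = df[df["ride_min"].between(1, 24 * 60)]   # drop negative/outlier rides

    print(df.groupby("member_casual")["ride_min"].agg(["count", "mean", "median"]))
    ```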

    Visualization and Reporting:

    Creating intuitive and compelling visualizations to present the findings clearly and effectively, facilitating data-driven decision-making. Findings and Recommendations:

    Conclusion:

    The Cyclistic Capstone Project not only demonstrates the practical application of data analytics skills in a real-world scenario but also provides valuable insights that can drive strategic improvements for Cyclistic. Through this project, showcasing the power of data analytics in transforming data into actionable knowledge, underscoring the importance of data-driven decision-making in today's competitive business landscape.

    Acknowledgments:

    Special thanks to Motivate Inc. for their support and for providing the dataset that made this project possible. Their contribution is immensely appreciated and has significantly enhanced the learning experience.

    STRATEGIES USED

    Case Study Roadmap - ASK

    ●What is the problem you are trying to solve? ●How can your insights drive business decisions?

    Key Tasks ● Identify the business task ● Consider key stakeholders

    Deliverable ● A clear statement of the business task

    Case Study Roadmap - PREPARE

    ● Where is your data located? ● Are there any problems with the data?

    Key tasks ● Download data and store it appropriately. ● Identify how it’s organized.

    Deliverable ● A description of all data sources used

    Case Study Roadmap - PROCESS

    ● What tools are you choosing and why? ● What steps have you taken to ensure that your data is clean?

    Key tasks ● Choose your tools. ● Document the cleaning process.

    Deliverable ● Documentation of any cleaning or manipulation of data

    Case Study Roadmap - ANALYZE

    ● Has your data been properly formatted? ● How will these insights help answer your business questions?

    Key tasks ● Perform calculations ● Formatting

    Deliverable ● A summary of analysis

    Case Study Roadmap - SHARE

    ● Were you able to answer all questions of stakeholders? ● Can Data visualization help you share findings?

    Key tasks ● Present your findings ● Create effective data viz.

    Deliverable ● Supporting viz and key findings

    **Case Study Roadmap - A...

  20. Summary of statistical methods used in this paper.

    • datasetcatalog.nlm.nih.gov
    Updated Jul 17, 2015
    Cite
    Roberts, Seán G.; Winters, James; Chen, Keith (2015). Summary of statistical methods used in this paper. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001870687
    Explore at:
    Dataset updated
    Jul 17, 2015
    Authors
    Roberts, Seán G.; Winters, James; Chen, Keith
    Description

    A summary of the statistical methods used to assess whether the relationship between obligatory future tense (FTR) and the propensity to save money is robust to controlling for shared cultural history. Some methods aggregate the data over languages (column 3). Columns 4, 5, and 6 state whether the method implements a control for language family, geographic area, and country, respectively. The mixed effects model is the only method that does not aggregate the data and that provides an explicit control for language family, geographic area, and country. The final column suggests whether the overall result for the given method demonstrates that the relationship between FTR and savings behaviour is robust; however, it does not indicate the status of individual tests within a given method (see text for details).
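    As a hedged sketch of the mixed-effects approach summarized in the table, one could fit a random intercept per language family. The column names below are hypothetical, and the paper's full model also controls for geographic area and country, which requires crossed random effects beyond this snippet:

    ```python
    # Random-intercept model: savings behaviour ~ FTR, grouped by language
    # family. Hypothetical file and columns; the full analysis also controls
    # for geographic area and country via crossed random effects.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("ftr_savings.csv")           # hypothetical data extract
    model = smf.mixedlm("saved_money ~ strong_ftr", data=df,
                        groups=df["language_family"])
    print(model.fit().summary())
    ```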
