Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Individual participant data (IPD) meta-analyses, which obtain “raw” data from studies rather than summary data, typically adopt a “two-stage” approach to analysis, whereby the IPD within each trial generate summary measures that are then combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches, which combine all individual participant data in a single meta-analysis, have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the two approaches differ in a clinically meaningful way. Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure, and are useful where across-study patterns relating to types of participant, intervention, and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
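To make the “two-stage” approach concrete: stage one computes a log relative risk and its variance within each trial, and stage two pools them with inverse-variance weights. The sketch below is a minimal fixed-effect version in Python with made-up counts; it is a generic illustration, not the paper's data or exact models (which also included random-effects and one-stage specifications).
import numpy as np
# Stage 1: per-trial 2x2 counts -> log relative risk and its variance.
# Columns: events_treat, n_treat, events_ctrl, n_ctrl (illustrative numbers)
trials = np.array([[12, 200, 18, 198],
                   [30, 510, 41, 505],
                   [ 7, 120, 11, 122]], dtype=float)
e_t, n_t, e_c, n_c = trials.T
log_rr = np.log((e_t / n_t) / (e_c / n_c))
var_rr = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c  # delta-method variance
# Stage 2: fixed-effect inverse-variance pooling across trials.
w = 1 / var_rr
pooled = (w * log_rr).sum() / w.sum()
se = np.sqrt(1 / w.sum())
ci = np.exp(pooled + np.array([-1.96, 1.96]) * se)
print(f"pooled RR {np.exp(pooled):.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")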
https://www.verifiedmarketresearch.com/privacy-policy/
The Statistical Analysis Software Market was valued at USD 7,963.44 million in 2023 and is projected to reach USD 13,023.63 million by 2030, growing at a CAGR of 7.28% during the 2024-2030 forecast period.
Global Statistical Analysis Software Market Drivers
Growth in the Statistical Analysis Software Market is driven by a range of factors, which may include the following:
Growing Data Complexity and Volume: The exponential rise in data volume and complexity across a range of industries has fueled demand for sophisticated statistical analysis tools. Organizations need robust software solutions to evaluate and extract significant insights from huge datasets.
Growing Adoption of Data-Driven Decision-Making: Businesses are adopting a data-driven approach to decision-making at a faster rate. Using statistical analysis tools, companies can extract meaningful insights from data to improve operational effectiveness and strategic planning.
Developments in Analytics and Machine Learning: As these fields continue to progress, statistical analysis software is capable of more. The increasing popularity of these tools can be attributed to features like sophisticated modeling and predictive analytics.
Greater Emphasis on Business Intelligence: Analytics and business intelligence are now essential components of corporate strategy, and statistical analysis software is essential to business intelligence tools for studying trends, patterns, and performance measures.
Increasing Need in Life Sciences and Healthcare: The life sciences and healthcare sectors produce large volumes of data that require complex statistical analysis. The need for data-driven insights in clinical trials, medical research, and healthcare administration is driving the market for statistical analysis software.
Growth of Retail and E-Commerce: The retail and e-commerce industries use statistical analytic tools for inventory optimization, demand forecasting, and customer behavior analysis. The expansion of online retail and data-driven marketing techniques adds to the need for analytics tools.
Government Regulations and Initiatives: Regulatory reporting and compliance with government initiatives frequently require statistical analysis, particularly in the healthcare and finance sectors, which drives uptake of statistical analysis software in these regulated industries.
Emergence of Big Data Analytics: As big data analytics has grown in popularity, so has demand for advanced tools that can handle and analyze enormous datasets effectively. Statistical analysis software is essential for deriving valuable conclusions from large amounts of data.
Demand for Real-Time Analytics: There is a growing need for real-time analytics to support quick, informed decisions. Statistical analysis software that provides real-time data processing and analysis capabilities is in significant demand across many businesses.
Growing Awareness and Education: As more people become aware of the advantages of using statistical analysis in decision-making, its use has expanded across a range of academic and research institutions, and the academic sector influences the market for statistical analysis software.
Remote Work Trends: As more people around the world work from home, they depend more on digital tools and analytics to collaborate and make decisions. Statistical analysis software makes it possible for remote teams to examine data and exchange findings efficiently.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This dataset contains all of the supporting materials to accompany Helsel, D.R., Hirsch, R.M., Ryberg, K.R., Archfield, S.A., and Gilroy, E.J., 2020, Statistical methods in water resources: U.S. Geological Survey Techniques and Methods, book 4, chapter A3, 454 p., https://doi.org/10.3133/tm4a3. [Supersedes USGS Techniques of Water-Resources Investigations, book 4, chapter A3, version 1.1.] Supplemental materials (SM) for each chapter are available to re-create all examples and figures and to solve the exercises at the end of each chapter, with relevant datasets provided in an electronic format readable by R. The SM provide (1) datasets as .Rdata files for immediate input into R, (2) datasets as .csv files for input into R or for use with other software programs, (3) R functions that are used in the textbook but are not part of a published R package, (4) R scripts to produce virtually all of the figures in the book, and (5) solutions to the exercises as .html and .Rmd files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions specifying how the program should behave when handling the data, which can also be stored in R's simple object system. For all intents and purposes, this book serves as both a textbook and a manual for statistics in R, particularly in academic research, data analytics, and computer programming, and is intended to help inform and guide the work of R users and statisticians. It provides information about the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures, including a description of the conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand their results. The book also covers the different data formats and sources, and how to test for the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from importing and storing datasets in R as objects, coding and calling the methods or functions for manipulating datasets or objects, factorization, and vectorization, to reasoning about, interpreting, and storing the results for future use, and producing graphical visualizations and representations; in short, the congruence of statistics and computer programming for research.
This statistic presents the leading methods of data analytics application in the mergers and acquisitions sector in the United States in 2018. At that time, ** percent of executives surveyed were using data analytics on customers and markets.
The Best Management Practices Statistical Estimator (BMPSE) version 1.2.0 was developed by the U.S. Geological Survey (USGS), in cooperation with the Federal Highway Administration (FHWA) Office of Project Delivery and Environmental Review, to provide planning-level information about the performance of structural best management practices for decision makers, planners, and highway engineers to assess and mitigate possible adverse effects of highway and urban runoff on the Nation's receiving waters (Granato 2013, 2014; Granato and others, 2021). The BMPSE was assembled by using a Microsoft Access® database application to facilitate calculation of BMP performance statistics. Granato (2014) developed quantitative methods to estimate values of the trapezoidal-distribution statistics, correlation coefficients, and the minimum irreducible concentration (MIC) from available data. Granato (2014) developed the BMPSE to hold and process data from the International Stormwater Best Management Practices Database (BMPDB, www.bmpdatabase.org). Version 1.0 of the BMPSE contained a subset of the data from the 2012 version of the BMPDB; the current version of the BMPSE (1.2.0) contains a subset of the data from the December 2019 version of the BMPDB. Selected data from the BMPDB were screened for import into the BMPSE in consultation with Jane Clary, the data manager for the BMPDB. Modifications included identifying water quality constituents, making measurement units consistent, identifying paired inflow and outflow values, and converting BMPDB water quality values set as half the detection limit back to the detection limit. Total polycyclic aromatic hydrocarbon (PAH) values were added to the BMPSE from BMPDB data; they were calculated from individual PAH measurements at sites with enough data to calculate totals. The BMPSE tool can sort and rank the data, calculate plotting positions, calculate initial estimates, and calculate potential correlations to facilitate the distribution-fitting process (Granato, 2014). For water-quality ratio analysis, the BMPSE generates the input files and the list of filenames for each constituent within the Graphical User Interface (GUI). The BMPSE calculates the Spearman's rho (ρ) and Kendall's tau (τ) correlation coefficients with their respective 95-percent confidence limits and the probability that each correlation coefficient value is not significantly different from zero by using standard methods (Granato, 2014). If the 95-percent confidence limit values are of the same sign, then the correlation coefficient is statistically different from zero. For hydrograph extension, the BMPSE calculates ρ and τ between the inflow volume and the hydrograph-extension values (Granato, 2014). For volume reduction, the BMPSE calculates ρ and τ between the inflow volume and the ratio of outflow to inflow volumes (Granato, 2014). For water-quality treatment, the BMPSE calculates ρ and τ between the inflow concentrations and the ratio of outflow to inflow concentrations (Granato, 2014; 2020). The BMPSE also calculates ρ between the inflow and the outflow concentrations when a water-quality treatment analysis is done. The current version (1.2.0) of the BMPSE also has the option to calculate urban-runoff quality statistics from inflows to BMPs by using computer code developed for the Highway Runoff Database (Granato and Cazenas, 2009; Granato, 2019).
Citations:
Granato, G.E., 2013, Stochastic empirical loading and dilution model (SELDM) version 1.0.0: U.S. Geological Survey Techniques and Methods, book 4, chap. C3, 112 p., CD-ROM, https://pubs.usgs.gov/tm/04/c03.
Granato, G.E., 2014, Statistics for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater runoff best management practices (BMPs): U.S. Geological Survey Scientific Investigations Report 2014–5037, 37 p., http://dx.doi.org/10.3133/sir20145037.
Granato, G.E., 2019, Highway-Runoff Database (HRDB) version 1.1.0: U.S. Geological Survey data release, https://doi.org/10.5066/P94VL32J.
Granato, G.E., and Cazenas, P.A., 2009, Highway-Runoff Database (HRDB version 1.0)--A data warehouse and preprocessor for the stochastic empirical loading and dilution model: Washington, D.C., U.S. Department of Transportation, Federal Highway Administration, FHWA-HEP-09-004, 57 p., https://pubs.usgs.gov/sir/2009/5269/disc_content_100a_web/FHWA-HEP-09-004.pdf.
Granato, G.E., Spaetzel, A.B., and Medalie, L., 2021, Statistical methods for simulating structural stormwater runoff best management practices (BMPs) with the stochastic empirical loading and dilution model (SELDM): U.S. Geological Survey Scientific Investigations Report 2020–5136, 41 p., https://doi.org/10.3133/sir20205136.
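For readers who want to reproduce the same-sign significance screen described above, the sketch below computes Spearman's rho and Kendall's tau with approximate 95-percent confidence limits via the Fisher z-transform, using Fieller-type large-sample standard errors. This is one standard approximation in Python, not the BMPSE's own implementation (documented in Granato, 2014).
import numpy as np
from scipy import stats

def corr_with_ci(x, y, alpha=0.05):
    n = len(x)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    out = {}
    for name, (r, _) in {"spearman_rho": stats.spearmanr(x, y),
                         "kendall_tau": stats.kendalltau(x, y)}.items():
        # Fisher z-transform with a large-sample standard error
        se = (np.sqrt(1.06 / (n - 3)) if name == "spearman_rho"
              else np.sqrt(0.437 / (n - 4)))
        lo, hi = np.tanh(np.arctanh(r) + np.array([-1, 1]) * z_crit * se)
        # Same-sign confidence limits => significantly different from zero
        out[name] = (r, lo, hi, lo * hi > 0)
    return out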
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place as well as to inform them of the longer term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, as well as providing useful information as input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. To manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and that classes that changed recently are likely to undergo modifications in the near future. Though the guidance provided is helpful, developers need more specific guidance in order for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution-prone parts of a system as well as support effort estimation activities. The specific research questions that we address in this chapter are: (1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project specific, or general? (2) How is modification frequency distributed for classes that change? (3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications? (4) Does structural complexity make a class susceptible to change? (5) Does popularity make a class more change-prone? We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55,000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions. The raw metric data (4 .txt files and 4 .log files in a .zip file, ~2 MB in total) are provided in comma-separated values (CSV) format, and the first line of each file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
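As an illustration of the kind of analysis behind research questions (1) and (2), the sketch below estimates the empirical probability that a class changes between consecutive versions from a long-format change log. The file name and column names (version, class_id, changed) are hypothetical, not part of the released dataset.
import pandas as pd
# Hypothetical long-format change log: one row per (version, class_id),
# with 'changed' = 1 if the class was modified in that version.
log = pd.read_csv("change_log.csv")
# P(change) per version transition: fraction of classes modified per release (RQ 1, 1a)
print(log.groupby("version")["changed"].mean())
# Modification frequency for classes that change at least once (RQ 2);
# such distributions are typically highly skewed, hence non-parametric methods.
print(log[log["changed"] == 1].groupby("class_id").size().describe())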
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of features in SDA-V2 and well-known statistical analysis software packages (Minitab and SPSS).
For each main and supporting figure, the linear mixed models, statistical inference tests, and p-values are shown. (XLSX)
https://www.datainsightsmarket.com/privacy-policy
The global Multivariate Analysis Software market is poised for significant expansion, projected to reach an estimated market size of USD 4,250 million in 2025, with a robust Compound Annual Growth Rate (CAGR) of 12.5% anticipated through 2033. This growth is primarily fueled by the increasing adoption of advanced statistical techniques across a wide spectrum of industries, including the burgeoning pharmaceutical sector, sophisticated chemical research, and complex manufacturing processes. The demand for data-driven decision-making, coupled with the ever-growing volume of complex datasets, is compelling organizations to invest in powerful analytical tools. Key drivers include the rising need for predictive modeling in drug discovery and development, quality control in manufacturing, and risk assessment in financial applications. Emerging economies, particularly in the Asia Pacific region, are also contributing to this upward trajectory as they invest heavily in technological advancements and R&D, further amplifying the need for sophisticated analytical solutions. The market is segmented by application into Medical, Pharmacy, Chemical, Manufacturing, and Marketing. The Pharmacy and Medical applications are expected to witness the highest growth owing to the critical need for accurate data analysis in drug efficacy studies, clinical trials, and personalized medicine. In terms of types, the market encompasses a variety of analytical methods, including Multiple Linear Regression Analysis, Multiple Logistic Regression Analysis, Multivariate Analysis of Variance (MANOVA), Factor Analysis, and Cluster Analysis. While advanced techniques like MANOVA and Factor Analysis are gaining traction for their ability to uncover intricate relationships within data, the foundational Multiple Linear and Logistic Regression analyses remain widely adopted. Restraints, such as the high cost of specialized software and the need for skilled personnel to effectively utilize these tools, are being addressed by the emergence of more user-friendly interfaces and cloud-based solutions. Leading companies like Hitachi High-Tech America, OriginLab Corporation, and Minitab are at the forefront, offering comprehensive suites that cater to diverse analytical needs. This report provides an in-depth analysis of the global Multivariate Analysis Software market, encompassing a study period from 2019 to 2033, with a base and estimated year of 2025 and a forecast period from 2025 to 2033, building upon historical data from 2019-2024. The market is projected to witness significant expansion, driven by increasing data complexity and the growing need for advanced analytical capabilities across various industries. The estimated market size for Multivariate Analysis Software is expected to reach $2.5 billion by 2025, with projections indicating a substantial growth to $5.8 billion by 2033, demonstrating a robust compound annual growth rate (CAGR) of approximately 11.5% during the forecast period.
Detailed methods, statistical analysis, figures, and references.
This software archive is superseded by Hydrologic Toolbox v1.1.0, available at the following citation: Barlow, P.M., McHugh, A.R., Kiang, J.E., Zhai, T., Hummel, P., Duda, P., and Hinz, S., 2024, U.S. Geological Survey Hydrologic Toolbox version 1.1.0 software archive: U.S. Geological Survey software release, https://doi.org/10.5066/P13VDNAK. The U.S. Geological Survey Hydrologic Toolbox is a Windows-based desktop software program that provides a graphical and mapping interface for analysis of hydrologic time-series data with a set of widely used and standardized computational methods. The software combines the analytical and statistical functionality provided in the U.S. Geological Survey (USGS) Groundwater (Barlow and others, 2014) and Surface-Water (Kiang and others, 2018) Toolboxes and provides several enhancements to these programs. The main analysis methods are the computation of hydrologic-frequency statistics such as the 7-day minimum flow that occurs on average only once every 10 years (7Q10); the computation of design flows, including biologically based flows; the computation of flow-duration curves and duration hydrographs; eight computer-programming methods for hydrograph separation of a streamflow time series, including the BFI (base-flow index), HYSEP, PART, and SWAT Bflow methods and Eckhardt's two-parameter digital-filtering method; and the RORA recession-curve displacement method and associated RECESS program to estimate groundwater-recharge values from streamflow data. Several of the statistical methods provided in the Hydrologic Toolbox are used primarily for computation of critical low-flow statistics. The Hydrologic Toolbox also facilitates retrieval of streamflow and groundwater-level time-series data from the USGS National Water Information System and outputs text reports that describe the analyses. The Hydrologic Toolbox supersedes and replaces the Groundwater and Surface-Water Toolboxes. The Hydrologic Toolbox was developed by use of the DotSpatial geographic information system (GIS) programming library, which is part of the MapWindow project (MapWindow, 2021). DotSpatial is a nonproprietary, open-source program written for the .NET framework that includes a spatial data viewer and GIS capabilities. This software archive is designed to document different versions of the Hydrologic Toolbox. Details about version changes are provided in the “Release.txt” file with this software release. Instructions for installing the software are provided in the files “Installation_instructions.pdf” and “Installation_instructions.txt.” The “Installation_instructions.pdf” file includes screen captures of some of the installation steps, whereas the “Installation_instructions.txt” file does not. Each version of the Hydrologic Toolbox is provided in a separate .zip file.
Citations:
Barlow, P.M., Cunningham, W.L., Zhai, T., and Gray, M., 2014, U.S. Geological Survey groundwater toolbox, a graphical and mapping interface for analysis of hydrologic data (version 1.0)—User guide for estimation of base flow, runoff, and groundwater recharge from streamflow data: U.S. Geological Survey Techniques and Methods 3–B10, 27 p., https://doi.org/10.3133/tm3B10.
Kiang, J.E., Flynn, K.M., Zhai, T., Hummel, P., and Granato, G., 2018, SWToolbox: A surface-water toolbox for statistical analysis of streamflow time series: U.S. Geological Survey Techniques and Methods, book 4, chap. A–11, 33 p., https://doi.org/10.3133/tm4A11.
MapWindow, 2021, MapWindow software, accessed January 9, 2021, at https://www.mapwindow.org/#home.
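To make the 7Q10 statistic concrete, the sketch below computes annual minimum 7-day average flows from a daily streamflow series and takes an empirical 10th-percentile as a rough 7Q10 estimate. This is a simplification for illustration only: production low-flow frequency analysis typically fits a distribution such as log-Pearson Type III to the annual minima and uses climate years rather than calendar years, and the column names ('date', 'flow') are assumptions.
import numpy as np
import pandas as pd
# Daily streamflow series with hypothetical columns 'date' and 'flow'
q = pd.read_csv("daily_flow.csv", parse_dates=["date"]).set_index("date")
# 7-day moving average of daily flow, then the minimum within each year
q7 = q["flow"].rolling(7).mean()
annual_min7 = q7.groupby(q7.index.year).min().dropna()
# Empirical 7Q10: the 7-day low flow with a 10-year recurrence interval,
# i.e. the value with non-exceedance probability 1/10 among annual minima
print(np.percentile(annual_min7, 10))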
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data analysis can be accurate and reliable only if the underlying assumptions of the statistical method used are validated. Any violation of these assumptions can change the outcomes and conclusions of the analysis. In this study, we developed Smart Data Analysis V2 (SDA-V2), an interactive and user-friendly web application that assists users with limited statistical knowledge in data analysis; it can be freely accessed at https://jularatchumnaul.shinyapps.io/SDA-V2/. SDA-V2 automatically explores and visualizes data, examines the underlying assumptions associated with parametric tests, and selects an appropriate statistical method for the given data. Furthermore, SDA-V2 can assess the quality of research instruments and determine the minimum sample size required for a meaningful study. However, while SDA-V2 is a valuable tool for simplifying statistical analysis, it does not replace the need for a fundamental understanding of statistical principles. Researchers are encouraged to combine their expertise with the software's capabilities to achieve the most accurate and credible results.
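The kind of assumption-driven test selection that SDA-V2 automates can be sketched in a few lines. The example below is a generic illustration, not SDA-V2's own code (SDA-V2 is a Shiny web application): it checks normality and homogeneity of variances for two groups and falls back to a non-parametric test when the parametric assumptions fail.
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    # Shapiro-Wilk normality check for each group
    normal = (stats.shapiro(a).pvalue > alpha and
              stats.shapiro(b).pvalue > alpha)
    # Levene's test for homogeneity of variances
    equal_var = stats.levene(a, b).pvalue > alpha
    if normal:
        # Student's t-test, or Welch's t-test if variances differ
        res = stats.ttest_ind(a, b, equal_var=equal_var)
        name = "t-test" if equal_var else "Welch's t-test"
    else:
        # Mann-Whitney U when normality is violated
        res = stats.mannwhitneyu(a, b)
        name = "Mann-Whitney U"
    return name, res.pvalue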
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the analysis done in the paper "Evolution of statistical analysis in empirical software engineering research: Current state and steps forward" (DOI: https://doi.org/10.1016/j.jss.2019.07.002, preprint: https://arxiv.org/abs/1706.00933).
The package includes CSV files with data on statistical usage extracted from five software engineering journals (EMSE, IST, JSS, TOSEM, TSE), covering papers published between 2001 and 2015. The package also contains the forms, scripts, and figures (generated using the scripts) used in the paper.
The extraction tool mentioned in the paper is available on Docker Hub at: https://hub.docker.com/r/robertfeldt/sept
This dataset provides detailed insights into daily active users (DAU) of a platform or service, captured over a defined period of time. The dataset includes information such as the number of active users per day, allowing data analysts and business intelligence teams to track usage trends, monitor platform engagement, and identify patterns in user activity over time.
The data is ideal for performing time series analysis, statistical analysis, and trend forecasting. You can utilize this dataset to measure the success of platform initiatives, evaluate user behavior, or predict future trends in engagement. It is also suitable for training machine learning models that focus on user activity prediction or anomaly detection.
The dataset is structured in a simple and easy-to-use format, containing a date column and a column with the number of active users on that day. Each row represents a unique date and its corresponding number of active users, which allows for time-based analysis, such as calculating the moving average of active users, detecting seasonality, or spotting sudden spikes or drops in engagement.
This dataset can be used for a wide range of purposes, including engagement monitoring, time series analysis, trend forecasting, and anomaly detection. For example, you can compute moving averages to smooth out daily noise, decompose the series into trend and seasonal components, or flag days whose activity deviates sharply from recent history.
To get started with this dataset, you can load it into your preferred analysis tool. Here's how to do it using Python's pandas library:
import pandas as pd
# Load the dataset
data = pd.read_csv('path_to_dataset.csv')
# Display the first few rows
print(data.head())
# Basic statistics
print(data.describe())
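Building on the snippet above, the continuation below computes a 7-day moving average and flags unusually high or low days. The column names 'date' and 'active_users' are assumptions for illustration, since the actual header is not listed here; adjust them to match the file.
# Assumed columns: 'date' and 'active_users'
data['date'] = pd.to_datetime(data['date'])
data = data.set_index('date').sort_index()
# 7-day moving average to smooth day-to-day noise
data['ma7'] = data['active_users'].rolling(7).mean()
# Flag days deviating from the moving average by more than 3 standard deviations
resid = data['active_users'] - data['ma7']
print(data[resid.abs() > 3 * resid.std()])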
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Sir R.A. Fisher said of simulation and permutation methods in 1936: "Actually, the statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method." These methods, too ‘tedious’ to apply in 1936, are now readily accessible. As George Cobb (2007) wrote in his lead article for the journal Technology Innovations in Statistical Education, “... despite broad acceptance and rapid growth in enrollments, the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.” It is our hope that the textbook we are writing will help move the introductory statistics curriculum in the directions advocated by Professor Cobb. We use ideas such as randomization tests and bootstrap intervals to introduce the fundamental ideas of statistical inference. These methods are surprisingly intuitive to novice students and, with proper use of computer support, are accessible at very early stages of a course. Our text introduces statistical inference through these resampling methods, not only because these methods are becoming increasingly important for statisticians in their own right but also because randomization methods are outstanding in building students’ conceptual understanding of the key ideas. Our text includes the more traditional methods such as t-tests, chi-square tests, etc., but only after students have developed a strong intuitive understanding of inference through randomization methods. At this point students have a conceptual understanding and appreciation for the results they can then compute using the more traditional methods. We believe that this approach helps students realize that although the formulae may take different forms for different types of data, the conceptual framework underlying most statistical methods remains the same. Furthermore, our experience has been that after using these new methods in intuitive ways to introduce the core ideas, students understand and can move quickly through most of the standard techniques. Our goal is a text that gently moves the curriculum in innovative ways while still looking relatively familiar. Instructors won’t need to completely abandon their current syllabi and students will be well-prepared for more traditional follow-up courses.
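As a taste of the resampling approach the authors describe, a bootstrap confidence interval needs only a few lines. The sketch below, our illustration with made-up numbers rather than an example from the textbook, resamples a sample mean 10,000 times and reads off a 95% percentile interval.
import numpy as np

rng = np.random.default_rng(42)
sample = np.array([12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.9, 11.7])
# Resample with replacement and record each bootstrap sample's mean
boot_means = np.array([rng.choice(sample, size=sample.size, replace=True).mean()
                       for _ in range(10_000)])
# 95% percentile bootstrap interval for the population mean
print(np.percentile(boot_means, [2.5, 97.5]))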
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
INTRODUCTION As part of its responsibilities, the BC Ministry of Environment monitors water quality in the province's streams, rivers, and lakes. Often, it is necessary to compile statistics involving concentrations of contaminants or other compounds. Quite often the instruments used cannot measure concentrations below certain values; these observations are called non-detects or less-thans. Non-detects pose a difficulty when it is necessary to compute statistical measures such as the mean, the median, and the standard deviation for a data set, and the way non-detects are handled can affect the quality of any statistics generated. Non-detects, or censored data, are found in many fields such as medicine, engineering, biology, and environmetrics, where it is often the case that the measurements of interest are below some threshold. Dealing with non-detects is a significant issue, and statistical tools using survival or reliability methods have been developed. Basically, there are three approaches for treating data containing censored values: 1. substitution, which gives poor results and is therefore not recommended in the literature; 2. maximum likelihood estimation, which requires an assumption of some distributional form; and 3. nonparametric methods, which assess the shape of the data based on observed percentiles rather than a strict distributional form. This document provides guidance on how to record censored data, and on when and how to use certain analysis methods when the percentage of censored observations is less than 50%. The methods presented in this document are: 1. substitution; 2. Kaplan-Meier, as part of nonparametric methods; 3. a lognormal model based on maximum likelihood estimation; and 4. robust regression on order statistics, which is a semiparametric method. Statistical software suitable for survival or reliability analysis is available for dealing with censored data and has been widely used in medical and engineering environments. In this document, methods are illustrated with both the R and JMP software packages, when possible. JMP often requires some intermediate steps to obtain summary statistics with most of the methods described in this document; R, with the NADA package, is usually straightforward. The NADA package was developed specifically for computing statistics with non-detects in environmental data, based on Helsel (2005b). The data used to illustrate the methods described for computing summary statistics for non-detects are either simulated or based on information acquired from the B.C. Ministry of Environment. This document is strongly based on the book Nondetects And Data Analysis, written by Dennis R. Helsel in 2005 (Helsel, 2005b).
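To illustrate the maximum likelihood approach under a lognormal model, the sketch below fits mu and sigma to left-censored data by letting each non-detect contribute the lognormal CDF at its detection limit. This is a generic Python illustration with made-up values, not the document's own R/NADA or JMP workflow.
import numpy as np
from scipy import optimize, stats

# Values: measured concentrations, or the detection limit for non-detects
value = np.array([0.8, 0.5, 1.2, 0.5, 2.3, 0.5, 1.7, 0.9])
nondetect = np.array([0, 1, 0, 1, 0, 1, 0, 0], dtype=bool)

def neg_loglik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # keep sigma positive
    z = np.log(value)
    # Detected values contribute the density of log-concentration; the 1/x
    # Jacobian term is constant in (mu, sigma), so it can be dropped.
    ll = stats.norm.logpdf(z[~nondetect], mu, sigma).sum()
    # Non-detects contribute P(concentration < detection limit), the CDF.
    ll += stats.norm.logcdf(z[nondetect], mu, sigma).sum()
    return -ll

fit = optimize.minimize(neg_loglik, x0=[np.log(value).mean(), 0.0])
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print("estimated mean:", np.exp(mu_hat + sigma_hat**2 / 2))  # lognormal mean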
https://www.icpsr.umich.edu/web/ICPSR/studies/39492/terms
Clinical trials study the effects of medical treatments, like how safe they are and how well they work. But most clinical trials don't get all the data they need from patients. Patients may not answer all questions on a survey, or they may drop out of a study after it has started. The missing data can affect researchers' ability to detect the effects of treatments. To address the problem of missing data, researchers can make different guesses based on why and how data are missing. Then they can look at results for each guess. If results based on different guesses are similar, researchers can have more confidence that the study results are accurate. In this study, the research team created new methods to do these tests and developed software that runs these tests. To access the sensitivity analysis methods and software, please visit the MissingDataMatters website.
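One common way to implement this idea of "making different guesses" about missing data is a delta-adjustment (tipping-point) analysis: impute the missing outcomes under progressively less favorable assumptions and see when the conclusion changes. The sketch below is a generic illustration of that idea with simulated numbers, not the study's own methods or software.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treated = rng.normal(1.0, 2.0, 80)    # observed treated outcomes
control = rng.normal(0.0, 2.0, 100)   # observed control outcomes
n_missing = 20                        # treated patients who dropped out
# Guess: dropouts did worse than observed patients by 'delta'; vary delta
# and watch where the treatment effect stops being statistically significant.
for delta in np.arange(0.0, 3.1, 0.5):
    imputed = np.full(n_missing, treated.mean() - delta)
    p = stats.ttest_ind(np.r_[treated, imputed], control).pvalue
    print(f"delta={delta:.1f}  p={p:.4f}")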
In a 2024 survey, when asked which methods are most leveraged in cyber threat intelligence (CTI) analysis, over ** percent of respondents indicated frequently using knowledge bases such as MITRE ATT&CK, and around ** percent stated using this method occasionally. By contrast, structured analytic techniques, such as key assumptions checks, clustering, or Analysis of Competing Hypotheses (ACH), were the least used methods for analysis.