100+ datasets found
  1. Statistical Analysis of Individual Participant Data Meta-Analyses: A...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    tiff
    Updated Jun 8, 2023
    Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart (2023). Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice [Dataset]. http://doi.org/10.1371/journal.pone.0046042
    Available download formats: tiff
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis, whereby IPD within trials generate summary measures, which are then combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches, which combine all individual participant data in a single meta-analysis, have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.

    Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

    Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their use may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
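
    The two-stage approach described in this abstract can be sketched briefly: stage one reduces each trial to a summary measure (here a log relative risk and its variance from a 2×2 table), and stage two pools those measures. This is a minimal sketch assuming a fixed-effect inverse-variance model, a 0.5 continuity correction for zero cells, and hypothetical per-trial counts; it is not the authors' analysis, which also fitted one-stage models.

```python
import math
from typing import List, Tuple

# One trial as (events_treated, n_treated, events_control, n_control).
Trial = Tuple[int, int, int, int]


def log_rr_and_var(trial: Trial) -> Tuple[float, float]:
    """Stage 1: log relative risk and its large-sample variance for one trial."""
    a, n1, c, n2 = trial
    if a == 0 or c == 0:
        # Add 0.5 to every cell (so each arm total grows by 1) to avoid log(0).
        a, n1, c, n2 = a + 0.5, n1 + 1, c + 0.5, n2 + 1
    log_rr = math.log((a / n1) / (c / n2))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_rr, var


def pooled_rr(trials: List[Trial], z: float = 1.96) -> Tuple[float, float, float]:
    """Stage 2: fixed-effect inverse-variance pooling; returns (RR, lower, upper)."""
    estimates = [log_rr_and_var(t) for t in trials]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical counts for three trials (illustrative only, not data from the study).
trials = [(30, 200, 40, 200), (12, 100, 15, 100), (50, 400, 55, 400)]
rr, lo, hi = pooled_rr(trials)
print(f"RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

    A random-effects variant (for example DerSimonian-Laird) would add a between-trial variance term to each trial's variance before weighting; a one-stage model instead fits all participants in a single model.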

  2. Data from: A simple method for statistical analysis of intensity differences...

    • catalog.data.gov
    • healthdata.gov
    Updated Sep 7, 2025
    National Institutes of Health (2025). A simple method for statistical analysis of intensity differences in microarray-derived gene expression data [Dataset]. https://catalog.data.gov/dataset/a-simple-method-for-statistical-analysis-of-intensity-differences-in-microarray-derived-ge
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Microarray experiments offer a powerful way to make and compare large numbers of gene expression measurements, either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of those changes. In the absence of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic, and without such estimates overly conservative significance thresholds must be enforced.

    Results: A simple statistical method is presented for estimating variances from microarray control data without requiring multiple replicates. Comparison of datasets from two commercial vendors using this difference-averaging method demonstrates that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Applied to a dataset related to the β-catenin pathway, the method identifies a larger number of biologically plausible genes with altered expression than the ratio method does.

    Conclusions: The difference-averaging method enables variances to be determined as a function of signal intensity by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.
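
    The abstract does not spell out the algorithm, so the following is only a generic illustration of the underlying idea, not the published difference-averaging method: pool differences between paired measurements of the same quantity across the whole dataset, estimate the noise SD within intensity bins, and fit a power law SD ∝ I^b. An exponent b between 0.5 and 1 corresponds to the reported finding that the SD scales between the square root of the intensity and the intensity itself. The pairing, binning, and simulated data below are all assumptions.

```python
import math
import random
from typing import List, Tuple


def sd_by_intensity(pairs: List[Tuple[float, float]], n_bins: int = 10) -> List[Tuple[float, float]]:
    """Bin paired measurements by mean intensity; estimate the SD per bin from differences.

    For two independent measurements with the same SD s, Var(x1 - x2) = 2 * s**2,
    so s is estimated as sqrt(mean(d**2) / 2), assuming the differences have mean zero.
    """
    ordered = sorted(pairs, key=lambda p: (p[0] + p[1]) / 2)
    size = max(1, len(ordered) // n_bins)
    out = []
    for start in range(0, len(ordered) - size + 1, size):
        chunk = ordered[start:start + size]
        mean_i = sum((a + b) / 2 for a, b in chunk) / len(chunk)
        msd = sum((a - b) ** 2 for a, b in chunk) / len(chunk)
        out.append((mean_i, math.sqrt(msd / 2)))
    return out


def power_law_exponent(points: List[Tuple[float, float]]) -> float:
    """Least-squares slope b of log(SD) on log(intensity), i.e. SD ~ I**b."""
    xs = [math.log(i) for i, _ in points]
    ys = [math.log(s) for _, s in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Simulated (not real) data: true intensity I, noise SD = 0.1 * I**0.75.
rng = random.Random(0)
pairs = []
for _ in range(20000):
    true_i = math.exp(rng.uniform(math.log(100), math.log(10000)))
    sd = 0.1 * true_i ** 0.75
    pairs.append((rng.gauss(true_i, sd), rng.gauss(true_i, sd)))
b = power_law_exponent(sd_by_intensity(pairs, n_bins=20))
print(f"estimated exponent: {b:.2f}")  # 0.5 < b < 1: between sqrt(I) and I
```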

  3. Examples of boilerplate text from PLOS ONE papers based on targeted n-gram...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 14, 2023
    Nicole M. White; Thirunavukarasu Balasubramaniam; Richi Nayak; Adrian G. Barnett (2023). Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level). [Dataset]. http://doi.org/10.1371/journal.pone.0264360.t001
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicole M. White; Thirunavukarasu Balasubramaniam; Richi Nayak; Adrian G. Barnett
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level).

  4. Statistical Methods in Water Resources - Supporting Materials

    • data.usgs.gov
    • catalog.data.gov
    Updated Apr 7, 2020
    Robert Hirsch; Karen Ryberg; Stacey Archfield; Edward Gilroy; Dennis Helsel (2020). Statistical Methods in Water Resources - Supporting Materials [Dataset]. http://doi.org/10.5066/P9JWL6XR
    Dataset updated
    Apr 7, 2020
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Robert Hirsch; Karen Ryberg; Stacey Archfield; Edward Gilroy; Dennis Helsel
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This dataset contains all of the supporting materials to accompany Helsel, D.R., Hirsch, R.M., Ryberg, K.R., Archfield, S.A., and Gilroy, E.J., 2020, Statistical methods in water resources: U.S. Geological Survey Techniques and Methods, book 4, chapter A3, 454 p., https://doi.org/10.3133/tm4a3. [Supersedes USGS Techniques of Water-Resources Investigations, book 4, chapter A3, version 1.1.] Supplemental material (SM) is available for each chapter to re-create all examples and figures and to solve the exercises at the end of each chapter, with relevant datasets provided in an electronic format readable by R. The SM provides (1) datasets as .Rdata files for immediate input into R, (2) datasets as .csv files for input into R or for use with other software programs, (3) R functions that are used in the textbook but are not part of a published R package, (4) R scripts to produce virtually all of the figures in the book, and (5) solutions to the exercises as .html and .Rmd files. The suff ...

  5. Statistical Analysis Methods

    • figshare.com
    txt
    Updated Aug 25, 2021
    Lucy Polhill (2021). Statistical Analysis Methods [Dataset]. http://doi.org/10.6084/m9.figshare.16438977.v1
    Available download formats: txt
    Dataset updated
    Aug 25, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lucy Polhill
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All statistical analyses were performed in RStudio.

  6. Dataset for Linear Regression with 2 IV and 1 DV

    • kaggle.com
    zip
    Updated Mar 25, 2025
    Stable Space (2025). Dataset for Linear Regression with 2 IV and 1 DV [Dataset]. https://www.kaggle.com/datasets/sharmajicoder/dataset-for-linear-regression-with-2-iv-and-1-dv
    Available download formats: zip (9351 bytes)
    Dataset updated
    Mar 25, 2025
    Authors
    Stable Space
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A synthetic dataset of 100 instances for linear regression with two independent variables and one dependent variable, intended for testing, visualization, and statistical analysis.
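
    For data of this shape the model is y = b0 + b1*x1 + b2*x2 + error, and the ordinary least squares coefficients solve the normal equations (XᵀX)b = Xᵀy. The sketch below solves them directly in plain Python with Gaussian elimination; the column names x1, x2, and y are hypothetical, since the listing does not give the file's actual headers.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_two_predictors(x1: Sequence[float], x2: Sequence[float], y: Sequence[float]) -> List[float]:
    """Return [b0, b1, b2] minimising sum((y - b0 - b1*x1 - b2*x2)**2)."""
    rows = [(1.0, p, q) for p, q in zip(x1, x2)]  # design matrix with intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Toy check on noiseless data generated from y = 1 + 2*x1 - 3*x2.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * p - 3 * q for p, q in zip(x1, x2)]
print(ols_two_predictors(x1, x2, y))  # approximately [1.0, 2.0, -3.0]
```

    For larger problems a library least-squares routine based on QR decomposition is numerically safer than forming XᵀX explicitly.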

  7. Summary of statistical methods and analysis.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 4, 2021
    Bailey, Andrew P.; Lampe, Lena; Yoshimura, Azumi; Collinson, Lucy; Sorge, Sebastian; Burrell, Alana; Stefana, M. Irina; Lubojemska, Aleksandra; Gould, Alex P. (2021). Summary of statistical methods and analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000912603
    Dataset updated
    May 4, 2021
    Authors
    Bailey, Andrew P.; Lampe, Lena; Yoshimura, Azumi; Collinson, Lucy; Sorge, Sebastian; Burrell, Alana; Stefana, M. Irina; Lubojemska, Aleksandra; Gould, Alex P.
    Description

    For each main and supporting figure, the linear mixed models, statistical inference tests, and p-values are shown. (XLSX)

  8. Data from: Data files used to study change dynamics in software systems

    • figshare.swinburne.edu.au
    pdf
    Updated Jul 22, 2024
    Rajesh Vasa (2024). Data files used to study change dynamics in software systems [Dataset]. http://doi.org/10.25916/sut.26288227.v1
    Available download formats: pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Swinburne
    Authors
    Rajesh Vasa
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place, as well as to inform them of the longer-term trend in the distribution profile. This knowledge helps developers record systemic and substantial changes to a release and provides useful input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. To manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications, so that they can take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and that classes that changed recently are likely to undergo modifications in the near future. Though this guidance is helpful, developers need more specific guidance for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution-prone parts of a system, as well as support effort estimation activities.

    The specific research questions that we address in this chapter are: (1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project-specific, or general? (2) How is modification frequency distributed for classes that change? (3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications? (4) Does structural complexity make a class susceptible to change? (5) Does popularity make a class more change-prone?

    We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55,000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions. The .zip archive (~2 MB in total) contains 4 .txt files and 4 .log files. The raw metric data are provided as comma-separated values (CSV), with the header on the first line, and a detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
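
    The first research question (how likely a class is to change from one version to the next) and the note on skewed metric distributions can be illustrated with a small sketch. It assumes a hypothetical representation in which each release maps a class name to a tuple of metric values, treats a class as changed when that tuple differs between consecutive releases, and summarises change magnitude with the median rather than the mean because of the skew. The dataset's actual CSV columns and the authors' definition of change may differ.

```python
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per release, class name -> tuple of metric values.
Release = Dict[str, Tuple[int, ...]]


def change_probabilities(releases: List[Release]) -> List[float]:
    """For each consecutive pair of releases, the fraction of surviving classes that changed."""
    probs = []
    for old, new in zip(releases, releases[1:]):
        common = old.keys() & new.keys()  # classes present in both releases
        if not common:
            continue
        changed = sum(1 for name in common if old[name] != new[name])
        probs.append(changed / len(common))
    return probs


def median_change_magnitude(releases: List[Release]) -> float:
    """Median total absolute metric difference over changed classes (robust to skew)."""
    sizes = []
    for old, new in zip(releases, releases[1:]):
        for name in old.keys() & new.keys():
            delta = sum(abs(a - b) for a, b in zip(old[name], new[name]))
            if delta:
                sizes.append(delta)
    return median(sizes) if sizes else 0.0


# Toy example with three releases and two metrics per class (hypothetical values).
releases = [
    {"A": (5, 2), "B": (3, 1), "C": (8, 4)},
    {"A": (5, 2), "B": (4, 1), "C": (20, 9)},
    {"A": (6, 2), "B": (4, 1), "C": (20, 9), "D": (1, 0)},
]
print(change_probabilities(releases))
print(median_change_magnitude(releases))
```

    In this toy data one large change (class C) would pull the mean magnitude up sharply, while the median stays at the typical small change, which is the motivation for skew-aware summaries.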

  9. YouTube Video and Channel Analysis

    • kaggle.com
    zip
    Updated Dec 19, 2023
    The Devastator (2023). YouTube Video and Channel Analysis [Dataset]. https://www.kaggle.com/datasets/thedevastator/youtube-video-and-channel-analysis/discussion
    Available download formats: zip (85613002 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    The Devastator
    Area covered
    YouTube
    Description

    YouTube Video and Channel Analysis

    YouTube Video and Channel Statistics

    By VISHWANATH SESHAGIRI [source]

    About this dataset

    This dataset contains valuable information about YouTube videos and channels, including various metrics related to views, likes, dislikes, comments, and other related statistics. The dataset consists of 9 direct features and 13 indirect features. The direct features include the ratio of comments on a video to the number of views on the video (comments/views), the total number of subscribers of the channel (subscriberCount), the ratio of likes on a video to the number of subscribers of the channel (likes/subscriber), the total number of views on the channel (channelViewCount), and several other informative ratios such as views/elapsedtime, totalviews/channelelapsedtime, comments/subscriber, views/subscribers, dislikes/subscriber.

    The dataset also includes indirect features that are derived from YouTube's API. These indirect features provide additional insights into videos and channels by considering factors such as dislikes/views ratio, channelCommentCount (total number of comments on the channel), likes/dislikes ratio, totviews/totsubs ratio (total views on a video to total subscribers of a channel), and more.
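
    Ratio features like these are straightforward to derive from raw counts, but a zero denominator (a video with no views, a channel with no subscribers, or a video with no dislikes) must be handled explicitly. A minimal sketch follows; the raw field names (viewCount, likeCount, dislikeCount, commentCount) are assumptions modelled on the feature names above, not the dataset's documented schema.

```python
from typing import Dict, Optional


def safe_ratio(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the denominator is zero (ratio undefined)."""
    return num / den if den else None


def derive_ratios(video: Dict[str, float], channel: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Compute a few of the ratio features described for this dataset.

    The keys of `video` and `channel` are hypothetical placeholders.
    """
    subs = channel["subscriberCount"]
    return {
        "comments/views": safe_ratio(video["commentCount"], video["viewCount"]),
        "likes/subscriber": safe_ratio(video["likeCount"], subs),
        "views/subscribers": safe_ratio(video["viewCount"], subs),
        "dislikes/views": safe_ratio(video["dislikeCount"], video["viewCount"]),
        "likes/dislikes": safe_ratio(video["likeCount"], video["dislikeCount"]),
    }


video = {"viewCount": 1000, "likeCount": 50, "dislikeCount": 0, "commentCount": 20}
channel = {"subscriberCount": 200}
print(derive_ratios(video, channel))
```

    Returning None rather than 0 or infinity keeps undefined ratios distinguishable from genuine zeros in later statistical analysis.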

    The objective behind analyzing this dataset is to establish statistical relationships between videos and channels within YouTube. Furthermore, this analysis aims to form a topic tree based on these statistical relations.

    For further exploration beyond this description, the GitHub repository associated with this dataset may provide resources that complement or expand on what is available here.

    Overall, this collection provides diverse insights into YouTube video and channel metadata for statistical analyses aimed at understanding how viewer engagement patterns vary across channels. With measures ranging from basic counts such as subscriber counts to views per minute, timing versus viewership rate, and text-related user responses, the dataset can inform decisions about channel optimization, more effective targeting, and the creation of content that appeals to the target audience.

    How to use the dataset

    This dataset provides valuable information about YouTube videos and their corresponding channels. With this data, you can perform statistical analysis to gain insights into various aspects of YouTube video and channel performance. Here is a guide on how to effectively use this dataset for your analysis:

    • Understanding the Columns:
      • totalviews/channelelapsedtime: The ratio of total views of a video to the elapsed time of the channel.
      • channelViewCount: The total number of views on the channel.
      • likes/subscriber: The ratio of likes on a video to the number of subscribers of the channel.
      • views/subscribers: The ratio of views on a video to the number of subscribers of the channel.
      • subscriberCount: The total number of subscribers of the channel.
      • dislikes/views: The ratio

    Research Ideas

    • Predicting the popularity of YouTube videos: By analyzing the various ratios and metrics in this dataset, such as comments/views, likes/subscriber, and views/subscribers, one can build predictive models to estimate the popularity or engagement level of YouTube videos. This can help content creators or businesses understand which types of videos are likely to be successful and tailor their content accordingly.
    • Analyzing channel performance: The dataset provides information about the total number of views on a channel (channelViewCount), the number of subscribers (subscriberCount), and other related statistics. By examining metrics like views/elapsedtime and totalviews/channelelapsedtime, one can assess how well a channel is performing over time. This analysis can help content creators identify trends or patterns in their viewership and make informed decisions about their video strategies.
    • Understanding audience engagement: Ratios like comments/subscriber, likes/dislikes, and dislikes/subscriber provide insights into how engaged a channel's subscribers are with its content. By examining these ratios across multiple videos or channels, one can identify trends in audience behavior and preferences. For example, a high ratio of comments/subscriber may indicate strong community participation and active discussion around the videos posted by a particular YouTuber or channel.

    Acknowledgements

    If you use this dataset in y...

  10. Replication package for "Evolution of statistical analysis in ESE research"

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    de Oliveira Neto, Francisco Gomes; Torkar, Richard; Feldt, Robert; Gren, Lucas; Furia, Carlo; Huang, Ziewi (2020). Replication package for "Evolution of statistical analysis in ESE research" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3294507
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Chalmers and the University of Gothenburg
    Authors
    de Oliveira Neto, Francisco Gomes; Torkar, Richard; Feldt, Robert; Gren, Lucas; Furia, Carlo; Huang, Ziewi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the replication package for the analysis done in the paper "Evolution of statistical analysis in empirical software engineering research: Current state and steps forward" (DOI: https://doi.org/10.1016/j.jss.2019.07.002, preprint: https://arxiv.org/abs/1706.00933).

    The package includes CSV files with data on statistical usage extracted from papers published between 2001 and 2015 in 5 software engineering (SE) journals (EMSE, IST, JSS, TOSEM, TSE). The package also contains the forms, scripts, and figures (generated using the scripts) used in the paper.

    The extraction tool mentioned in the paper is available on Docker Hub: https://hub.docker.com/r/robertfeldt/sept

  11. Google Analytics data of an E-commerce Company

    • kaggle.com
    zip
    Updated Oct 19, 2024
    fehu.zone (2024). Google Analytics data of an E-commerce Company [Dataset]. https://www.kaggle.com/datasets/fehu94/google-analytics-data-of-an-e-commerce-company
    Available download formats: zip (3156 bytes)
    Dataset updated
    Oct 19, 2024
    Authors
    fehu.zone
    Description

    📊 Dataset Title: Daily Active Users Dataset

    📝 Description

    This dataset provides detailed insights into daily active users (DAU) of a platform or service, captured over a defined period of time. The dataset includes information such as the number of active users per day, allowing data analysts and business intelligence teams to track usage trends, monitor platform engagement, and identify patterns in user activity over time.

    The data is ideal for performing time series analysis, statistical analysis, and trend forecasting. You can utilize this dataset to measure the success of platform initiatives, evaluate user behavior, or predict future trends in engagement. It is also suitable for training machine learning models that focus on user activity prediction or anomaly detection.

    📂 Dataset Structure

    The dataset is structured in a simple and easy-to-use format, containing the following columns:

    • Date: The date on which the data was recorded, formatted as YYYYMMDD.
    • Number of Active Users: The number of users who were active on the platform on the corresponding date.

    Each row in the dataset represents a unique date and its corresponding number of active users. This allows for time-based analysis, such as calculating the moving average of active users, detecting seasonality, or spotting sudden spikes or drops in engagement.

    🧐 Key Use Cases

    This dataset can be used for a wide range of purposes, including:

    1. Time Series Analysis: Analyze trends and seasonality of user engagement.
    2. Trend Detection: Discover peaks and valleys in user activity.
    3. Anomaly Detection: Use statistical methods or machine learning algorithms to detect anomalies in user behavior.
    4. Forecasting User Growth: Build forecasting models to predict future platform usage.
    5. Seasonality Insights: Identify patterns like increased activity on weekends or holidays.
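
    As one example of the statistical anomaly detection mentioned in the list above, a day can be flagged when its active-user count lies more than a chosen number of standard deviations from the mean of the preceding window. This is a minimal sketch; the 7-day window and the threshold of 3 are illustrative assumptions, and more robust choices (medians, seasonal adjustment) may suit real data better.

```python
from statistics import mean, stdev
from typing import List


def rolling_zscore_anomalies(counts: List[float], window: int = 7, threshold: float = 3.0) -> List[int]:
    """Return indices of days whose count deviates from the trailing-window mean
    by more than `threshold` trailing-window standard deviations."""
    flagged = []
    for i in range(window, len(counts)):
        past = counts[i - window:i]
        sd = stdev(past)
        if sd == 0:
            continue  # flat history: z-score undefined, so skip rather than divide by zero
        if abs(counts[i] - mean(past)) / sd > threshold:
            flagged.append(i)
    return flagged


dau = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]
print(rolling_zscore_anomalies(dau))
```

    Note that once a spike enters the trailing window it inflates the standard deviation for the following days, which makes a second nearby anomaly harder to detect.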

    📈 Potential Analysis

    Here are some specific analyses you can perform using this dataset:

    • Moving Average and Smoothing: Calculate the moving average over a 7-day or 30-day period.
    • Correlation with External Factors: Correlate daily active users with other datasets.
    • Statistical Hypothesis Testing: Perform t-tests or ANOVA to determine significant differences in user activity.
    • Machine Learning for Prediction: Train machine learning models to predict user engagement.
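
    The 7-day moving average suggested in the list above can be computed without extra libraries. The sketch assumes rows of (date in YYYYMMDD format, active-user count), matching the column description, and one row per calendar day with no gaps; with missing days a date-aware window would be needed instead of a fixed count of rows.

```python
from collections import deque
from datetime import date, datetime
from typing import Iterable, List, Optional, Tuple, Union


def moving_average(rows: Iterable[Tuple[Union[str, int], int]], window: int = 7) -> List[Tuple[date, Optional[float]]]:
    """Trailing moving average of daily counts; None until `window` days are available."""
    # Dates may arrive as strings or integers (e.g. 20240101), so convert with str().
    parsed = sorted((datetime.strptime(str(d), "%Y%m%d").date(), n) for d, n in rows)
    buf: deque = deque(maxlen=window)  # holds the most recent `window` counts
    out = []
    for day, n in parsed:
        buf.append(n)
        out.append((day, sum(buf) / window if len(buf) == window else None))
    return out


rows = [("20240101", 10), ("20240102", 20), ("20240103", 30), ("20240104", 40)]
for day, avg in moving_average(rows, window=3):
    print(day.isoformat(), avg)
```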

    🚀 Getting Started

    To get started with this dataset, you can load it into your preferred analysis tool. Here's how to do it using Python's pandas library:

    import pandas as pd
    
    # Load the dataset
    data = pd.read_csv('path_to_dataset.csv')
    
    # Display the first few rows
    print(data.head())
    
    # Basic statistics
    print(data.describe())
    
  12. lock5stat

    • kaggle.com
    zip
    Updated Mar 13, 2021
    Mathurin Aché (2021). lock5stat [Dataset]. https://www.kaggle.com/mathurinache/lock5stat
    Available download formats: zip (147620 bytes)
    Dataset updated
    Mar 13, 2021
    Authors
    Mathurin Aché
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Sir R.A. Fisher said of simulation and permutation methods in 1936: "Actually, the statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method." These methods, too ‘tedious’ to apply in 1936, are now readily accessible. As George Cobb (2007) wrote in his lead article for the journal Technology Innovations in Statistical Education, “... despite broad acceptance and rapid growth in enrollments, the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.” It is our hope that the textbook we are writing will help move the introductory statistics curriculum in the directions advocated by Professor Cobb. We use ideas such as randomization tests and bootstrap intervals to introduce the fundamental ideas of statistical inference. These methods are surprisingly intuitive to novice students and, with proper use of computer support, are accessible at very early stages of a course. Our text introduces statistical inference through these resampling methods, not only because these methods are becoming increasingly important for statisticians in their own right but also because randomization methods are outstanding in building students’ conceptual understanding of the key ideas. 
Our text includes the more traditional methods such as t-tests, chi-square tests, etc., but only after students have developed a strong intuitive understanding of inference through randomization methods. At this point students have a conceptual understanding and appreciation for the results they can then compute using the more traditional methods. We believe that this approach helps students realize that although the formulae may take different forms for different types of data, the conceptual framework underlying most statistical methods remains the same. Furthermore, our experience has been that after using these new methods in intuitive ways to introduce the core ideas, students understand and can move quickly through most of the standard techniques. Our goal is a text that gently moves the curriculum in innovative ways while still looking relatively familiar. Instructors won’t need to completely abandon their current syllabi and students will be well-prepared for more traditional follow-up courses.
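
    The two resampling ideas this text builds on, randomization tests and bootstrap intervals, can each be written in a few lines. This is a minimal, generic sketch (a two-sample permutation test for a difference in means and a percentile bootstrap interval for a mean), not code from the textbook described here; the sample values are hypothetical, and the number of resamples and the seed are arbitrary choices.

```python
import random
from statistics import fmean
from typing import List, Sequence, Tuple


def permutation_test(a: Sequence[float], b: Sequence[float], n_resamples: int = 10000, seed: int = 0) -> float:
    """Two-sided p-value for a difference in means, by randomly reassigning group labels."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


def bootstrap_ci(data: Sequence[float], level: float = 0.95, n_resamples: int = 10000, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean."""
    rng = random.Random(seed)
    stats: List[float] = sorted(fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples))
    lo_idx = round((1 - level) / 2 * n_resamples)
    hi_idx = round((1 + level) / 2 * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical measurements for two groups.
treated = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.5, 11.4, 12.9]
print(permutation_test(treated, control))
print(bootstrap_ci(treated))
```

    Both functions follow the logic the authors describe: the permutation test asks how often random relabelling produces a difference at least as large as the observed one, and the bootstrap treats the sample as a stand-in for the population.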

  13. Statistical Analysis Software Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 22, 2024
    Dataintelo (2024). Statistical Analysis Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/statistical-analysis-software-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 22, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Statistical Analysis Software Market Outlook

    The global market size for statistical analysis software was estimated at USD 11.3 billion in 2023 and is projected to reach USD 21.6 billion by 2032, growing at a compound annual growth rate (CAGR) of 7.5% during the forecast period. This substantial growth can be attributed to the increasing complexity of data in various industries and the rising need for advanced analytical tools to derive actionable insights.
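
    The quoted growth rate can be checked against the two market-size figures: from a 2023 base to 2032 there are 9 compounding years, and CAGR = (end / start)^(1 / years) - 1. This minimal arithmetic check gives about 7.46%, consistent with the stated 7.5% after rounding.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: float) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


# Figures from the report summary, in USD billions.
rate = cagr(11.3, 21.6, 2032 - 2023)
print(f"implied CAGR: {rate:.2%}")                     # about 7.46%
print(f"2032 at 7.5%: {project(11.3, 0.075, 9):.1f}")  # about 21.7
```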

    One of the primary growth factors for this market is the increasing demand for data-driven decision-making across various sectors. Organizations are increasingly recognizing the value of data analytics in enhancing operational efficiency, reducing costs, and identifying new business opportunities. The proliferation of big data and the advent of technologies such as artificial intelligence and machine learning are further fueling the demand for sophisticated statistical analysis software. Additionally, the growing adoption of cloud computing has significantly reduced the cost and complexity of deploying advanced analytics solutions, making them more accessible to organizations of all sizes.

    Another critical driver for the market is the increasing emphasis on regulatory compliance and risk management. Industries such as finance, healthcare, and manufacturing are subject to stringent regulatory requirements, necessitating the use of advanced analytics tools to ensure compliance and mitigate risks. For instance, in the healthcare sector, statistical analysis software is used for clinical trials, patient data management, and predictive analytics to enhance patient outcomes and ensure regulatory compliance. Similarly, in the financial sector, these tools are used for fraud detection, credit scoring, and risk assessment, thereby driving the demand for statistical analysis software.

    The rising trend of digital transformation across industries is also contributing to market growth. As organizations increasingly adopt digital technologies, the volume of data generated is growing exponentially. This data, when analyzed effectively, can provide valuable insights into customer behavior, market trends, and operational efficiencies. Consequently, there is a growing need for advanced statistical analysis software to analyze this data and derive actionable insights. Furthermore, the increasing integration of statistical analysis tools with other business intelligence and data visualization tools is enhancing their capabilities and driving their adoption across various sectors.

    From a regional perspective, North America currently holds the largest market share, driven by the presence of major technology companies and a high level of adoption of advanced analytics solutions. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, owing to the increasing adoption of digital technologies and the growing emphasis on data-driven decision-making in countries such as China and India. The region's rapidly expanding IT infrastructure and increasing investments in advanced analytics solutions are further contributing to this growth.



    Component Analysis



    The statistical analysis software market can be segmented by component into software and services. The software segment encompasses the core statistical analysis tools and platforms that organizations use to analyze data and derive insights. This segment is expected to hold the larger share of the market, driven by the increasing adoption of data analytics solutions across industries. The availability of a wide range of software, from basic statistical tools to advanced analytics platforms, caters to the diverse needs of organizations and further drives the growth of this segment.



    The services segment includes consulting, implementation, training, and support services provided by vendors to help organizations effectively deploy and utilize statistical analysis software. This segment is expected to witness significant growth during the forecast period, driven by the increasing complexity of data analytics projects and the need for specialized expertise. As organizations seek to maximize the value of their data analytics investments, the demand for professional services to support the implementation and optimization of statistical analysis solutions is growing. Furthermore, the increasing trend of outsourcing data analytics functions to third-party service providers is contributing to the growth of the services segment.



    Within the software segment, the market can be further categori

  14. COVID-19 Combined Data-set with Improved Measurement Errors

    • data.mendeley.com
    Updated May 13, 2020
    Afshin Ashofteh (2020). COVID-19 Combined Data-set with Improved Measurement Errors [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.3
    Dataset updated
    May 13, 2020
    Authors
    Afshin Ashofteh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Public health decision-making on policies aimed at controlling the COVID-19 pandemic depends on complex epidemiological models, which must be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources, with corrections for systematic measurement errors, and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes, such as daily mortality and fatality rates, to the standard attributes of official data sources. We used comparative statistical analysis to evaluate the measurement errors of the official COVID-19 data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC). The data were collected using text mining techniques and by reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data, such as country area, international country number, Alpha-2 code, Alpha-3 code, latitude and longitude, as well as additional attributes such as population. The improved dataset benefits from major corrections to the referenced datasets and official reports: adjusting reporting dates that suffered from a one- to two-day lag, removing negative values, detecting unreasonable changes to historical data in new reports, and correcting systematic measurement errors, which had been increasing as the pandemic spread and more countries contributed data to the official repositories. Additionally, the root mean square error (RMSE) of each attribute in pairwise comparisons of the datasets was used to identify the main data problems. The data for China are presented separately and in more detail; they were extracted from the reports attached to the main page of the Chinese CDC website.
This dataset is a comprehensive and reliable source of worldwide COVID-19 data. It can be used in epidemiological models assessing the magnitude and timeline of confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, and the pandemic's turning point, or in economic and social impact analyses. It can help inform national and local authorities implementing an adaptive approach to reopening the economy and schools, easing business and social distancing restrictions, designing economic programs, or allowing sports events to resume.
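The description says the root mean square error of attributes in paired comparisons of the datasets was used to locate data problems. The sketch below shows one way such a comparison could be computed. It assumes each source has already been loaded into a dictionary keyed by (country code, date); the names `paired_rmse`, `source_who` and `source_ecdc`, and all values, are hypothetical and not taken from the dataset.

```python
import math

# Hypothetical key: (ISO Alpha-3 country code, report date)
Key = tuple[str, str]


def paired_rmse(source_a: dict[Key, float], source_b: dict[Key, float]) -> float:
    """RMSE of one attribute over the (country, date) pairs reported by both sources."""
    shared = source_a.keys() & source_b.keys()
    if not shared:
        raise ValueError("the two sources have no (country, date) pairs in common")
    total = sum((source_a[k] - source_b[k]) ** 2 for k in shared)
    return math.sqrt(total / len(shared))


# Made-up daily death counts, for illustration only
source_who = {("ITA", "2020-03-20"): 10.0, ("ITA", "2020-03-21"): 12.0}
source_ecdc = {("ITA", "2020-03-20"): 10.0, ("ITA", "2020-03-21"): 14.0, ("ESP", "2020-03-21"): 5.0}
print(round(paired_rmse(source_who, source_ecdc), 3))  # → 1.414
```

Keys present in only one source (here the Spanish entry) are skipped rather than treated as zero, which would inflate the error. A reporting lag between sources, such as the one- to two-day delay mentioned above, would show up as a persistently large RMSE until the dates are realigned.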

  15. Youtube trending videos Data analysis dataset

    • kaggle.com
    zip
    Updated Mar 11, 2024
    Nitya Narla (2024). Youtube trending videos Data analysis dataset [Dataset]. https://www.kaggle.com/datasets/nityanarla/youtube-trending-videos-data-analysis-dataset
    Available download formats: zip (17010709 bytes)
    Dataset updated
    Mar 11, 2024
    Authors
    Nitya Narla
    Area covered
    YouTube
    Description

    Embark on a journey through the fascinating realm of YouTube trending videos with our latest project! Leveraging a comprehensive dataset, we delve into the intricate dynamics behind what makes a video trend on the world's largest video-sharing platform.

    Our dataset encapsulates an array of essential features including video_id, trending_date, title, location, channel_title, category_id, publish_time, tags, views, likes, dislikes, comment_count, thumbnail_link, comments_disabled, ratings_disabled, video_error, description, and sheild. With this treasure trove of information at our disposal, we uncover hidden patterns, explore correlations, and extract valuable insights to decode the secrets of YouTube's trending algorithm.

    Join us as we employ advanced data analysis techniques to unravel the mysteries behind viral content creation, audience engagement, and the ever-evolving landscape of online video trends. Whether you're a data enthusiast, content creator, or simply curious about the dynamics of digital media, this project offers a captivating exploration into the heart of YouTube's trending phenomenon.

    Unlock the power of data and embark on a journey of discovery with our YouTube Trending Video Data Analysis project today!

    Tags: YouTubeTrending, DataAnalysis, KaggleDataset, VideoInsights, DigitalMedia, ContentCreation, AudienceEngagement, TrendAnalysis, DataScience, OnlineTrends

  16. Normal Q−Q plot ELISA

    • figshare.com
    pdf
    Updated Jun 11, 2023
    Jorge Miguel Carona Ferreira; Robert Huhle (2023). Normal Q−Q plot ELISA [Dataset]. http://doi.org/10.6084/m9.figshare.14671920.v1
    Available download formats: pdf
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jorge Miguel Carona Ferreira; Robert Huhle
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Normal Q−Q plot from ELISA data
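A normal Q-Q plot pairs each sorted observation with the quantile a standard normal distribution would predict at the same rank; points lying close to a straight line suggest the data are approximately normal. The sketch below computes those coordinates without plotting. The plotting positions (i - 0.5)/n are one common convention among several, the function name `normal_qq_points` is hypothetical, and the readings are made-up values, not the ELISA data in this record.

```python
from statistics import NormalDist


def normal_qq_points(sample: list[float]) -> list[tuple[float, float]]:
    """Pair each sorted observation with a standard-normal theoretical quantile.

    Uses the plotting positions (i - 0.5) / n, one common choice among several.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(sample)
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


# Hypothetical optical-density readings, for illustration only
for theoretical, observed in normal_qq_points([0.42, 0.38, 0.51, 0.47, 0.40]):
    print(f"{theoretical:+.3f}  {observed:.2f}")
```

Plotting the first column on the x-axis against the second on the y-axis gives the Q-Q plot; systematic curvature indicates skewness, and S-shapes indicate heavier or lighter tails than the normal distribution.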

  17. Multivariate Analysis Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Oct 8, 2025
    Data Insights Market (2025). Multivariate Analysis Software Report [Dataset]. https://www.datainsightsmarket.com/reports/multivariate-analysis-software-1402571
    Available download formats: ppt, doc, pdf
    Dataset updated
    Oct 8, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Multivariate Analysis Software market is poised for significant expansion, projected to reach an estimated market size of USD 4,250 million in 2025, with a robust Compound Annual Growth Rate (CAGR) of 12.5% anticipated through 2033. This growth is primarily fueled by the increasing adoption of advanced statistical techniques across a wide spectrum of industries, including the burgeoning pharmaceutical sector, sophisticated chemical research, and complex manufacturing processes. The demand for data-driven decision-making, coupled with the ever-growing volume of complex datasets, is compelling organizations to invest in powerful analytical tools. Key drivers include the rising need for predictive modeling in drug discovery and development, quality control in manufacturing, and risk assessment in financial applications. Emerging economies, particularly in the Asia Pacific region, are also contributing to this upward trajectory as they invest heavily in technological advancements and R&D, further amplifying the need for sophisticated analytical solutions. The market is segmented by application into Medical, Pharmacy, Chemical, Manufacturing, and Marketing. The Pharmacy and Medical applications are expected to witness the highest growth owing to the critical need for accurate data analysis in drug efficacy studies, clinical trials, and personalized medicine. In terms of types, the market encompasses a variety of analytical methods, including Multiple Linear Regression Analysis, Multiple Logistic Regression Analysis, Multivariate Analysis of Variance (MANOVA), Factor Analysis, and Cluster Analysis. While advanced techniques like MANOVA and Factor Analysis are gaining traction for their ability to uncover intricate relationships within data, the foundational Multiple Linear and Logistic Regression analyses remain widely adopted. 
Restraints, such as the high cost of specialized software and the need for skilled personnel to effectively utilize these tools, are being addressed by the emergence of more user-friendly interfaces and cloud-based solutions. Leading companies like Hitachi High-Tech America, OriginLab Corporation, and Minitab are at the forefront, offering comprehensive suites that cater to diverse analytical needs. This report provides an in-depth analysis of the global Multivariate Analysis Software market, encompassing a study period from 2019 to 2033, with a base and estimated year of 2025 and a forecast period from 2025 to 2033, building upon historical data from 2019-2024. The market is projected to witness significant expansion, driven by increasing data complexity and the growing need for advanced analytical capabilities across various industries. The estimated market size for Multivariate Analysis Software is expected to reach $2.5 billion by 2025, with projections indicating a substantial growth to $5.8 billion by 2033, demonstrating a robust compound annual growth rate (CAGR) of approximately 11.5% during the forecast period.
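The growth figures in this description follow the compound annual growth rate (CAGR) formula, CAGR = (end / start)^(1 / years) - 1. The short sketch below applies it to the quoted numbers so they can be checked. The function names `cagr` and `project` are hypothetical; the inputs are taken directly from the text, in USD billions, over 2025 to 2033 (8 years).

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the description (USD billions), 2025 -> 2033 = 8 years
print(f"{cagr(2.5, 5.8, 8):.1%}")        # → 11.1%
print(f"{project(4.25, 0.125, 8):.2f}")  # → 10.90
```

With the second set of figures, $2.5 billion growing to $5.8 billion over eight years corresponds to a CAGR of about 11.1%, close to the "approximately 11.5%" quoted; compounding the first figure of USD 4,250 million at 12.5% for eight years gives roughly $10.9 billion in 2033.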

  18. The banksia plot: a method for visually comparing point estimates and...

    • researchdata.edu.au
    • datasetcatalog.nlm.nih.gov
    • +1 more
    Updated Apr 16, 2024
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.V2
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot:

    Background:

    In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods:

    The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. First, the point estimate from each reference analysis is centred at zero, and its confidence interval is scaled to span a range of one. The point estimate and confidence interval from the matching comparator analysis are then shifted and scaled by the same amounts. This enables the relative positions of the point estimates and the CI widths to be assessed quickly while preserving the relative magnitudes of the differences in point estimates and confidence interval widths between the two analyses. Banksia plots can be arranged in a matrix showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first is an empirical evaluation of the differences between various statistical methods across 190 interrupted time series (ITS) datasets with widely varying characteristics; the second assesses data extraction accuracy by comparing results obtained from analysing the original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs in the accompanying manuscripts.
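The centring and scaling step in the Methods can be sketched as a small transformation applied to (point estimate, lower limit, upper limit) triples. This is an illustrative Python sketch, not the authors' Stata or R code; the `Estimate` class, the `centre_and_scale` function and the example numbers are hypothetical, and the plotting itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    point: float
    lower: float  # lower confidence limit
    upper: float  # upper confidence limit


def centre_and_scale(reference: Estimate, comparator: Estimate) -> tuple[Estimate, Estimate]:
    """Shift so the reference point estimate is 0 and scale so its CI has width 1.

    The comparator is shifted and scaled by exactly the same amounts, so its
    position and CI width are expressed in units of the reference CI width.
    """
    shift = reference.point
    width = reference.upper - reference.lower
    if width <= 0:
        raise ValueError("reference CI must have positive width")

    def transform(e: Estimate) -> Estimate:
        return Estimate((e.point - shift) / width, (e.lower - shift) / width, (e.upper - shift) / width)

    return transform(reference), transform(comparator)


# Hypothetical estimates from two analyses of the same dataset
ref, comp = centre_and_scale(Estimate(2.0, 1.0, 3.0), Estimate(2.5, 1.5, 4.5))
print(ref)   # → Estimate(point=0.0, lower=-0.5, upper=0.5)
print(comp)  # → Estimate(point=0.25, lower=-0.25, upper=1.25)
```

After the transformation the reference interval always has width 1, so the comparator's width (1.5 here) reads directly as a ratio of CI widths, and its centre (0.25) is the difference in point estimates measured in units of the reference CI width.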

    Results:

    In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions:

    The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows users to recreate the images used in the companion paper and to adapt the code to create their own banksia plots using either Stata version 17 or R version 4.3.1.

  19. Zelig Models for Testing Advanced Statistical Analysis

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Harvard Dataverse (2023). Zelig Models for Testing Advanced Statistical Analysis [Dataset]. http://doi.org/10.7910/DVN/6OIEQE
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Description

    The aim of this study is to provide datasets for teaching and testing the methods embedded in the Advanced Statistical Analysis. For each datafile, there is an accompanying document describing (i) which models could be run and tested with this particular data and (ii) the steps for doing so.

  20. Below the Method Detection Limit: Problems, Definitions, Data, and Solutions...

    • datahub.bvcentre.ca
    Updated Nov 3, 2023
    (2023). Below the Method Detection Limit: Problems, Definitions, Data, and Solutions - Dataset - BVRC DataHub [Dataset]. https://datahub.bvcentre.ca/dataset/below-the-method-detection-limit
    Dataset updated
    Nov 3, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A workshop was held to address the analysis of datasets containing values below the method detection limit, which are common in activities such as chemical analysis of air and water quality or assessment of contaminants in plants and animals. Although such values carry useful information, they are often ignored or mishandled. The workshop, led by statistician Carolyn Huston, focused on using the R software for statistical analysis of such data. It attracted participants from various organizations and received positive feedback. The goal was to equip attendees with tools to improve data analysis and decision-making, recognizing that statistics is a way of tackling uncertainty.
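The workshop's R-based methods are not reproduced here. As a minimal, assumption-light illustration of why values below the detection limit should not simply be discarded, the Python sketch below computes bounds on a sample mean when some values are reported only as "below the limit". The `Measurement` class, the `mean_bounds` function and the sample values are hypothetical, and the sketch assumes concentrations cannot be negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float        # reported value, or the detection limit for a non-detect
    below_limit: bool   # True if the lab reported "< value"


def mean_bounds(data: list[Measurement]) -> tuple[float, float]:
    """Lower and upper bounds on the sample mean when some values are non-detects.

    A non-detect lies somewhere in [0, detection limit], so substituting 0 gives
    the smallest possible mean and substituting the limit gives the largest.
    Assumes concentrations cannot be negative.
    """
    if not data:
        raise ValueError("no measurements")
    low = sum(0.0 if m.below_limit else m.value for m in data)
    high = sum(m.value for m in data)
    n = len(data)
    return low / n, high / n


# Hypothetical concentrations (same units); two are reported as "< 0.5"
sample = [Measurement(1.2, False), Measurement(0.5, True), Measurement(2.0, False), Measurement(0.5, True)]
print(mean_bounds(sample))
```

Here the mean must lie between 0.8 and 1.05, whereas simply dropping the two non-detects would give (1.2 + 2.0) / 2 = 1.6, outside that range. Discarding censored values in this way biases summaries upward; more formal approaches, such as maximum likelihood estimation for censored data, narrow these bounds under distributional assumptions.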

Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart (2023). Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice [Dataset]. http://doi.org/10.1371/journal.pone.0046042

Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice

108 scholarly articles cite this dataset.
Available download formats: tiff
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Background: Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis, whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches, which combine all individual participant data in a single meta-analysis, have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.

Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials.
Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
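The “two-stage” approach described in this abstract can be illustrated with a minimal sketch: in stage one each trial is reduced to a log relative risk and its variance, and in stage two these are combined by inverse-variance weighting. The sketch assumes a binary outcome and a fixed-effect model, uses hypothetical trial counts and function names, and does not reproduce the authors' analyses, which compared models of varying complexity.

```python
import math
from statistics import NormalDist


def log_rr_and_variance(events_t: int, n_t: int, events_c: int, n_c: int) -> tuple[float, float]:
    """Stage 1: log relative risk and its large-sample variance for one trial."""
    if min(events_t, events_c) == 0:
        raise ValueError("zero-event arms need a continuity correction, not handled here")
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, variance


def pool_fixed_effect(trials: list[tuple[int, int, int, int]], level: float = 0.95) -> tuple[float, float, float]:
    """Stage 2: inverse-variance fixed-effect pooling; returns the pooled RR and its CI."""
    estimates = [log_rr_and_variance(*t) for t in trials]
    weights = [1 / v for _, v in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical (events_treated, n_treated, events_control, n_control) per trial
trials = [(30, 500, 40, 500), (12, 200, 15, 210), (55, 1000, 60, 980)]
rr, lo, hi = pool_fixed_effect(trials)
print(f"RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A random-effects model would add a between-trial variance to each trial's variance before weighting, and trials with zero events in an arm would need a continuity correction or a different estimator; both are omitted to keep the sketch short.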
