100+ datasets found
  1. Statistical Analysis of Individual Participant Data Meta-Analyses: A...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    tiff
    Updated Jun 8, 2023
    Cite
    Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart (2023). Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice [Dataset]. http://doi.org/10.1371/journal.pone.0046042
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis, whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches, which combine all individual participant data in a single meta-analysis, have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.

    Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

    Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
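    The “two-stage” approach described above is straightforward to sketch: stage one computes a summary measure (here, a log relative risk) per trial, and stage two pools those summaries by inverse-variance weighting. A minimal Python illustration with made-up trial counts, not the trial data from this study:

```python
import math

# Hypothetical per-trial summaries (events/total in treatment and control arms);
# illustrative numbers only, not the 24 trials analysed in the study above.
trials = [
    (30, 200, 40, 200),
    (15, 150, 22, 150),
    (50, 400, 60, 400),
]

def two_stage_pool(trials):
    """Stage 1: log relative risk and its variance per trial.
    Stage 2: fixed-effect inverse-variance pooling."""
    num = den = 0.0
    for et, nt, ec, nc in trials:
        log_rr = math.log((et / nt) / (ec / nc))
        var = 1/et - 1/nt + 1/ec - 1/nc   # delta-method variance of log RR
        w = 1 / var
        num += w * log_rr
        den += w
    pooled = num / den
    se = math.sqrt(1 / den)
    return math.exp(pooled), (math.exp(pooled - 1.96 * se),
                              math.exp(pooled + 1.96 * se))

rr, (lo, hi) = two_stage_pool(trials)
print(f"Pooled RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

    A one-stage analysis would instead fit a single (e.g., mixed-effects) model to all participant-level records at once, which is where the extra statistical support comes in.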

  2. Data from: PISA Data Analysis Manual: SPSS, Second Edition

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 30, 2021
    + more versions
    Cite
    U.S. Department of State (2021). PISA Data Analysis Manual: SPSS, Second Edition [Dataset]. https://catalog.data.gov/dataset/pisa-data-analysis-manual-spss-second-edition
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    United States Department of State (http://state.gov/)
    Description

    The OECD Programme for International Student Assessment (PISA) surveys collected data on students’ performances in reading, mathematics and science, as well as contextual information on students’ background, home characteristics and school factors which could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers to both reproduce the initial results and to undertake further analyses. In addition to the inclusion of the necessary techniques, the manual also includes a detailed account of the PISA 2006 database and worked examples providing full syntax in SPSS.

  3. Tabular statistical summay of data analysis - Calawah River Riverscape Study...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated May 24, 2025
    + more versions
    Cite
    (Point of Contact, Custodian) (2025). Tabular statistical summay of data analysis - Calawah River Riverscape Study [Dataset]. https://catalog.data.gov/dataset/tabular-statistical-summay-of-data-analysis-calawah-river-riverscape-study3
    Explore at:
    Dataset updated
    May 24, 2025
    Dataset provided by
    (Point of Contact, Custodian)
    Area covered
    Calawah River
    Description

    The objective of this study was to identify the patterns of juvenile salmonid distribution and relative abundance in relation to habitat correlates. It is the first dataset of its kind because the entire river was snorkeled by one person in multiple years. During two consecutive summers, we completed a census of juvenile salmonids and stream habitat across a stream network. We used the data to test the ability of habitat models to explain the distribution of juvenile coho salmon (Oncorhynchus kisutch), young-of-the-year (age 0) steelhead (Oncorhynchus mykiss), and steelhead parr (= age 1) for a network consisting of several different sized streams. Our network-scale models, which included five stream habitat variables, explained 27%, 11%, and 19% of the variation in the density of juvenile coho salmon, age 0 steelhead, and steelhead parr, respectively. We found weak to strong levels of spatial auto-correlation in the model residuals (Moran's I values ranging from 0.25 - 0.71). Explanatory power of base habitat models increased substantially and the level of spatial auto-correlation decreased with sequential inclusion of variables accounting for stream size, year, stream, and reach location. The models for specific streams underscored the variability that was implied in the network-scale models. Associations between juvenile salmonids and individual habitat variables were rarely linear and ranged from negative to positive, and the variable accounting for location of the habitat within a stream was often more important than any individual habitat variable. The limited success in predicting the summer distribution and density of juvenile coho salmon and steelhead with our network-scale models was apparently related to variation in the strength and shape of fish-habitat associations across and within streams and years. Summary of statistical analysis of the Calawah Riverscape data. NOAA was not involved and did not pay for the collection of this data. 
This data represents the statistical analysis carried out by Martin Liermann as a NOAA employee.
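    Moran's I, the spatial autocorrelation statistic reported above, can be computed directly from model residuals and a neighbour structure. A minimal Python sketch with hypothetical residuals for six adjacent stream reaches (not the study's data):

```python
def morans_i(x, neighbours):
    """x: residual per site; neighbours: dict site -> list of adjacent sites."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    w_sum = sum(len(adj) for adj in neighbours.values())  # total weight (binary)
    cross = sum(dev[i] * dev[j] for i, adj in neighbours.items() for j in adj)
    ss = sum(d * d for d in dev)
    return (n / w_sum) * (cross / ss)

resid = [0.9, 0.7, 0.8, -0.6, -0.8, -0.7]  # clustered residuals along a stream
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(round(morans_i(resid, adj), 2))  # 0.61: positive spatial autocorrelation
```

    A value near zero would indicate spatially independent residuals; values in the study's reported range (0.25 to 0.71) signal that nearby reaches have similar model errors.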

  4. Initial data analysis checklist for data screening in longitudinal studies.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 29, 2024
    Cite
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner (2024). Initial data analysis checklist for data screening in longitudinal studies. [Dataset]. http://doi.org/10.1371/journal.pone.0295726.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 29, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Initial data analysis checklist for data screening in longitudinal studies.

  5. Journal of Data Analysis and Information Processing - impact-factor

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    + more versions
    Cite
    (2025). Journal of Data Analysis and Information Processing - impact-factor [Dataset]. https://exaly.com/journal/61638/journal-of-data-analysis-and-information-processing
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The graph shows the changes in the impact factor of the Journal of Data Analysis and Information Processing, together with its percentile relative to the entire literature for comparison. Impact Factor is the most common scientometric index; it is defined as the number of citations received by papers published in the two preceding years, divided by the number of papers published in those years.
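    The definition above reduces to simple arithmetic; a tiny illustration with hypothetical counts:

```python
# Impact factor: citations in the index year to items published in the two
# preceding years, divided by the number of papers published in those years.
citations_to_prior_two_years = 180   # hypothetical count
papers_prior_two_years = 120         # hypothetical count
impact_factor = citations_to_prior_two_years / papers_prior_two_years
print(impact_factor)  # 1.5
```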

  6. Conceptualization of public data ecosystems

    • data.niaid.nih.gov
    Updated Sep 26, 2024
    Cite
    Anastasija, Nikiforova; Martin, Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
    Explore at:
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    University of Hradec Králové
    University of Tartu
    Authors
    Anastasija, Nikiforova; Martin, Lnenicka
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

    As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

    This dataset is being made public both to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems ", Telematics and Informatics*, and its Systematic Literature Review component that informs the study.

    Description of the data in this data set

    PublicDataEcosystem_SLR provides the structure of the protocol

    Spreadsheet#1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies

    Spreadsheet #2 provides the protocol structure.

    Spreadsheet #3 provides the filled protocol for relevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.

    Descriptive Information

    Article number

    A study number, corresponding to the study number assigned in an Excel worksheet

    Complete reference

    The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

    Year of publication

    The year in which the study was published.

    Journal article / conference paper / book chapter

    The type of the paper, i.e., journal article, conference paper, or book chapter.

    Journal / conference / book

    The journal, conference, or book in which the paper is published.

    DOI / Website

    A link to the website where the study can be found.

    Number of words

    The number of words in the study.

    Number of citations in Scopus and WoS

    The number of citations of the paper in Scopus and WoS digital libraries.

    Availability in Open Access

    Whether the study is available in Open Access or Free / Full Access.

    Keywords

    Keywords of the paper as indicated by the authors (in the paper).

    Relevance for our study (high / medium / low)

    The relevance level of the paper for our study.

    Approach- and research design-related information

    Objective / Aim / Goal / Purpose & Research Questions

    The research objective and established RQs.

    Research method (including unit of analysis)

    The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

    Study’s contributions

    The study’s contribution as defined by the authors

    Qualitative / quantitative / mixed method

    Whether the study uses a qualitative, quantitative, or mixed-methods approach.

    Availability of the underlying research data

    Whether the paper references the public availability of the underlying research data (e.g., interview transcriptions, collected data), or explains why these data are not openly shared.

    Period under investigation

    Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

    Use of theory / theoretical concepts / approaches? If yes, specify them

    Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

    Quality-related information

    Quality concerns

    Whether there are any quality concerns (e.g., limited information about the research methods used).

    Public Data Ecosystem-related information

    Public data ecosystem definition

    How the public data ecosystem is defined in the paper. If an alternative or equivalent term is used (most often “infrastructure”), what is the ecosystem called in the paper?

    Public data ecosystem evolution / development

    Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

    What constitutes a public data ecosystem?

    What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

    Components and relationships

    What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

    Stakeholders

    What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

    Actors and their roles

    What actors does the public data ecosystem involve? What are their roles?

    Data (data types, data dynamism, data categories etc.)

    What data does the public data ecosystem cover (or is intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static, dynamic, real-time, streaming data), and prevailing data categories / domains / topics.

    Processes / activities / dimensions, data lifecycle phases

    What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

    Level (if relevant)

    What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

    Other elements or relationships (if any)

    What other elements or relationships does the public data ecosystem consist of?

    Additional comments

    Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

    New papers

    Does the study refer to any other potentially relevant papers?

    Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

    Format of the files: .xls, .csv (for the first spreadsheet only), .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  7. Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx

    • frontiersin.figshare.com
    docx
    Updated Jan 3, 2024
    Cite
    Ethan M. Meyers (2024). Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx [Dataset]. http://doi.org/10.3389/fninf.2023.1275903.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Ethan M. Meyers
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Neural decoding is a powerful method to analyze neural activity. However, the code needed to run a decoding analysis can be complex, which can present a barrier to using the method. In this paper we introduce a package that makes it easy to perform decoding analyses in the R programming language. We describe how the package is designed in a modular fashion which allows researchers to easily implement a range of different analyses. We also discuss how to format data to be able to use the package, and we give two examples of how to use the package to analyze real data. We believe that this package, combined with the rich data analysis ecosystem in R, will make it significantly easier for researchers to create reproducible decoding analyses, which should help increase the pace of neuroscience discoveries.
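    NeuroDecodeR itself is an R package, but the core idea of a decoding analysis can be illustrated in a few lines of Python: train a classifier on neural responses and test whether it predicts the stimulus on held-out trials. A toy nearest-centroid decoder with leave-one-out cross-validation and made-up firing rates (not the package's API):

```python
def decode_accuracy(trials):
    """trials: list of (firing_rate_vector, stimulus_label) pairs."""
    correct = 0
    for i, (x, label) in enumerate(trials):
        train = [t for j, t in enumerate(trials) if j != i]  # leave one out
        # centroid (mean firing-rate vector) per stimulus from training trials
        grouped = {}
        for vec, lab in train:
            grouped.setdefault(lab, []).append(vec)
        centroids = {lab: [sum(col) / len(vs) for col in zip(*vs)]
                     for lab, vs in grouped.items()}
        # predict the held-out trial's label by nearest centroid
        pred = min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[lab])))
        correct += (pred == label)
    return correct / len(trials)

trials = [([5.0, 1.0], "A"), ([4.5, 1.2], "A"), ([4.8, 0.9], "A"),
          ([1.0, 5.0], "B"), ([1.3, 4.6], "B"), ([0.8, 5.2], "B")]
print(decode_accuracy(trials))  # 1.0 on this well-separated toy data
```

    Decoding accuracy above chance indicates that the recorded population carries information about the stimulus.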

  8. Data from: Analysis and Visualization of Quantitative Proteomics Data Using...

    • acs.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Sep 10, 2024
    Cite
    Yi Hsiao; Haijian Zhang; Ginny Xiaohe Li; Yamei Deng; Fengchao Yu; Hossein Valipour Kahrood; Joel R. Steele; Ralf B. Schittenhelm; Alexey I. Nesvizhskii (2024). Analysis and Visualization of Quantitative Proteomics Data Using FragPipe-Analyst [Dataset]. http://doi.org/10.1021/acs.jproteome.4c00294.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Sep 10, 2024
    Dataset provided by
    ACS Publications
    Authors
    Yi Hsiao; Haijian Zhang; Ginny Xiaohe Li; Yamei Deng; Fengchao Yu; Hossein Valipour Kahrood; Joel R. Steele; Ralf B. Schittenhelm; Alexey I. Nesvizhskii
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The FragPipe computational proteomics platform is gaining widespread popularity among the proteomics research community because of its fast processing speed and user-friendly graphical interface. Although FragPipe produces well-formatted output tables that are ready for analysis, there is still a need for an easy-to-use and user-friendly downstream statistical analysis and visualization tool. FragPipe-Analyst addresses this need by providing an R shiny web server to assist FragPipe users in conducting downstream analyses of the resulting quantitative proteomics data. It supports major quantification workflows, including label-free quantification, tandem mass tags, and data-independent acquisition. FragPipe-Analyst offers a range of useful functionalities, such as various missing value imputation options, data quality control, unsupervised clustering, differential expression (DE) analysis using Limma, and gene ontology and pathway enrichment analysis using Enrichr. To support advanced analysis and customized visualizations, we also developed FragPipeAnalystR, an R package encompassing all FragPipe-Analyst functionalities that is extended to support site-specific analysis of post-translational modifications (PTMs). FragPipe-Analyst and FragPipeAnalystR are both open-source and freely available.

  9. Introduction to Time Series Analysis for Hydrologic Data

    • hydroshare.org
    • hydroshare.cuahsi.org
    zip
    Updated Jan 29, 2021
    Cite
    Gabriela Garcia; Kateri Salk (2021). Introduction to Time Series Analysis for Hydrologic Data [Dataset]. https://www.hydroshare.org/resource/ee2a4c2151f24115a12e34d4d22d96fe
    Explore at:
    Available download formats: zip (1.1 MB)
    Dataset updated
    Jan 29, 2021
    Dataset provided by
    HydroShare
    Authors
    Gabriela Garcia; Kateri Salk
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 1, 1974 - Jan 27, 2021
    Description

    This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.

    Introduction

    Time series are a special class of dataset, where a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its simplest, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will be covering a few simple applications of time series analysis in these lessons.

    Opportunities

    Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:

    • Has there been an increasing or decreasing trend in the response variable over time?
    • Can we forecast conditions in the future?

      Challenges

    Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are:

    • Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).

    • Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.

    • Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend (e.g., water temperatures are predictably higher in summer than in winter).

    • Heteroscedasticity: The variance of the time series is not constant over time.

    • Covariance: The covariance of the time series is not constant over time. Many time series models assume that the variance and covariance remain constant over time (i.e., no heteroscedasticity).
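    The first caveat above, autocorrelation, is easy to check numerically: a lag-1 autocorrelation well above zero signals that successive measurements are not independent. A minimal Python sketch with synthetic values standing in for, e.g., daily discharge:

```python
def lag1_autocorr(x):
    """Sample autocorrelation of a series at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

series = [10, 12, 13, 15, 14, 16, 18, 17, 19, 21]  # trending synthetic series
print(round(lag1_autocorr(series), 2))  # 0.57: successive points correlated
```

    A value near zero would be consistent with independent observations; strong positive values suggest models that account for serial dependence (e.g., ARIMA-type models) are needed.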

      Learning Objectives

    After successfully completing this notebook, you will be able to:

    1. Choose appropriate time series analyses for trend detection and forecasting

    2. Discuss the influence of seasonality on time series analysis

    3. Interpret and communicate results of time series analyses

  10. Understanding and Managing Missing Data.pdf

    • figshare.com
    pdf
    Updated Jun 9, 2025
    Cite
    Ibrahim Denis Fofanah (2025). Understanding and Managing Missing Data.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.29265155.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Ibrahim Denis Fofanah
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness impact data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling. Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.
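    Two of the handling strategies the guide covers, listwise deletion and mean imputation, can be sketched in a few lines of Python (toy data; note that mean imputation shrinks the variance of the imputed column, one source of the bias the guide discusses):

```python
# Toy numeric column with missing values represented as None.
data = [4.0, None, 6.0, 5.0, None, 7.0]

def listwise_delete(xs):
    """Drop records with missing values (valid under MCAR, loses data)."""
    return [x for x in xs if x is not None]

def mean_impute(xs):
    """Replace missing values with the mean of the observed values."""
    observed = [x for x in xs if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in xs]

print(listwise_delete(data))   # [4.0, 6.0, 5.0, 7.0]
print(mean_impute(data))       # [4.0, 5.5, 6.0, 5.0, 5.5, 7.0]
```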

  11. Lifestyle_and_Health_Risk_Prediction_Dataset

    • kaggle.com
    zip
    Updated Oct 23, 2025
    Cite
    Zahra Nusrat (2025). Lifestyle_and_Health_Risk_Prediction_Dataset [Dataset]. https://www.kaggle.com/datasets/zahranusrat/lifestyle-and-health-risk-prediction-dataset
    Explore at:
    Available download formats: zip (61147 bytes)
    Dataset updated
    Oct 23, 2025
    Authors
    Zahra Nusrat
    License

    CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🧩 About Dataset

    This dataset provides a detailed collection of information related to [your topic], offering valuable insights for data analysis, visualization, and model development. It consists of multiple features such as [list of important columns], which capture various dimensions of the subject in a structured and measurable way.

    The purpose of this dataset is to support exploratory data analysis (EDA) and predictive modeling by allowing users to identify trends, patterns, and relationships among variables. It can serve as a foundation for building machine learning models, performing statistical studies, or generating data-driven visual reports.

    Researchers, data enthusiasts, and students can use this dataset to enhance their analytical understanding, practice preprocessing techniques, and improve their ability to draw meaningful conclusions from real-world data.

    Additionally, this dataset can be explored to uncover correlations, test hypotheses, and visualize behavioral or performance patterns. Its clean structure and well-defined variables make it suitable for both beginners learning EDA and experienced professionals developing predictive insights.

  12. Data to Support Stillwater Analyses

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Nov 19, 2025
    Cite
    U.S. Geological Survey (2025). Data to Support Stillwater Analyses [Dataset]. https://catalog.data.gov/dataset/data-to-support-stillwater-analyses
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    The U.S. Geological Survey New England Water Science Center, under an interagency agreement with the Federal Emergency Management Agency, conducted frequency analyses of stillwater elevations at three National Oceanic and Atmospheric Administration coastal gages following the coastal floods of 2018. The datasets are comma-delimited files of period-of-record annual peak stillwater elevations collected at gages in Boston, Massachusetts, Portland, Maine, and Seavey Island, Maine, for analysis of annual-exceedance probabilities. The peak water-surface elevations are in feet in the North American Vertical Datum of 1988.
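    A common first step in the kind of frequency analysis described above can be sketched in Python: empirical annual-exceedance probabilities from ranked annual peaks using the Weibull plotting position p = rank / (n + 1). The peak values here are hypothetical, not the gage records in this dataset:

```python
# Hypothetical annual peak stillwater elevations, feet NAVD 88.
peaks = [9.1, 10.4, 8.7, 11.2, 9.8]

def exceedance_probabilities(peaks):
    """Rank peaks from largest to smallest; Weibull plotting position."""
    ranked = sorted(peaks, reverse=True)
    n = len(ranked)
    return [(x, rank / (n + 1)) for rank, x in enumerate(ranked, start=1)]

for elev, p in exceedance_probabilities(peaks):
    print(f"{elev:5.1f} ft  AEP = {p:.3f}")  # largest peak: AEP 1/6 ≈ 0.167
```

    Published studies typically go further and fit a distribution to the peaks, but the plotting positions already order the record by estimated rarity.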

  13. Protected Areas Database of the United States (PAD-US) 3.0 Raster Analysis

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Raster Analysis [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-raster-analysis
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class in the full geodatabase inventory (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to prioritize overlapping designations, avoiding massive overestimation in protected area statistics, and simplified by the following PAD-US attributes to support user needs for raster analysis data: Manager Type, Manager Name, Designation Type, GAP Status Code, Public Access, and State Name. The rasterization process (see processing steps below) prioritized overlapping designations previously identified (GAP_Prity field) in the Vector Analysis File (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation (e.g. GAP Status Code 1 over 2). The 30-meter Image (IMG) grid Raster Analysis Files area extents were defined by the Census state boundary file used to clip the Vector Analysis File, the data source for rasterization ("PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class from ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb"). Alaska (AK) and Hawaii (HI) raster data are separated from the contiguous U.S. (CONUS) to facilitate analyses at manageable scales. Note, the PAD-US inventory is now considered functionally complete with the vast majority of land protection types (with a legal protection mechanism) represented in some manner, while work continues to maintain updates, improve data quality, and integrate new data as it becomes available (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). 
In addition, protection status represents a point in time, and changes in status between versions of PAD-US may be attributable more to improvements in the completeness and accuracy of the spatial data than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of these data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies remain the best source for their own lands data.
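
The prioritization logic behind the rasterization step can be illustrated with a minimal NumPy sketch. This is a toy example with invented 3x3 arrays, not the actual PAD-US workflow or tooling: where designations overlap, each cell keeps the most protective (lowest) GAP Status Code, so a Wilderness (GAP 1) inside a National Forest (GAP 2) wins.

```python
import numpy as np

# Two hypothetical grid layers (toy 3x3 extent), one per designation,
# holding GAP status codes 1-4 (1 = strictest biodiversity protection);
# 0 marks cells with no designation.
forest = np.array([[2, 2, 0],
                   [2, 2, 0],
                   [0, 0, 0]])
wilderness = np.array([[1, 0, 0],
                       [1, 0, 0],
                       [0, 0, 0]])

# Where designations overlap, keep the lowest nonzero GAP code, so the
# Wilderness (GAP 1) inside the National Forest (GAP 2) takes priority.
NODATA = 99
stack = np.stack([forest, wilderness])
masked = np.where(stack == 0, NODATA, stack)  # push "no designation" out of the min
combined = masked.min(axis=0)                 # best protection per cell
combined[combined == NODATA] = 0              # restore 0 where nothing was designated
```

The same cell-wise "keep the highest-priority code" rule generalizes to any number of overlapping layers by stacking more arrays.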

  14. Social Media PII Disclosure Analyses

    • kaggle.com
    zip
    Updated Jul 30, 2024
    Eidan Rosado (2024). Social Media PII Disclosure Analyses [Dataset]. https://www.kaggle.com/datasets/edyvision/social-media-pii-disclosure-analyses
    Explore at:
    Available download formats: zip (29813203 bytes)
    Dataset updated
    Jul 30, 2024
    Authors
    Eidan Rosado
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Privacy vs. Social Capital: Social Media PII Disclosure Analyses

    This data was collected and analyzed as part of a study on PII disclosures in social media conversations, with special attention to influencer characteristics in those interactions, for the dissertation titled Privacy vs. Social Capital: Examining Information Disclosure Patterns within Social Media Influencer Networks and the research paper titled Unveiling Influencer-Driven Personal Data Sharing in Social Media Discourse.

    Each study phase used a different platform: X (Twitter) data in the pilot analysis and Reddit data in the main study. Both folders contain the analyzed_posts and cluster summary CSV files, broken down by collection (either by trend or by collection date).

    Note: Raw data is not made available in these datasets due to the nature of the study and to protect the original authors.

    Notable Data Elements

    Post Data

    | Column name | Type | Description |
    |:---|:---|:---|
    | Node ID | UUID | Unique identifier for post (replaces original platform identifier) |
    | User ID | UUID | Unique identifier assigned for user (replaces original platform identifier) |
    | Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
    | Influence Power | Float | Eigenvector centrality |
    | Influencer Tier | Str | Categorical label calculated by follower count |
    | Collection Name | Str | Trend collection assigned based on search query |
    | Hashtags | Set(str) | The set of hashtags included in the node |
    | PII Disclosed | Bool | Whether or not PII was disclosed |
    | PII Detected | Set(str) | The detected token types in post |
    | PII Risk Score | Float | The PII score for all tokens in a post |
    | Is Comment | Bool | Whether the post is a comment or reply |
    | Is Text Starter | Bool | Whether or not the post has text content |
    | Community | Str | The group, community, channel, etc. associated with the post |
    | Timestamp | Timestamp | Creation timestamp (provided by social media API) |
    | Time Elapsed | Int | Time elapsed (seconds) from original influencer’s post |
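
The Influence Power column is described as eigenvector centrality. As a rough illustration of how such a score behaves (a toy adjacency matrix, not the study's actual pipeline or data), a minimal power-iteration sketch:

```python
import numpy as np

# Toy undirected interaction graph as an adjacency matrix: node 0 is the
# influencer (connected to everyone); nodes 1 and 2 also reply to each other.
# This graph is invented for illustration only.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

# Power iteration: the dominant eigenvector of A assigns each node a score
# proportional to the scores of its neighbours, i.e. eigenvector centrality.
v = np.ones(A.shape[0])
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)

top = int(np.argmax(v))   # node 0, the influencer, scores highest
```

Nodes connected to other well-connected nodes score highest, which is why a hub influencer dominates the ranking even over nodes with similar raw degree.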

    Cluster Data

    | Column name | Type | Description |
    |:---|:---|:---|
    | Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
    | Influencer Tiers Frequencies | List[dict] | Frequency of influencer tiers of all users in the cluster |
    | Top Influence Power Score | Float | Eigenvector centrality of top influencer |
    | Top Influencer Tier | Str | Size tier of top influencer |
    | Collection Name | Str | Trend collection assigned based on search query |
    | Hashtags | Set(str) | The set of hashtags included in the cluster |
    | PII Detection Frequencies | List[dict] | The detected token types in posts, with frequencies |
    | Node Count | Int | Count of all nodes in the influencer cluster |
    | Node Disclosures | Int | Count of all nodes with mean_risk_score > 1* |
    | Disclosure Ratio | Float | Count of nodes with confirmed disclosed PII divided by overall cluster size (count of nodes in the cluster) |
    | Mean Risk Score | Float | The mean risk score for an entire network cluster |
    | Median Risk Score | Float | The median risk score for an entire network cluster |
    | Min Risk Score | Float | The min risk score for an entire network cluster |
    | Max Risk Score | Float | The max risk score for an entire network cluster |
    | Time Span | Float | Total time elapsed |
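
Several of the cluster columns are straightforward aggregates of the post rows. A minimal sketch of those relationships, using hypothetical (pii_disclosed, pii_risk_score) pairs rather than data from the study:

```python
import statistics

# Hypothetical post rows for one cluster: (pii_disclosed, pii_risk_score).
posts = [(True, 2.4), (False, 0.3), (True, 1.8), (False, 0.0), (False, 0.9)]

node_count = len(posts)
disclosures = sum(1 for disclosed, _ in posts if disclosed)
risks = [score for _, score in posts]

summary = {
    "Node Count": node_count,
    "Disclosure Ratio": disclosures / node_count,  # disclosures over cluster size
    "Mean Risk Score": statistics.mean(risks),
    "Median Risk Score": statistics.median(risks),
    "Min Risk Score": min(risks),
    "Max Risk Score": max(risks),
}
```

For the toy rows above this yields a Disclosure Ratio of 0.4 and a Mean Risk Score of 1.08, matching the column definitions in the table.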
  15. Data from: Faculty self-reported use of quantitative and data analysis...

    • figshare.com
    tiff
    Updated Jun 1, 2023
    Rory R. McFadden; Karen Viskupic; Anne E. Egger (2023). Faculty self-reported use of quantitative and data analysis skills in undergraduate geoscience courses [Dataset]. http://doi.org/10.6084/m9.figshare.11409810.v1
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Rory R. McFadden; Karen Viskupic; Anne E. Egger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quantitative literacy is a foundational component of success in STEM disciplines and in life. Quantitative concepts and data-rich activities in undergraduate geoscience courses can strengthen geoscience majors’ understanding of geologic phenomena, prepare them for future careers and graduate school, and provide non-STEM students with real-world contexts for applying quantitative thinking. We use self-reported teaching practices from the 2016 National Geoscience Faculty Survey to document the extent to which undergraduate geoscience instructors emphasize quantitative skills (algebra, statistics, and calculus) and data analysis skills in introductory (n = 1096) and majors (n = 1066) courses. Respondents who spent more than 20% of class time on student activities, questions, and discussions, who taught small classes, or who engaged more with the geoscience community through research or efforts to improve teaching incorporated statistical analyses and data analyses more frequently in their courses. Respondents from baccalaureate institutions reported using a wider variety of data analysis skills in all courses compared with respondents from other types of institutions. Additionally, respondents who reported using more data analysis skills in their courses also used a broader array of strategies to prepare students for the geoscience workforce. These correlations suggest that targeted professional development could increase instructors’ use of quantitative and data analysis skills to meet the needs of their students in context.

  16. PISA 2003 Data Analysis Manual SAS

    • gimi9.com
    • s.cnmilf.com
    • +1more
    PISA 2003 Data Analysis Manual SAS [Dataset]. https://gimi9.com/dataset/data-gov_pisa-2003-data-analysis-manual-sas
    Explore at:
    Description

    This publication provides all the information required to understand the PISA 2003 educational performance database and perform analyses in accordance with the complex methodologies used to collect and process the data. It enables researchers to both reproduce the initial results and to undertake further analyses. The publication includes introductory chapters explaining the statistical theories and concepts required to analyse the PISA data, including full chapters on how to apply replicate weights and undertake analyses using plausible values; worked examples providing full syntax in SAS®; and a comprehensive description of the OECD PISA 2003 international database. The PISA 2003 database includes micro-level data on student educational performance for 41 countries collected in 2003, together with students’ responses to the PISA 2003 questionnaires and the test questions. A similar manual is available for SPSS users.
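
As a rough illustration of the methodology the manual teaches, a sketch with simulated data (the manual's own worked examples use SAS macros on the real database, and every number below is invented): the point estimate averages the statistic over the five plausible values, the sampling variance comes from the 80 Fay-adjusted BRR replicate weights (Fay factor k = 0.5), and the two variance components are combined Rubin-style.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the PISA 2003 structure: 5 plausible values per
# student, a final student weight, and 80 Fay-adjusted BRR replicate
# weights with Fay factor k = 0.5.
n_students, n_pv, n_rep, k = 200, 5, 80, 0.5
pvs = rng.normal(500, 100, size=(n_students, n_pv))
w_final = rng.uniform(10, 20, size=n_students)
w_rep = w_final[:, None] * rng.choice([k, 2 - k], size=(n_students, n_rep))

def wmean(x, w):
    return np.average(x, weights=w)

# 1) Point estimate: average the weighted mean over the plausible values.
est_pv = np.array([wmean(pvs[:, m], w_final) for m in range(n_pv)])
estimate = est_pv.mean()

# 2) Sampling variance per PV from the replicate estimates; with Fay's
#    k = 0.5 the scaling factor is 1 / (80 * (1 - k)**2) = 1/20.
samp_var = np.mean([
    sum((wmean(pvs[:, m], w_rep[:, r]) - est_pv[m]) ** 2 for r in range(n_rep))
    / (n_rep * (1 - k) ** 2)
    for m in range(n_pv)
])

# 3) Imputation variance across plausible values, combined Rubin-style.
imp_var = est_pv.var(ddof=1)
se = np.sqrt(samp_var + (1 + 1 / n_pv) * imp_var)
```

The key point the manual stresses is that analysing a single plausible value with the final weight alone understates the uncertainty; both the replicate and the imputation components belong in the standard error.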

  17. Record High Temperatures for US Cities

    • kaggle.com
    zip
    Updated Jan 18, 2023
    The Devastator (2023). Record High Temperatures for US Cities [Dataset]. https://www.kaggle.com/datasets/thedevastator/record-high-temperatures-for-us-cities-in-2015
    Explore at:
    Available download formats: zip (9955 bytes)
    Dataset updated
    Jan 18, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    Record High Temperatures for US Cities

    Clearly Defined Monthly Data

    By Gary Hoover [source]

    About this dataset

    This dataset contains all the record-breaking temperatures for your favorite US cities in 2015. With this information, you can prepare for any unexpected weather that may come your way in the future, or just revel in the beauty of these high heat spells from days past! With record highs spanning January to December, stay warm (or cool) with these handy historical temperature data points.


    How to use the dataset

    This dataset contains the record high temperatures for various US cities during 2015. It includes a column for each month, along with a column for the record high over the entire year. The data are sourced from www.weatherbase.com and can be used to analyze which cities experienced hot summers or to compare temperature variations between regions.

    Here are some useful tips on how to work with this dataset:
    - Analyze individual monthly temperatures: compare high temperatures across months and locations to identify which areas experienced particularly hot summers or cold winters.
    - Compare annual versus monthly data: set average annual highs against monthly highs to understand temperature trends at a given location across all four seasons, or explore how regions vary both year to year and across the months of a single year.
    - Heatmap analysis: plot the temperature data as an interactive heatmap to pinpoint regions that experience unusual weather conditions or higher-than-average warmth compared with cooler areas of similar size.
    - Statistical modeling: model the relationships between independent variables (temperature variation by month, region/city, and more) and dependent variables (e.g., tourism volumes) using regression techniques such as linear models (OLS), ARIMA models, and nonlinear transformations in statistical software such as Stata or the R programming language.
    - Look into climate trends over longer periods: extend the time frame of the analysis beyond the study period where possible, taking advantage of digitally available historical temperature readings rather than relying only on printed reports.

    With these helpful tips, you can get started analyzing record high temperatures for US cities during 2015 using our 'Record High Temperatures for US Cities' dataset!

    Research Ideas

    • Create a heat map chart of US cities representing the highest temperature on record for each city from 2015.
    • Analyze trends in monthly high temperatures in order to predict future climate shifts and weather patterns across different US cities.
    • Track and compare monthly high temperature records for all US cities to identify regional hot spots with higher than average records and potential implications for agriculture and resource management planning

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: Highest temperature on record through 2015 by US City.csv

    | Column name | Description |
    |:---|:---|
    | CITY | Name of the city. (String) |
    | JAN | Record high temperature for the month of January. (Integer) |
    | FEB | Record high temperature for the month of February. (Integer) |
    | MAR | Record high temperature for the month of March. (Integer) |
    | APR | Record high temperature for the month of April. (Integer) |
    | MAY | Record high temperature for the month of May. (Integer) |
    | JUN | Record high temperature for the month of June. (Integer) |
    | JUL | Record high temperature for the month of July. (Integer) |
    | AUG | Record high temperature for the month of August. (Integer) |
    | SEP | Record high temperature for the month of September. (Integer) |
    | OCT | Record high temperature for the month of October. (Integer) |
    | ... | |
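
A short pandas sketch of working with columns shaped like these. The rows below are invented stand-ins for a few of the CSV's columns; with the real file you would instead start from pd.read_csv("Highest temperature on record through 2015 by US City.csv"):

```python
import pandas as pd

# Toy rows mirroring the CSV layout (monthly record highs in deg F);
# the values are illustrative, not taken from the dataset.
df = pd.DataFrame({
    "CITY": ["Phoenix", "Seattle"],
    "JAN": [88, 64], "FEB": [92, 70], "JUL": [121, 103], "DEC": [87, 63],
})

months = [c for c in df.columns if c != "CITY"]
df["ANNUAL_MAX"] = df[months].max(axis=1)            # record high across the year
hottest = df.loc[df["ANNUAL_MAX"].idxmax(), "CITY"]  # city with the top record
```

The same pattern (select the month columns, aggregate across axis=1) extends to monthly means, ranges, or the heatmap-style pivots suggested above.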

  18. Replication Code: What is Your Estimand? Defining the Target Quantity...

    • datasetcatalog.nlm.nih.gov
    • dataverse.harvard.edu
    Updated Nov 15, 2020
    Lundberg, Ian; Stewart, Brandon M.; Johnson, Rebecca (2020). Replication Code: What is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory [Dataset]. http://doi.org/10.7910/DVN/ASGOVU
    Explore at:
    Dataset updated
    Nov 15, 2020
    Authors
    Lundberg, Ian; Stewart, Brandon M.; Johnson, Rebecca
    Description

    We make only one point in this article. Every quantitative study must be able to answer the question: what is your estimand? The estimand is the target quantity---the purpose of the statistical analysis. Much attention is already placed on how to do estimation; a similar degree of care should be given to defining the thing we are estimating. We advocate that authors state the central quantity of each analysis---the theoretical estimand---in precise terms that exist outside of any statistical model. In our framework, researchers do three things: (1) set a theoretical estimand, clearly connecting this quantity to theory, (2) link to an empirical estimand, which is informative about the theoretical estimand under some identification assumptions, and (3) learn from data. Adding precise estimands to research practice expands the space of theoretical questions, clarifies how evidence can speak to those questions, and unlocks new tools for estimation. By grounding all three steps in a precise statement of the target quantity, our framework connects statistical evidence to theory.

  19. Supporting Data for Method Assessment for Non-Targeted Analyses (MANTA)...

    • data.nist.gov
    • datasets.ai
    • +2more
    Updated May 24, 2021
    Benjamin Place (2021). Supporting Data for Method Assessment for Non-Targeted Analyses (MANTA) Program: Interlaboratory Study 1 Results [Dataset]. http://doi.org/10.18434/mds2-2412
    Explore at:
    Dataset updated
    May 24, 2021
    Dataset provided by
    National Institute of Standards and Technology: http://www.nist.gov/
    Authors
    Benjamin Place
    License

    https://www.nist.gov/open/license

    Description

    Supporting data for the results of Interlaboratory Study 1 of the Method Assessment for Non-Targeted Analyses (MANTA) program. The datasets include the chemical compound descriptions, the laboratory mean responses, and the tools for the principal components analysis of the datasets. In addition, a Microsoft Excel file, which was given to all participants, allowed for the analysis of the metadata.
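
The principal components analysis mentioned can be sketched with plain NumPy. This is a toy lab-by-compound response matrix, not the NIST data or the study's actual tooling; the idea is that labs whose responses differ systematically separate out on the leading component:

```python
import numpy as np

# Toy matrix of mean responses: rows are labs, columns are compounds.
# Lab 3 reports very different responses from the other three labs.
X = np.array([[1.0, 2.0, 0.5],
              [1.2, 1.9, 0.4],
              [0.8, 2.2, 0.6],
              [3.0, 0.5, 2.0]])

# PCA via SVD of the column-centered matrix: component scores place each
# lab in PC space, and the squared singular values give variance shares.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                    # lab coordinates on the principal components
explained = S**2 / np.sum(S**2)   # fraction of variance per component
outlier = int(np.argmax(np.abs(scores[:, 0])))   # lab 3 separates on PC1
```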

  20. What do we mean by "data" in the arts and humanities? Interview transcripts...

    • zenodo.org
    txt, zip
    Updated Jul 4, 2022
    Bianca Gualandi; Bianca Gualandi; Luca Pareschi; Luca Pareschi; Silvio Peroni; Silvio Peroni (2022). What do we mean by "data" in the arts and humanities? Interview transcripts (University of Bologna, FICLIT) and qualitative data coding [Dataset]. http://doi.org/10.5281/zenodo.6123290
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Bianca Gualandi; Bianca Gualandi; Luca Pareschi; Luca Pareschi; Silvio Peroni; Silvio Peroni
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Bologna
    Description

    This dataset contains the anonymised transcripts of the interviews conducted between November and December 2021 at the department of Classical Philology and Italian Studies (FICLIT) at the University of Bologna. It further includes the qualitative data analysis of the interviews, carried out using a grounded theory approach and the open source software QualCoder version 2.9.

Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart (2023). Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice [Dataset]. http://doi.org/10.1371/journal.pone.0046042

Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice

Explore at:
108 scholarly articles cite this dataset
Available download formats: tiff
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Background: Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.

Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
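
The two-stage approach described above can be sketched in a few lines. The trial summaries here are hypothetical, not the 24 pre-eclampsia trials analysed in the paper: stage one reduces each trial's IPD to a log relative risk and its standard error, and stage two pools them by inverse-variance weighting.

```python
import numpy as np

# Stage 1 output (hypothetical): per-trial log relative risks and SEs,
# as would be computed from each trial's individual participant data.
log_rr = np.array([-0.15, -0.05, -0.20, -0.08])
se = np.array([0.10, 0.08, 0.12, 0.09])

# Stage 2: fixed-effect inverse-variance pooling of the trial estimates.
w = 1.0 / se**2
pooled = np.sum(w * log_rr) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))

rr = np.exp(pooled)                                        # pooled relative risk
ci = np.exp(pooled + np.array([-1.96, 1.96]) * pooled_se)  # 95% CI
```

A one-stage analysis would instead fit a single (typically mixed-effects) model to all participants at once, which is where the extra flexibility, and the extra statistical support, comes in.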
