Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Individual participant data (IPD) meta-analyses, which obtain “raw” data from studies rather than summary data, typically adopt a “two-stage” approach to analysis, whereby the IPD within each trial generate summary measures that are then combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches, which combine all individual participant data in a single meta-analysis, have been suggested as more powerful and flexible. However, they are more complex to implement and require statistical support. This study uses a single dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether the results obtained from the two approaches differ in a clinically meaningful way.

Methods and Findings: We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions, whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure, and are useful where across-study patterns relating to types of participant, intervention, and outcome mask similar relationships within trials, the additional insights provided by their use may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
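The paper's exact models are not reproduced here, but as a minimal sketch of the distinction: in a two-stage analysis, each trial $i$ first yields its own effect estimate (here a log relative risk $\hat{\theta}_i$ with variance $v_i$), and the second stage pools these with inverse-variance weights. A fixed-effect version of the pooled estimate is

$$
\hat{\theta} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i}, \qquad w_i = \frac{1}{v_i}, \qquad \operatorname{Var}(\hat{\theta}) = \frac{1}{\sum_i w_i},
$$

with the pooled relative risk recovered as $\exp(\hat{\theta})$; random-effects variants additionally inflate each $v_i$ by a between-trial variance $\tau^2$. One-stage approaches instead fit a single (typically mixed-effects) regression model to all participants at once, with trial as a clustering factor.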
The OECD Programme for International Student Assessment (PISA) surveys collected data on students’ performances in reading, mathematics and science, as well as contextual information on students’ backgrounds, home characteristics, and school factors that could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers both to reproduce the initial results and to undertake further analyses. In addition to covering the necessary techniques, the manual includes a detailed account of the PISA 2006 database and worked examples providing full syntax in SPSS.
The objective of this study was to identify patterns of juvenile salmonid distribution and relative abundance in relation to habitat correlates. It is the first dataset of its kind because the entire river was snorkeled by one person in multiple years. During two consecutive summers, we completed a census of juvenile salmonids and stream habitat across a stream network. We used the data to test the ability of habitat models to explain the distribution of juvenile coho salmon (Oncorhynchus kisutch), young-of-the-year (age 0) steelhead (Oncorhynchus mykiss), and steelhead parr (≥ age 1) for a network consisting of several different-sized streams. Our network-scale models, which included five stream habitat variables, explained 27%, 11%, and 19% of the variation in the density of juvenile coho salmon, age 0 steelhead, and steelhead parr, respectively. We found weak to strong levels of spatial autocorrelation in the model residuals (Moran's I values ranging from 0.25 to 0.71). The explanatory power of the base habitat models increased substantially, and the level of spatial autocorrelation decreased, with the sequential inclusion of variables accounting for stream size, year, stream, and reach location. The models for specific streams underscored the variability that was implied in the network-scale models. Associations between juvenile salmonids and individual habitat variables were rarely linear and ranged from negative to positive, and the variable accounting for the location of the habitat within a stream was often more important than any individual habitat variable. The limited success in predicting the summer distribution and density of juvenile coho salmon and steelhead with our network-scale models was apparently related to variation in the strength and shape of fish-habitat associations across and within streams and years. This dataset is a summary of the statistical analysis of the Calawah Riverscape data. NOAA was not involved in and did not pay for the collection of this data; the dataset represents the statistical analysis carried out by Martin Liermann as a NOAA employee.
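For reference, the Moran's I statistic reported above is typically computed, for residuals $x_1,\ldots,x_n$ and spatial weights $w_{ij}$, as

$$
I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2},
$$

where values near 0 indicate little spatial autocorrelation and values approaching 1 indicate strong positive autocorrelation, so the reported range of 0.25 to 0.71 spans weak to strong spatial clustering of residuals. (This is the standard definition; the study may use a variant weighting scheme.)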
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Initial data analysis checklist for data screening in longitudinal studies.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The graph shows the changes in the impact factor of ^ and its corresponding percentile, for comparison with the entire literature. The impact factor is the most common scientometric index; for a given year, it is defined as the number of citations received that year by papers published in the two preceding years, divided by the number of papers published in those two years.
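In symbols, the two-year impact factor for year $y$ is

$$
\mathrm{IF}_y = \frac{C_y(P_{y-1}) + C_y(P_{y-2})}{|P_{y-1}| + |P_{y-2}|},
$$

where $P_{y-1}$ and $P_{y-2}$ are the sets of papers published in the two preceding years and $C_y(\cdot)$ counts the citations they receive in year $y$.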
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).
There is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems, and of how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems. The aims of the study are therefore: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, the study presents a typology of public data ecosystems, develops a conceptual model of the elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolution of public data ecosystems represented by six generations, called the Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.
This dataset is made public both to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" (Telematics and Informatics) and for the Systematic Literature Review component that informs the study.
Description of the data in this data set
PublicDataEcosystem_SLR provides the structure of the protocol
Spreadsheet #1 provides the list of results after the search over three indexing databases and the filtering out of irrelevant studies.
Spreadsheet #2 provides the protocol structure.
Spreadsheet #3 provides the filled protocol for relevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) public data ecosystem-related information.
Descriptive Information
Article number
The study number, corresponding to the number assigned to the study in the Excel worksheet.
Complete reference
The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.
Year of publication
The year in which the study was published.
Journal article / conference paper / book chapter
The type of the paper, i.e., journal article, conference paper, or book chapter.
Journal / conference / book
The journal, conference, or book in which the paper was published.
DOI / Website
A link to the website where the study can be found.
Number of words
The number of words in the study.
Number of citations in Scopus and WoS
The number of citations of the paper in Scopus and WoS digital libraries.
Availability in Open Access
Whether the study is available via Open Access or Free / Full Access.
Keywords
Keywords of the paper as indicated by the authors (in the paper).
Relevance for our study (high / medium / low)
The relevance level of the paper for our study.
Approach- and research design-related information
Objective / Aim / Goal / Purpose & Research Questions
The research objective and established RQs.
Research method (including unit of analysis)
The methods used to collect data in the study, including the unit of analysis, which refers to the country, organisation, or other specific unit that has been analysed (e.g., the number of use cases or policy documents, or the number and scope of studies in an SLR).
Study’s contributions
The study’s contribution as defined by the authors.
Qualitative / quantitative / mixed method
Whether the study uses a qualitative, quantitative, or mixed-methods approach.
Availability of the underlying research data
Whether the paper refers to the public availability of the underlying research data (e.g., transcriptions of interviews, collected data), or explains why these data are not openly shared.
Period under investigation
Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)
Use of theory / theoretical concepts / approaches? If yes, specify them
Does the study mention any theory / theoretical concepts / approaches? If yes, which ones? If any theory is mentioned, how is it used in the study (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested, or mentioned in the future research section)?
Quality-related information
Quality concerns
Whether there are any quality concerns (e.g., limited information about the research methods used).
Public Data Ecosystem-related information
Public data ecosystem definition
How the public data ecosystem is defined in the paper, including any equivalent term used (most often “infrastructure”). If an alternative term is used, what is the public data ecosystem called in the paper?
Public data ecosystem evolution / development
Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?
What constitutes a public data ecosystem?
What constitutes a public data ecosystem (components & relationships), i.e., the "FORM / OUTPUT" in which they are presented in the paper (a general description, with more detailed answers in the further questions below).
Components and relationships
What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).
Stakeholders
What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?
Actors and their roles
What actors does the public data ecosystem involve? What are their roles?
Data (data types, data dynamism, data categories etc.)
What data does the public data ecosystem cover (or is intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static, dynamic, real-time, streaming), and prevailing data categories / domains / topics.
Processes / activities / dimensions, data lifecycle phases
What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?
Level (if relevant)
What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).
Other elements or relationships (if any)
What other elements or relationships does the public data ecosystem consist of?
Additional comments
Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).
New papers
Does the study refer to any other potentially relevant papers?
Additional references to potentially relevant papers that were found in the analysed paper (snowballing).
Format of the files: .xls, .csv (for the first spreadsheet only), .docx
Licenses or restrictions: CC-BY
For more info, see README.txt
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Neural decoding is a powerful method for analyzing neural activity. However, the code needed to run a decoding analysis can be complex, which can present a barrier to using the method. In this paper we introduce a package that makes it easy to perform decoding analyses in the R programming language. We describe how the package is designed in a modular fashion, which allows researchers to easily implement a range of different analyses. We also discuss how to format data in order to use the package, and we give two examples of how to use the package to analyze real data. We believe that this package, combined with the rich data analysis ecosystem in R, will make it significantly easier for researchers to create reproducible decoding analyses, which should help increase the pace of neuroscience discoveries.
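The package itself is not named in this listing, so as a generic illustration of what a decoding analysis does (not this package's API), the sketch below trains a linear discriminant classifier to predict a stimulus condition from simulated neural activity and evaluates it with cross-validation; all names are hypothetical.

```r
library(MASS)  # for lda()

set.seed(1)
n_trials  <- 100                       # simulated trials
n_neurons <- 20                        # simulated recording channels
condition <- factor(rep(c("A", "B"), each = n_trials / 2))

# Simulate firing rates: condition "B" shifts the mean of the first 5 neurons
rates <- matrix(rnorm(n_trials * n_neurons), n_trials, n_neurons)
rates[condition == "B", 1:5] <- rates[condition == "B", 1:5] + 1

# 5-fold cross-validated decoding accuracy
folds <- sample(rep(1:5, length.out = n_trials))
acc <- sapply(1:5, function(k) {
  fit  <- lda(rates[folds != k, ], grouping = condition[folds != k])
  pred <- predict(fit, rates[folds == k, ])$class
  mean(pred == condition[folds == k])
})
mean(acc)  # chance is 0.5; values well above it indicate decodable information
```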
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The FragPipe computational proteomics platform is gaining widespread popularity among the proteomics research community because of its fast processing speed and user-friendly graphical interface. Although FragPipe produces well-formatted output tables that are ready for analysis, there is still a need for an easy-to-use and user-friendly downstream statistical analysis and visualization tool. FragPipe-Analyst addresses this need by providing an R shiny web server to assist FragPipe users in conducting downstream analyses of the resulting quantitative proteomics data. It supports major quantification workflows, including label-free quantification, tandem mass tags, and data-independent acquisition. FragPipe-Analyst offers a range of useful functionalities, such as various missing value imputation options, data quality control, unsupervised clustering, differential expression (DE) analysis using Limma, and gene ontology and pathway enrichment analysis using Enrichr. To support advanced analysis and customized visualizations, we also developed FragPipeAnalystR, an R package encompassing all FragPipe-Analyst functionalities that is extended to support site-specific analysis of post-translational modifications (PTMs). FragPipe-Analyst and FragPipeAnalystR are both open-source and freely available.
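FragPipe-Analyst's internals are not reproduced here, but the core of a Limma-based differential expression analysis on a log-transformed protein intensity matrix looks roughly like the sketch below (the simulated matrix, sample grouping, and coefficient name are assumptions):

```r
library(limma)

set.seed(7)
# Simulated log2 protein intensity matrix: 200 proteins x 6 samples
intensities <- matrix(rnorm(200 * 6), nrow = 200,
                      dimnames = list(paste0("prot", 1:200), paste0("s", 1:6)))
intensities[1:10, 4:6] <- intensities[1:10, 4:6] + 2  # spike in 10 "regulated" proteins

group  <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))
design <- model.matrix(~ group)             # intercept + trt-vs-ctrl coefficient

fit <- eBayes(lmFit(intensities, design))   # per-protein linear models, moderated t-stats
topTable(fit, coef = "grouptrt", number = 10)  # top differentially expressed proteins
```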
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.
Introduction
Time series are a special class of dataset in which a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its simplest, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will cover a few simple applications of time series analysis in these lessons.
Opportunities
Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:
Can we forecast conditions in the future?
Challenges
Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are listed below; a short example after the list illustrates how some of them can be diagnosed:
Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).
Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.
Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend (e.g., summer temperatures are predictably higher than winter temperatures).
Heteroscedasticity: The variance of the time series is not constant over time.
Covariance: The covariance of the time series is not constant over time. Many time series models assume that both the variance and the covariance remain constant over time, so heteroscedasticity violates their assumptions.
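A minimal base-R sketch of how a few of these challenges might be diagnosed and handled, using simulated monthly data (a real series would come from monitoring records):

```r
# Simulated monthly series with trend, seasonality, and a few gaps
set.seed(42)
t <- 1:120
x <- 0.05 * t + 2 * sin(2 * pi * t / 12) + rnorm(120)
x[c(15, 48, 80)] <- NA                        # data gaps

# Interpolate gaps so observations are equally spaced and complete
ok <- !is.na(x)
x_filled <- approx(t[ok], x[ok], xout = t)$y

# Build a ts object, inspect autocorrelation, and decompose seasonality
ts_x <- ts(x_filled, start = c(2010, 1), frequency = 12)
acf(ts_x)                                     # significant lags -> autocorrelated data
plot(stl(ts_x, s.window = "periodic"))        # trend / seasonal / remainder components
```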
Learning Objectives
After successfully completing this notebook, you will be able to:
Choose appropriate time series analyses for trend detection and forecasting
Discuss the influence of seasonality on time series analysis
Interpret and communicate results of time series analyses
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness impact data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling.

Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.
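As a small illustration of the imputation strategies mentioned, the sketch below compares mean imputation, regression imputation, and stochastic regression imputation on simulated data (variable names are hypothetical):

```r
set.seed(123)
n <- 200
x <- rnorm(n, mean = 50, sd = 10)          # fully observed covariate
y <- 2 * x + rnorm(n, sd = 8)              # outcome
y_mis <- y
y_mis[sample(n, 40)] <- NA                 # 20% missing (MCAR by construction)

# 1. Mean imputation: shrinks variance and distorts the x-y relationship
y_mean <- ifelse(is.na(y_mis), mean(y_mis, na.rm = TRUE), y_mis)

# 2. Regression imputation: predicts missing y from x, but understates noise
fit   <- lm(y_mis ~ x)
y_reg <- ifelse(is.na(y_mis), predict(fit, newdata = data.frame(x = x)), y_mis)

# 3. Stochastic regression imputation: adds residual noise back in
sigma   <- summary(fit)$sigma
y_stoch <- ifelse(is.na(y_mis),
                  predict(fit, newdata = data.frame(x = x)) + rnorm(n, sd = sigma),
                  y_mis)

c(var(y), var(y_mean), var(y_reg), var(y_stoch))  # compare how much variance survives
```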
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a detailed collection of information related to [your topic], offering valuable insights for data analysis, visualization, and model development. It consists of multiple features such as [list of important columns], which capture various dimensions of the subject in a structured and measurable way.
The purpose of this dataset is to support exploratory data analysis (EDA) and predictive modeling by allowing users to identify trends, patterns, and relationships among variables. It can serve as a foundation for building machine learning models, performing statistical studies, or generating data-driven visual reports.
Researchers, data enthusiasts, and students can use this dataset to enhance their analytical understanding, practice preprocessing techniques, and improve their ability to draw meaningful conclusions from real-world data.
Additionally, this dataset can be explored to uncover correlations, test hypotheses, and visualize behavioral or performance patterns. Its clean structure and well-defined variables make it suitable for both beginners learning EDA and experienced professionals developing predictive insights.
The U.S. Geological Survey New England Water Science Center, under an interagency agreement with the Federal Emergency Management Agency, conducted frequency analyses of stillwater elevations at three National Oceanic and Atmospheric Administration coastal gages following the coastal floods of 2018. The datasets are comma-delimited files of period-of-record annual peak stillwater elevations collected at gages in Boston, Massachusetts, Portland, Maine, and Seavey Island, Maine, for analysis of annual-exceedance probabilities. The peak water-surface elevations are in feet in the North American Vertical Datum of 1988.
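For context on what an annual-exceedance-probability analysis of such peaks involves, a common first step is an empirical plotting position. A minimal sketch with hypothetical elevations follows (the USGS analysis itself uses formal frequency methods):

```r
# Hypothetical annual peak stillwater elevations (feet, NAVD 88)
peaks <- c(7.1, 6.4, 8.0, 6.9, 9.8, 7.5, 6.2, 8.4, 7.0, 10.3)

# Weibull plotting position: empirical annual exceedance probability
n   <- length(peaks)
rnk <- rank(-peaks)                  # 1 = largest peak
aep <- rnk / (n + 1)                 # P(annual peak >= value)

data.frame(elevation_ft = sort(peaks, decreasing = TRUE),
           AEP = sort(aep))          # e.g., AEP 0.5 is roughly the "2-year" event
```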
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class in the full geodatabase inventory (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to prioritize overlapping designations, avoiding massive overestimation in protected area statistics, and simplified by the following PAD-US attributes to support user needs for raster analysis data: Manager Type, Manager Name, Designation Type, GAP Status Code, Public Access, and State Name. The rasterization process (see processing steps below) prioritized overlapping designations previously identified (GAP_Prity field) in the Vector Analysis File (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation (e.g. GAP Status Code 1 over 2). The 30-meter Image (IMG) grid Raster Analysis Files area extents were defined by the Census state boundary file used to clip the Vector Analysis File, the data source for rasterization ("PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class from "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb"). Alaska (AK) and Hawaii (HI) raster data are separated from the contiguous U.S. (CONUS) to facilitate analyses at manageable scales. Note, the PAD-US inventory is now considered functionally complete with the vast majority of land protection types (with a legal protection mechanism) represented in some manner, while work continues to maintain updates, improve data quality, and integrate new data as it becomes available (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, protection status represents a point-in-time and changes in status between versions of PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This data was collected and analyzed as part of a study on PII disclosures in social media conversations, with special attention to influencer characteristics in the interactions, for the dissertation titled Privacy vs. Social Capital: Examining Information Disclosure Patterns within Social Media Influencer Networks and the research paper titled Unveiling Influencer-Driven Personal Data Sharing in Social Media Discourse.
Each study phase is different, with X (Twitter) data used in the pilot analysis and Reddit data used in the main study. Both folders will have the analyzed_posts and cluster summary csv files broken down by collection (either based on trend or collection date).
Note: Raw data is not made available in these datasets due to the nature of the study and to protect the original authors.
| Column name | Type | Description |
|---|---|---|
| Node ID | UUID | Unique identifier for post (replaces original platform identifier) |
| User ID | UUID | Unique identifier assigned for user (replaces original platform identifier) |
| Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
| Influence Power | Float | Eigenvector centrality (see the sketch after these tables) |
| Influencer Tier | Str | Categorical label calculated by follower count |
| Collection Name | Str | Trend collection assigned based on search query |
| Hashtags | Set(str) | The set of hashtags included in the node |
| PII Disclosed | Bool | Whether or not PII was disclosed |
| PII Detected | Set(str) | The detected token types in post |
| PII Risk Score | Float | The PII score for all tokens in a post |
| Is Comment | Bool | Whether or not the post is a comment or reply |
| Is Text Starter | Bool | Whether or not the post has text content |
| Community | Str | The group, community, channel, etc. associated with the post |
| Timestamp | Timestamp | Creation timestamp (provided by social media API) |
| Time Elapsed | Int | Time elapsed (seconds) from original influencer’s post |
| Column Name | Type | Description |
|---|---|---|
| Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
| Influencer Tiers Frequencies | List[dict] | Frequency of influencer tiers of all users in the cluster |
| Top Influence Power Score | Float | Eigenvector centrality of top influencer |
| Top Influencer Tier | Str | Size tier of top influencer |
| Collection Name | Str | Trend collection assigned based on search query. |
| Hashtags | Set(str) | The set of hashtags included in the cluster |
| PII Detection Frequencies | List[dict] | The detected token types in post with frequencies |
| Node Count | Int | Count of all nodes in the influencer cluster |
| Node Disclosures | Int | Count of all nodes with mean_risk_score > 1* |
| Disclosure Ratio | Float | Sum of nodes with confirmed disclosed PII divided by overall cluster size (count of nodes in the cluster) |
| Mean Risk Score | Float | The mean risk score for an entire network cluster |
| Median Risk Score | Float | The median risk score for an entire network cluster |
| Min Risk Score | Float | The min risk score for an entire network cluster |
| Max Risk Score | Float | The max risk score for an entire network cluster |
| Time Span | Float | Total Time Elapsed |
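The Influence Power columns above are described as eigenvector centrality. How that metric might be computed on a reply network is sketched below with the igraph R package (toy undirected edge list; the study's actual graph construction may differ):

```r
library(igraph)

# Toy reply network: edges connect a replying post's author to the replied-to author
edges <- data.frame(from = c("u2", "u3", "u4", "u5", "u5"),
                    to   = c("u1", "u1", "u1", "u2", "u1"))
g <- graph_from_data_frame(edges, directed = FALSE)

# Eigenvector centrality: a node scores highly if connected to high-scoring nodes
eigen_centrality(g)$vector   # u1 scores highest in this toy example
```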
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantitative literacy is a foundational component of success in STEM disciplines and in life. Quantitative concepts and data-rich activities in undergraduate geoscience courses can strengthen geoscience majors’ understanding of geologic phenomena, prepare them for future careers and graduate school, and provide real-world context in which non-STEM students can apply quantitative thinking. We use self-reported teaching practices from the 2016 National Geoscience Faculty Survey to document the extent to which undergraduate geoscience instructors emphasize quantitative skills (algebra, statistics, and calculus) and data analysis skills in introductory (n = 1096) and majors (n = 1066) courses. Respondents who spent more than 20% of class time on student activities, questions, and discussions, taught small classes, or engaged more with the geoscience community through research or improving teaching incorporated statistical analyses and data analyses more frequently in their courses. Respondents from baccalaureate institutions reported using a wider variety of data analysis skills in all courses compared with respondents from other types of institutions. Additionally, respondents who reported using more data analysis skills in their courses also used a broader array of strategies to prepare students for the geoscience workforce. These correlations suggest that targeted professional development could increase instructors’ use of quantitative and data analysis skills to meet the needs of their students in context.
This publication provides all the information required to understand the PISA 2003 educational performance database and perform analyses in accordance with the complex methodologies used to collect and process the data. It enables researchers to both reproduce the initial results and to undertake further analyses. The publication includes introductory chapters explaining the statistical theories and concepts required to analyse the PISA data, including full chapters on how to apply replicate weights and undertake analyses using plausible values; worked examples providing full syntax in SAS®; and a comprehensive description of the OECD PISA 2003 international database. The PISA 2003 database includes micro-level data on student educational performance for 41 countries collected in 2003, together with students’ responses to the PISA 2003 questionnaires and the test questions. A similar manual is available for SPSS users.
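For orientation, PISA estimates sampling variance with Fay's variant of balanced repeated replication, using $G = 80$ replicate weights and a Fay factor $k = 0.5$; the manual develops this in full, but the core estimator has the form

$$
\widehat{\operatorname{Var}}(\hat{\theta})
= \frac{1}{G(1-k)^2} \sum_{g=1}^{G} \left( \hat{\theta}_g - \hat{\theta} \right)^2
= \frac{1}{20} \sum_{g=1}^{80} \left( \hat{\theta}_g - \hat{\theta} \right)^2,
$$

where $\hat{\theta}_g$ is the statistic recomputed with the $g$-th replicate weight. Analyses of achievement additionally average results over the plausible values and combine sampling and imputation variance in the usual multiple-imputation fashion.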
By Gary Hoover [source]
This dataset contains all the record-breaking temperatures for your favorite US cities in 2015. With this information, you can prepare for any unexpected weather that may come your way in the future, or just revel in the beauty of these high heat spells from days past! With record highs spanning from January to December, stay warm (or cool) with these handy historical temperature data points.
This dataset contains the record high temperatures for various US cities during the year 2015. The dataset includes columns for each individual month, along with a column for the record high over the entire year. This data is sourced from www.weatherbase.com and can be used to analyze which cities experienced the hottest summers, or to compare temperature variations between different regions.
Here are some useful tips on how to work with this dataset:
- Analyze individual monthly temperatures: compare high temperatures across months and locations to identify which areas experienced particularly hot summers or cold winters.
- Compare annual versus monthly data: compare average annual highs against monthly highs to understand temperature trends at a given location throughout all four seasons of a single year, or explore how different regions vary based on yearly weather patterns, both across years and within given months of any one year.
- Heatmap analysis: plot temperature information in an interactive heatmap format to pinpoint particular regions that experience unique weather conditions or higher-than-average warmth compared with cooler pockets of similar-sized geographic areas.
- Statistically model the relationships between independent variables (temperature variations by month, region/city, and more) and dependent variables (e.g., tourism volumes), using regression techniques such as linear models (OLS), ARIMA models/nonlinear transformations, and other methods available in statistical software such as STATA or the R programming language.
- Look into climate trends over longer periods: where possible, extend the time frames included in analyses beyond 2015 by expanding on the monthly station observations already present, taking advantage of digitally available historical temperature readings rather than relying only on printed reports.
With these helpful tips, you can get started analyzing record high temperatures for US cities during 2015 using our 'Record High Temperatures for US Cities' dataset!
- Create a heat map chart of US cities representing the highest temperature on record for each city from 2015.
- Analyze trends in monthly high temperatures in order to predict future climate shifts and weather patterns across different US cities.
- Track and compare monthly high temperature records for all US cities to identify regional hot spots with higher than average records and potential implications for agriculture and resource management planning
If you use this dataset in your research, please credit the original authors. Data Source
Unknown License - Please check the dataset description for more information.
File: Highest temperature on record through 2015 by US City.csv

| Column name | Description |
|:------------|:------------|
| CITY | Name of the city. (String) |
| JAN | Record high temperature for the month of January. (Integer) |
| FEB | Record high temperature for the month of February. (Integer) |
| MAR | Record high temperature for the month of March. (Integer) |
| APR | Record high temperature for the month of April. (Integer) |
| MAY | Record high temperature for the month of May. (Integer) |
| JUN | Record high temperature for the month of June. (Integer) |
| JUL | Record high temperature for the month of July. (Integer) |
| AUG | Record high temperature for the month of August. (Integer) |
| SEP | Record high temperature for the month of September. (Integer) |
| OCT | Record high temperature for the month of October. (Integer) |
| ... | |
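Following the heatmap suggestion above, a minimal base-R sketch (assuming the CSV has monthly columns JAN through DEC as described):

```r
temps <- read.csv("Highest temperature on record through 2015 by US City.csv",
                  check.names = FALSE)

# Matrix of record highs: rows = cities, columns = months
m <- as.matrix(temps[, c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")])
rownames(m) <- temps$CITY

# Simple heatmap of record highs by city and month (Colv = NA keeps month order)
heatmap(m, Colv = NA, scale = "none",
        xlab = "Month", main = "Record high temperatures by US city (through 2015)")
```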
We make only one point in this article. Every quantitative study must be able to answer the question: what is your estimand? The estimand is the target quantity, the purpose of the statistical analysis. Much attention is already placed on how to do estimation; a similar degree of care should be given to defining the thing we are estimating. We advocate that authors state the central quantity of each analysis, the theoretical estimand, in precise terms that exist outside of any statistical model. In our framework, researchers do three things: (1) set a theoretical estimand, clearly connecting this quantity to theory; (2) link to an empirical estimand, which is informative about the theoretical estimand under some identification assumptions; and (3) learn from data. Adding precise estimands to research practice expands the space of theoretical questions, clarifies how evidence can speak to those questions, and unlocks new tools for estimation. By grounding all three steps in a precise statement of the target quantity, our framework connects statistical evidence to theory.
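A canonical example of the three steps (a standard illustration, not necessarily the authors' own): with potential outcomes $Y_i(1), Y_i(0)$, a theoretical estimand might be the average treatment effect, linked to an empirical estimand

$$
\tau = \mathbb{E}\!\left[ Y_i(1) - Y_i(0) \right]
= \mathbb{E}_X\!\left[ \, \mathbb{E}[Y \mid X, D = 1] - \mathbb{E}[Y \mid X, D = 0] \, \right],
$$

where the equality holds if treatment $D$ is unconfounded given covariates $X$ (and overlap holds). The final step then estimates the right-hand side from data, for instance by regression, weighting, or matching.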
License: https://www.nist.gov/open/license
Supporting data for the results of Interlaboratory 1 of the Method Assessment for Non-Targeted Analyses. The datasets include the chemical compound descriptions, laboratory mean responses, and the tools for the principal components analysis of the datasets. In addition, a Microsoft Excel file, which was given to all participants, allowed for the analysis of the metadata.
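The specific PCA tools shipped with these datasets are not reproduced here, but a generic principal components analysis of a laboratory-by-compound response matrix might look like this in R (simulated responses; all names hypothetical):

```r
set.seed(99)
# Simulated mean responses: 12 laboratories x 8 chemical compounds
responses <- matrix(rnorm(12 * 8, mean = 100, sd = 15), nrow = 12,
                    dimnames = list(paste0("lab", 1:12), paste0("cmpd", 1:8)))

# PCA on centered and scaled responses
pca <- prcomp(responses, center = TRUE, scale. = TRUE)

summary(pca)   # variance explained by each component
biplot(pca)    # laboratories and compounds in the first two components
```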
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the anonymised transcripts of the interviews conducted between November and December 2021 at the department of Classical Philology and Italian Studies (FICLIT) at the University of Bologna. It further includes the qualitative data analysis of the interviews, carried out using a grounded theory approach and the open source software QualCoder version 2.9.