Objective: To consider the problem that numbers needed to treat (NNTs) calculated from risk difference, odds ratio, and raw pooled events give different results, using data from a review of nursing interventions for smoking cessation.
Discussion
A review of nursing interventions for smoking cessation from the Cochrane Library provided different values for NNT depending on how NNTs were calculated. The Cochrane review was evaluated for clinical heterogeneity using L'Abbé plot and subsequent analysis by secondary and primary care settings.
Three studies in primary care had low (4%) baseline quit rates, and nursing interventions were without effect. Seven trials in hospital settings with patients after cardiac surgery, or heart attack, or even with cancer, had high baseline quit rates (25%). Nursing intervention to stop smoking in the hospital setting was effective, with an NNT of 14 (95% confidence interval 9 to 26). The assumptions involved in using risk difference and odds ratio scales for calculating NNTs are discussed.
Summary
Clinical common sense and concentration on raw data help to detect clinical heterogeneity. Once robust statistical tests have told us that an intervention works, we then need to know how well it works. The number needed to treat or harm is just one way of showing that, and when used sensibly can be a useful tool.
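As a rough illustration of why the choice of scale matters, the sketch below computes an NNT directly from a risk difference and, alternatively, from an odds ratio applied to an assumed baseline rate. The function names and the odds ratio of 1.5 are hypothetical and are not taken from the review; the odds-ratio conversion is the usual textbook formula, not necessarily the procedure the authors used.

```python
def nnt_from_risk_difference(control_rate, treated_rate):
    """NNT as the reciprocal of the absolute difference in event rates."""
    diff = treated_rate - control_rate
    if diff == 0:
        raise ValueError("no difference between groups; NNT is undefined")
    return 1.0 / diff


def nnt_from_odds_ratio(odds_ratio, control_rate):
    """Apply an odds ratio to a baseline rate, then take 1 / risk difference."""
    control_odds = control_rate / (1.0 - control_rate)
    treated_odds = odds_ratio * control_odds
    treated_rate = treated_odds / (1.0 + treated_odds)
    return nnt_from_risk_difference(control_rate, treated_rate)


# Hypothetical odds ratio applied at a low and a high baseline quit rate.
for baseline in (0.04, 0.25):
    print(baseline, round(nnt_from_odds_ratio(1.5, baseline), 1))
```

Under these assumptions a fixed odds ratio yields a much larger NNT at the low baseline rate than at the high one, which is one reason NNTs computed on different scales, or pooled across settings with different baseline rates, can disagree.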
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
Project description
The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging due to historically isolated paths of model-centric and data-centric developments, leading to suboptimal outcomes and inefficient resource… See the full description on the dataset page: https://huggingface.co/datasets/datajuicer/data-juicer-t2v-optimal-data-pool.
Animals were incorporated into pools in different proportions to estimate error and evaluate factors influencing error. Two types of pools were used: sub-pools and super pools. Within each phenotype (liver abscess or normal), 16 animals were combined into 4 sub-pools of 4 animals each, in parts of 1:2:3:4. Sub-pools were constructed based on crushed frozen liver tissue mass; each part was 0.1 g of pulverized frozen liver tissue. Within each phenotype, the 4 sub-pools were incorporated into 2 super pools, in parts of 1:2:3:4 for super pool 1 and 3:4:1:2 for super pool 2. Super pools were made based on DNA quantity. Errors in DNA quantification would create error in forming super pools from sub-pools, and variation in the cell content or DNA content of liver tissue would create error in combining animals into sub-pools.
Animal contributions to sub-pools for livers with abscess (1:2:3:4 parts):
Sub-pool 1A: 15A, 36A, 35A, and 23A.
Sub-pool 2A: 42A, 37A, 12A, and 22A.
Sub-pool 3A: 17A, 1A, 49A, and 48A.
Sub-pool 4A: 3A, 20A, 16A, and 13A.
Animal contributions to sub-pools for livers without abscess (1:2:3:4 parts):
Sub-pool 1N: 46N, 23N, 17N, and 12N.
Sub-pool 2N: 1N, 31N, 6N, and 48N.
Sub-pool 3N: 36N, 43N, 32N, and 13N.
Sub-pool 4N: 34N, 19N, 41N, and 50N.
Sub-pool contributions to super pools for livers with abscess:
Super pool 1A: 1:2:3:4 parts of sub-pool 1A, sub-pool 2A, sub-pool 3A, and sub-pool 4A.
Super pool 2A: 3:4:1:2 parts of sub-pool 1A, sub-pool 2A, sub-pool 3A, and sub-pool 4A.
Sub-pool contributions to super pools for livers without abscess:
Super pool 1N: 1:2:3:4 parts of sub-pool 1N, sub-pool 2N, sub-pool 3N, and sub-pool 4N.
Super pool 2N: 3:4:1:2 parts of sub-pool 1N, sub-pool 2N, sub-pool 3N, and sub-pool 4N.
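The nominal share of each animal in a super pool follows from multiplying its share within its sub-pool by that sub-pool's share within the super pool. The sketch below works this out for the abscess pools, assuming ideal mixing with no quantification error; the function and variable names are illustrative and are not part of the dataset.

```python
from fractions import Fraction


def shares(parts):
    """Convert a list of parts (e.g. 1:2:3:4) into exact fractional shares."""
    total = sum(parts)
    return [Fraction(p, total) for p in parts]


# Animals listed in the order of their 1:2:3:4 parts within each sub-pool.
SUB_POOLS_A = {
    "1A": ["15A", "36A", "35A", "23A"],
    "2A": ["42A", "37A", "12A", "22A"],
    "3A": ["17A", "1A", "49A", "48A"],
    "4A": ["3A", "20A", "16A", "13A"],
}
ANIMAL_PARTS = [1, 2, 3, 4]
# Parts of sub-pools 1A..4A in each super pool.
SUPER_POOL_PARTS = {"1A": [1, 2, 3, 4], "2A": [3, 4, 1, 2]}


def expected_animal_shares(super_pool):
    """Nominal share of every animal in the given super pool (ideal mixing)."""
    result = {}
    sub_shares = shares(SUPER_POOL_PARTS[super_pool])
    for sub_share, animals in zip(sub_shares, SUB_POOLS_A.values()):
        for animal_share, animal in zip(shares(ANIMAL_PARTS), animals):
            result[animal] = sub_share * animal_share
    return result


for name in SUPER_POOL_PARTS:
    print(name, {a: str(s) for a, s in expected_animal_shares(name).items()})
```

Comparing allele-frequency estimates from the pools against these nominal shares is one way the error from tissue mass and DNA quantification could be examined.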
Funded by the USDA Agricultural Research Service, Developing a Systems Biology Approach to Enhance Efficiency and Sustainability of Beef and Lamb Production/ 3040-31000-100-000-D
Resources in this dataset:
Resource Title: xy data for individual animals. File Name: xyIndividuals.csv.gz. Resource Description: X (red) and Y (green) intensity data for 32 animals. There are 64 columns, an X and a Y column for each animal.
Resource Title: Genotypes, Number of copies of B allele for BovineHD 770K. File Name: g.csv.gz. Resource Description: Values are 0, 1 and 2 for 32 animals and 777,962 SNP. DNA was extracted from pulverized frozen liver tissue.
Resource Title: x and y data for pools. File Name: xyPools.csv.gz. Resource Description: X (red) and Y (green) intensity for 12 pools. There are 2 columns per pool, X followed by Y. The first 8 columns are super pools and the next 16 are sub-pools. Examples: superPool.1A.X is super pool 1 for abscess livers, X intensity; sub-pool.1A.Y is sub-pool 1 for abscess livers, Y intensity.
Standardized do files to facilitate within- and across-country data pooling and analysis
Background The "integrated safety report" of the drug registration files submitted to health authorities usually summarizes the rates of adverse events observed for a new drug, placebo or active control drugs by pooling the safety data across the trials. Pooling consists of adding the numbers of events observed in a given treatment group across the trials and dividing the results by the total number of patients included in this group. Because it considers treatment groups rather than studies, pooling ignores validity of the comparisons and is subject to a particular kind of bias, termed "Simpson's paradox." In contrast, meta-analysis and other stratified analyses are less susceptible to bias.
Methods
We use a hypothetical, but not atypical, application to demonstrate that the results of a meta-analysis can differ greatly from those obtained by pooling the same data. In our hypothetical model, a new drug is compared to 1) a placebo in 4 relatively small trials in patients at high risk for a certain adverse event and 2) an active reference drug in 2 larger trials of patients at low risk for this event.
Results
Using meta-analysis, the relative risk of experiencing the adverse event with the new drug was 1.78 (95% confidence interval [1.02; 3.12]) compared to placebo and 2.20 [0.76; 6.32] compared to active control. By pooling the data, the results were, respectively, 1.00 [0.59; 1.70] and 5.20 [2.07; 13.08].
Conclusions
Because these findings could mislead health authorities and doctors, regulatory agencies should require meta-analyses or stratified analyses of safety data in drug registration files.
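The difference between pooling and a stratified analysis can be sketched in a few lines. The trial counts below are invented for illustration and are not the hypothetical data used in the article; the stratified estimate shown is the Mantel-Haenszel relative risk, chosen here as one common stratified method.

```python
# Invented counts: a small high-risk trial and a larger low-risk trial,
# each with unequal allocation between drug and control.
TRIALS = [
    {"drug_events": 10, "drug_n": 50, "ctrl_events": 10, "ctrl_n": 100},
    {"drug_events": 8, "drug_n": 400, "ctrl_events": 1, "ctrl_n": 100},
]


def pooled_relative_risk(trials):
    """Add events and patients across trials per group, then compare rates."""
    drug_rate = sum(t["drug_events"] for t in trials) / sum(t["drug_n"] for t in trials)
    ctrl_rate = sum(t["ctrl_events"] for t in trials) / sum(t["ctrl_n"] for t in trials)
    return drug_rate / ctrl_rate


def mantel_haenszel_relative_risk(trials):
    """Stratified estimate: each trial contributes its own within-trial comparison."""
    num = 0.0
    den = 0.0
    for t in trials:
        total = t["drug_n"] + t["ctrl_n"]
        num += t["drug_events"] * t["ctrl_n"] / total
        den += t["ctrl_events"] * t["drug_n"] / total
    return num / den


print("pooled RR:", round(pooled_relative_risk(TRIALS), 3))
print("stratified RR:", round(mantel_haenszel_relative_risk(TRIALS), 3))
```

In this invented example each trial on its own shows a relative risk of 2, yet naive pooling suggests the drug is safer than control, because most drug-treated patients come from the low-risk trial. This is the Simpson's paradox pattern described above.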
Using data from an experiment conducted in 70 Colombian communities, we investigate who pools risk with whom when trust is crucial for enforcing risk pooling arrangements. We explore the roles played by risk attitudes and social networks. Both empirically and theoretically, we find that close friends and relatives group assortatively on risk attitudes and are more likely to join the same risk pooling group, while unfamiliar participants group less and rarely assort. These findings indicate that where there are advantages to grouping assortatively on risk attitudes those advantages may be inaccessible when trust is absent or low.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: 1, no suramin pre-treatment; 2, body mass index; 3, cerebrospinal fluid; 4, white blood cell.
For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation. For variable and value labelling and coding frames that are not included either in the data or in the current APS documentation, users are advised to consult the latest versions of the LFS User Guides, which are available from the ONS Labour Force Survey - User Guidance webpages.
Occupation data for 2021 and 2022
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. The affected datasets have now been updated. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022
APS Well-Being Datasets
From 2012-2015, the ONS published separate APS datasets aimed at providing initial estimates of subjective well-being, based on the Integrated Household Survey. In 2015 these were discontinued. A separate set of well-being variables and a corresponding weighting variable have been added to the April-March APS person datasets from A11M12 onwards. Further information on the transition can be found in the Personal well-being in the UK: 2015 to 2016 article on the ONS website.
APS disability variables
Over time, there have been some updates to disability variables in the APS. An article explaining the quality assurance investigations on these variables that have been conducted so far is available on the ONS Methodology webpage.
The Secure Access data have more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.
Latest edition information
For the second edition (January 2024), a new version of the data file was deposited, with smoking variables added.
Feature layer containing Pool information in the City of Sioux Falls, South Dakota.
Distractors and responses are integrated in an event file when they occur together. Further, when all or some features repeat, the whole event file is retrieved, affecting later action as observed in so-called binding effects. Previous research used varying distractor pool sizes (ranging from just two to well over 30) to choose distractors from, but it is unclear whether distractor pool size has an effect on the size of distractor-based binding effects. The present study investigates whether and how distractor pool size modulates binding effects. Using an adapted prime-probe design, participants were assigned to large (384 distractors) or small (2 distractors) distractor pool sizes, and distractor-response binding effects were measured. Binding effects were stronger for the large distractor pool condition compared to the small pool condition. We discuss these findings against the background of the negative priming literature and research on novelty. Dataset for: Philip Schmalbrock, Christian Frings & Birte Moeller (2022) Pooling it all together – the role of distractor pool size on stimulus-response binding, Journal of Cognitive Psychology, DOI: 10.1080/20445911.2022.2026363
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Penalized regression methods are used in many biomedical applications for variable selection and simultaneous coefficient estimation. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors. This article considers a general class of penalized objective functions which, by construction, force selection of the same variables across imputed datasets. By pooling objective functions across imputations, optimization is then performed jointly over all imputed datasets rather than separately for each dataset. We consider two objective function formulations that exist in the literature, which we will refer to as “stacked” and “grouped” objective functions. Building on existing work, we (i) derive and implement efficient cyclic coordinate descent and majorization-minimization optimization algorithms for continuous and binary outcome data, (ii) incorporate adaptive shrinkage penalties, (iii) compare these methods through simulation, and (iv) develop an R package miselect. Simulations demonstrate that the “stacked” approaches are more computationally efficient and have better estimation and selection properties. We apply these methods to data from the University of Michigan ALS Patients Biorepository aiming to identify the association between environmental pollutants and ALS risk. Supplementary materials for this article are available online.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PCR-based analysis is the gold standard for detection of SARS-CoV-2 and was used broadly throughout the pandemic. However, heightened demand for testing strained diagnostic resources, and the amount of PCR-based testing required exceeded existing testing capacity. Pooled testing strategies presented an effective method to increase testing capacity by decreasing the number of tests and resources required for laboratory PCR analysis of SARS-CoV-2. We sought to analyze SARS-CoV-2 pooling schemes to determine the sensitivity of Dorfman pooling strategies of various sizes and to evaluate the utility of such strategies in diagnostic laboratory settings. Overall, a trend of decreasing sensitivity with larger pool sizes was observed, with modest sensitivity losses in the largest pools tested and high sensitivity in all other pools. Efficiency data were then calculated to determine the optimal Dorfman pool sizes based on test positivity rate. This was correlated with current presumptive test positivity to maximize the number of tests saved, thereby increasing testing capacity and resource efficiency in the community setting. Dorfman pooling methods were evaluated and found to offer a high-throughput solution to SARS-CoV-2 clinical testing that improves resource efficiency in low-resource environments.
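The link between test positivity and optimal pool size can be sketched with the classical two-stage Dorfman calculation: one test per pool, plus individual retests for every member of a positive pool. The sketch below assumes independent samples and a perfectly sensitive and specific assay, which is an idealization given the sensitivity losses reported above; the function names and the maximum pool size are illustrative choices, not values from the study.

```python
def expected_tests_per_person(positivity, pool_size):
    """Expected number of tests per sample under two-stage Dorfman pooling."""
    if pool_size == 1:
        return 1.0  # no pooling: every sample is tested individually
    p_pool_negative = (1.0 - positivity) ** pool_size
    # One pooled test shared by pool_size samples, plus one retest per
    # sample whenever the pool is positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(positivity, max_pool_size=32):
    """Pool size (1 = no pooling) that minimizes expected tests per sample."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(positivity, k),
    )


for positivity in (0.01, 0.05, 0.10, 0.30):
    k = best_pool_size(positivity)
    saved = 1.0 - expected_tests_per_person(positivity, k)
    print(f"positivity={positivity:.2f} pool_size={k} tests_saved={saved:.1%}")
```

Under these assumptions the optimal pool size shrinks as positivity rises, and pooling stops saving tests altogether once positivity is high enough.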
Pool information for available Multiple Issuer Pool for future pooling use
This dataset corresponds to a project investigating whether cold-air pooling influences forest composition and function. The data include hourly sub-canopy air temperatures (measured continuously via ibuttons) and forest composition data for 48 plots along 9 transects in 3 sites across New England, USA. The temperature data also include surface lapse rates and temperature gradients across transects, as well as a designation indicating the presence or absence of a temperature inversion. We found that sites with the most frequent temperature inversions also displayed vegetation inversions across slopes, with more cold-adapted species at low instead of high elevations.
This repository contains modified TNC trip data obtained from the Chicago data portal for the year 2019. The raw trip data is first cleaned by removing trivial and erroneous records. This includes short trips with travel times of less than 2 minutes or distances shorter than 0.1 miles. We also exclude entries with a missing pickup or dropoff census tract, i.e., trips originating or ending outside Chicago. Lastly, we remove trips marked as not authorized as shared trips but coded as shared trips. The filtered data is then aggregated by pickup_hour, pickup_date, pickup_day, and pickup_month. For the detour plots, we additionally aggregate by census tract.
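The cleaning and aggregation steps described above might look roughly like the following. This is a sketch only: the dictionary keys (trip_seconds, trip_miles, pickup_tract, dropoff_tract, shared_trip_authorized, trips_pooled, pickup_time) are assumed names, and treating trips_pooled > 1 as "coded as shared" is an assumption, so neither should be read as the repository's actual schema or logic.

```python
from collections import Counter
from datetime import datetime


def keep_trip(trip):
    """Return True if a trip passes the cleaning rules described in the text."""
    # Drop trivial trips: under 2 minutes or under 0.1 miles.
    if trip["trip_seconds"] < 120 or trip["trip_miles"] < 0.1:
        return False
    # Drop trips with a missing pickup or dropoff census tract.
    if trip["pickup_tract"] is None or trip["dropoff_tract"] is None:
        return False
    # Drop inconsistent records: pooled although sharing was not authorized.
    if trip["trips_pooled"] > 1 and not trip["shared_trip_authorized"]:
        return False
    return True


def aggregate(trips):
    """Count kept trips by (pickup_month, pickup_day, pickup_date, pickup_hour)."""
    counts = Counter()
    for trip in filter(keep_trip, trips):
        ts = trip["pickup_time"]
        counts[(ts.month, ts.weekday(), ts.date().isoformat(), ts.hour)] += 1
    return counts


# Hypothetical records for illustration only.
sample = [
    {"trip_seconds": 600, "trip_miles": 2.5, "pickup_tract": "A", "dropoff_tract": "B",
     "shared_trip_authorized": True, "trips_pooled": 2,
     "pickup_time": datetime(2019, 3, 4, 8, 15)},
    {"trip_seconds": 60, "trip_miles": 0.05, "pickup_tract": "A", "dropoff_tract": "B",
     "shared_trip_authorized": False, "trips_pooled": 1,
     "pickup_time": datetime(2019, 3, 4, 8, 30)},
    {"trip_seconds": 900, "trip_miles": 4.0, "pickup_tract": None, "dropoff_tract": "B",
     "shared_trip_authorized": False, "trips_pooled": 1,
     "pickup_time": datetime(2019, 3, 4, 9, 0)},
]
print(aggregate(sample))
```

Only the first hypothetical record survives the filters; the second is too short and the third lacks a pickup tract.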
This document states the importance of trusted, shared information to create trusted processes, informed decisions and ultimately better outcomes for watersheds in British Columbia. A variety of models for managing and supporting local water and watershed information exist in B.C. They are examples of how Crown and Indigenous governments, local communities, and industry are collaborating to “pool” and integrate various forms of water and watershed data. These initiatives recognize a critical point: that for effective watershed planning, policy/regulation, programs, and decision-making, there needs to be agreement on how a shared foundation of credible information and knowledge is effectively being built and managed. Three initiatives in particular that demonstrate the characteristics of effective knowledge creation and sharing in action are the Columbia Basin Water Monitoring Collaborative, Skeena Knowledge Trust (SKT), and Coast Information Team (CIT).
The NYC Parks outdoor pool season typically runs from late June to the Sunday after Labor Day. During the season, Parks' staff record data via a mobile app survey at the end of each pool session. The survey includes questions on attendance, staffing, meals, issues, weather conditions, and closures for that specific session.
NYC Parks operates two sessions at each pool every day of the pool season. First Session is from 11:00am - 3:00pm. Second Session is from 4:00pm - 7:00pm, with the requirement for Olympic / Intermediate pools to stay open for Extended Second Session from 7:00pm - 8:00pm when the City Heat Emergency Plan is activated.
For each pool season, every pool will have at least two survey submissions per day - one submission for the first session, and one submission for the second session. A pool will have a third submission if it stays open for an extended second session.
Data Dictionary: https://docs.google.com/spreadsheets/d/15lHSZF76W1cZnjwlWRSn7tzLh6EqVZVeZ2vwDFqXHMM/edit?usp=sharing
For reference, pool geography from Open Data can be found here: https://data.cityofnewyork.us/City-Government/Pools/3vjv-6tf5
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the replication data for Scale economies and decline of ride-pooling: A case study of New York City. The data includes a csv with weekly aggregates for all TNC trips in NYC from Feb 2019 to Mar 2023. The TNC trip data is obtained from NYC TLC High Volume For-Hire Vehicles Trip Records. Additionally, there are hourly aggregates for 2019 and 2023, alongside hourly precipitation obtained from the Open-Meteo dataset.
The dataset and analysis scripts accompany the article "Bayesian multistate models allow incorporation of spatial dynamics to improve invasive species management". The data are summarized detections from acoustic telemetry receivers (69 kHz) from 353 silver carp (Hypophthalmichthys molitrix) and 170 bighead carp (H. nobilis) surgically implanted with transmitters in the Illinois River, USA. The analysis scripts assess probability of detection, probability of monthly movement between navigation pools on the river, probability of apparent survival, and probability of operable transmitter battery through a Bayesian multistate hidden Markov model.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
hkust-nlp/deita-redundant-pool-data dataset hosted on Hugging Face and contributed by the HF Datasets community