Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
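As an illustration of why a bar graph of mean and standard deviation can hide the underlying data, the following minimal Python sketch rescales two differently shaped samples to the same mean and SD. It is not part of the original article; the sample values and the helper name `rescale` are invented for illustration.

```python
import statistics

def rescale(values, target_mean, target_sd):
    """Linearly rescale values so the sample mean and SD match the targets."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]

# Two hypothetical small samples with very different shapes.
symmetric = [1, 2, 3, 4, 5, 6]
with_outlier = [1, 1, 1, 1, 1, 10]

a = rescale(symmetric, 10.0, 2.0)
b = rescale(with_outlier, 10.0, 2.0)

# Both samples would produce the same bar height and the same error bar.
for name, sample in (("symmetric", a), ("with outlier", b)):
    print(name, [round(v, 2) for v in sample],
          "mean=%.2f sd=%.2f" % (statistics.mean(sample), statistics.stdev(sample)))
```

A univariate scatterplot of `a` and `b` would show the single extreme point in the second sample, which the summary statistics alone do not reveal.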
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Compositional data, which is data consisting of fractions or probabilities, is common in many fields including ecology, economics, physical science and political science. If these data would otherwise be normally distributed, their spread can be conveniently represented by a multivariate normal distribution truncated to the non-negative space under a unit simplex. Here this distribution is called the simplex-truncated multivariate normal distribution. For calculations on truncated distributions, it is often useful to obtain rapid estimates of their integral, mean and covariance; these quantities characterising the truncated distribution will generally possess different values to the corresponding non-truncated distribution.
In the paper Adams, Matthew (2022) Integral, mean and covariance of the simplex-truncated multivariate normal distribution. PLoS One, 17(7), Article number: e0272014. https://eprints.qut.edu.au/233964/, three different approaches that can estimate the integral, mean and covariance of any simplex-truncated multivariate normal distribution are described and compared. These three approaches are (1) naive rejection sampling, (2) a method described by Gessner et al. that unifies subset simulation and the Holmes-Diaconis-Ross algorithm with an analytical version of elliptical slice sampling, and (3) a semi-analytical method that expresses the integral, mean and covariance in terms of integrals of hyperrectangularly-truncated multivariate normal distributions, the latter of which are readily computed in modern mathematical and statistical packages. Strong agreement is demonstrated between all three approaches, but the most computationally efficient approach depends strongly both on implementation details and the dimension of the simplex-truncated multivariate normal distribution.
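Approach (1), naive rejection sampling, can be sketched in a few lines of Python. This is a minimal illustration and not the code from the dataset: it assumes the truncation region is all x with every component non-negative and components summing to at most one, and the mean vector and covariance matrix in the example are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def rejection_estimates(mu, sigma, n_draws=20000, seed=0):
    """Estimate integral, mean and covariance of the truncated distribution."""
    rng = random.Random(seed)
    low = cholesky(sigma)
    d = len(mu)
    kept = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        # Keep the draw only if it lies in the assumed truncation region.
        if all(v >= 0.0 for v in x) and sum(x) <= 1.0:
            kept.append(x)
    m = len(kept)
    if m < 2:
        raise ValueError("too few accepted draws; increase n_draws")
    integral = m / n_draws  # fraction of probability mass inside the region
    mean = [sum(x[i] for x in kept) / m for i in range(d)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in kept) / (m - 1)
            for j in range(d)] for i in range(d)]
    return integral, mean, cov

# Hypothetical two-dimensional example.
integral, mean, cov = rejection_estimates([0.3, 0.3], [[0.04, 0.01], [0.01, 0.04]])
print(integral, mean, cov)
```

The acceptance rate equals the estimated integral, so this approach becomes inefficient when little probability mass lies inside the truncation region, which is one reason the paper compares it with the other two methods.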
This dataset consists of all code and results for the associated article.
In the first half of 2024, healthcare providers reported *** data breaches in the U.S. healthcare sector, making them the entity type with the highest number of reported breach incidents. At the time of reporting, business associates ranked second by number of reported data breaches.
Percentage distribution of university expenditures, by type of expenditure, Canada and provinces. This table is included in Section B: Financing education systems: Public and private expenditure on education of the Pan Canadian Education Indicators Program (PCEIP). PCEIP draws from a wide variety of data sources to provide information on the school-age population, elementary, secondary and postsecondary education, transitions, and labour market outcomes. The program presents indicators for all of Canada, the provinces, the territories, as well as selected international comparisons and comparisons over time. PCEIP is an ongoing initiative of the Canadian Education Statistics Council, a partnership between Statistics Canada and the Council of Ministers of Education, Canada that provides a set of statistical measures on education systems in Canada.
Between May 2023 and April 2024, more than 81 percent of the reports about personal data security breaches were about unintentional incidents. Malicious activities and access abuse ranked second, with around four percent of the reported incidents, followed by eavesdropping or interception.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models. Methods: The four models (paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data) were compared using a simulation study. The assumed 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30, respectively, under the assumption of normally distributed data. The data were generated from a variance-covariance matrix under the assumption of (i) a compound symmetry structure or a first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error), and mean square error (MSE) of effect differences between two groups were used to evaluate the performance of the four models. Results: The results from the 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded type I error near the nominal level, higher power, comparable bias and small MSE, whether or not there was a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error and smaller bias, but lower power and larger MSE. The mixed effects model of difference and the meta-analysis of summary data yielded type I error far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect. Conclusion: We recommend the paired t-test for normally distributed data from N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.
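For readers unfamiliar with the recommended model, a paired t statistic can be computed directly from the within-cycle differences. The Python sketch below is only an illustration under assumed inputs: the outcome values are invented, the function name `paired_t_statistic` is hypothetical, and it does not reproduce the simulation design of the study.

```python
import math
import statistics

def paired_t_statistic(treatment, control):
    """Return (t, degrees of freedom) for paired observations."""
    if len(treatment) != len(control) or len(treatment) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - c for t, c in zip(treatment, control)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the paired differences
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical 3-cycle trial: one treatment and one control outcome per cycle.
t_value, dof = paired_t_statistic([5.0, 6.0, 8.0], [4.0, 4.0, 5.0])
print(round(t_value, 4), dof)
```

The resulting statistic is compared with a t distribution on `dof` degrees of freedom to obtain a p-value.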
The eight main blood types are A+, A-, B+, B-, O+, O-, AB+, and AB-. The most common blood type in the United States is O-positive, with around 38 percent of the population having this type of blood. However, blood type O-positive is more common in Latino-Americans than other ethnicities, with around 53 percent of Latino-Americans with this blood type, compared to 47 percent of African Americans and 37 percent of Caucasians.
Blood donation
The American Red Cross estimates that every two seconds someone in the United States needs blood or platelets, highlighting the importance of blood donation. It was estimated that in 2021, around 6.5 million people in the U.S. donated blood, with around 1.7 million of these people donating for the first time. Those with blood type O-negative are universal blood donors, meaning their blood can be transfused for any blood type. Therefore, this blood type is the most requested by hospitals. However, only about seven percent of the U.S. population has this blood type.
Blood transfusion
Blood transfusion is a routine procedure that involves adding donated blood to a patient’s body. There are many reasons why a patient may need a blood transfusion, including surgery, cancer treatment, severe injury, or chronic illness. In 2021, there were around 10.76 million blood transfusions in the United States. Most blood transfusions in the United States occur in an inpatient medicine setting, while critical care accounts for the second highest number of transfusions.
The distribution of sales by type of service provided for specialized design services (NAICS 5414), based on the North American Industry Classification System (NAICS). It includes all members under Distribution of sales by type of service provided, annual (percent), for three years of data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: The objective of this study was to leverage a state health department's operational data to allocate in-kind resources (children's car seats) to counties, with the proposition that need-based allocation could ultimately improve public health outcomes. Methods: This study used a retrospective analysis of administrative data on car seats distributed to counties statewide by the Georgia Department of Public Health and development of a need-based allocation tool (presented as interactive supplemental digital content, adaptable to other types of in-kind public health resources) that relies on current county-level injury and sociodemographic data. Results: Car seat allocation using public health data and a need-based formula resulted in substantially different recommended allocations to individual counties compared to historic distribution. Conclusions: Results indicate that making an in-kind public health resource like car seats universally available results in a less equitable distribution of that resource compared to deliberate allocation according to public health need. Public health agencies can use local data to allocate in-kind resources consistent with health objectives; that is, in a manner offering the greatest potential health impact. Future analysis can determine whether the change to a more equitable allocation of resources is also more efficient, resulting in measurably improved public health outcomes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. In our case, we use some of these codes for underwater research work in applied ecology analysis of threatened endemic fishes and their natural habitat. For this, we developed codes in the RStudio® script environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The employed R packages are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbeuttel & Balamura, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
It is important to run all the codes in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the codes used to run GLM and DOMAIN:
First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend using 10,000 background points with regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we also considered factors such as the extent of the area and the type of study species to be important for selecting the correct number of points (Pers. Obs.). Then, we extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans and Elith, 2017).
Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data each, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, we selected the 10-fold cross-validation method, with the response variable presence assigned as a factor. If any other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
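The 75%/25% subdivision can be sketched as a seeded random split. The original workflow is implemented in R; the Python below is only a language-neutral illustration, and the record lists, the function name `split_train_test` and the seed are hypothetical.

```python
import random

def split_train_test(records, train_fraction=0.75, seed=42):
    """Shuffle records reproducibly and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical point identifiers; real data would carry coordinates and predictors.
presence = [("presence", i) for i in range(40)]
background = [("background", i) for i in range(200)]

# Each group is split separately so both keep the 75/25 proportion.
pres_train, pres_test = split_train_test(presence)
back_train, back_test = split_train_test(background)
print(len(pres_train), len(pres_test), len(back_train), len(back_test))
```

Splitting the presence and background groups separately, as described above, keeps the proportion of presences the same in the training and test sets.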
After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), from which we obtained the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and a cross iteration of 5,000 repetitions (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (Personal test). The resulting plots were the partial dependence plots for each predictor variable.
Subsequently, we ran the correlation of the variables using Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity between variables (Guisan & Hofer, 2003). A bivariate correlation threshold of ±0.70 is recommended for discarding highly correlated variables (e.g., Awan et al., 2021).
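The multicollinearity screen amounts to computing Pearson's r for every pair of predictors and flagging pairs whose absolute value reaches 0.70. The following Python sketch illustrates that logic only; the original script is written in R, and the variable names and values here are invented.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(variables, threshold=0.70):
    """Return (name1, name2, r) for every pair with |r| >= threshold."""
    flagged = []
    for (n1, v1), (n2, v2) in combinations(variables.items(), 2):
        r = pearson_r(v1, v2)
        if abs(r) >= threshold:
            flagged.append((n1, n2, r))
    return flagged

# Hypothetical predictor values sampled at five points.
predictors = {
    "depth": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [10.0, 8.1, 6.2, 3.9, 2.0],  # falls almost linearly with depth
    "slope": [2.0, 5.0, 1.0, 4.0, 3.0],
}
flagged = highly_correlated_pairs(predictors)
print(flagged)
```

For each flagged pair, one of the two variables would typically be dropped before fitting the GLM; which one to keep is a judgment call that this sketch does not make.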
Once the above codes were run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran the GLM models per variable to obtain the p-value of each variable (significant at p ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The generated models include polynomial terms to obtain linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, where the resulting plots included the probability of occurrence and values for continuous variables or categories for discrete variables. The points of the presence and background training group are also included.
In addition, a global GLM was run, and the resulting model was evaluated by means of a 2 x 2 contingency matrix that includes both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and avoid a high percentage of bias from type I (commission) or type II (omission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).
Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).
Model         Validation set
              True     False
Presence      A        B
Background    C        D
We then calculated the overall accuracy and True Skill Statistic (TSS) metrics. The first is used to assess the proportion of correctly predicted cases, while the second metric assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002). This metric also gives equal importance to the prevalence of presence prediction as to the random performance correction (Fielding and Bell, 1997; Allouche et al., 2006).
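Using the cell labels of Table 1 (A true positives, B false positives, C true negatives, D false negatives), both metrics follow from simple ratios: overall accuracy is (A + C) / (A + B + C + D), and TSS is sensitivity plus specificity minus one, with sensitivity A / (A + D) and specificity C / (B + C). The Python sketch below illustrates this with invented observations and predicted probabilities; treating a probability equal to the 0.5 threshold as a predicted presence is an assumption of this sketch, not something stated in the source.

```python
def confusion_counts(observed, predicted_prob, threshold=0.5):
    """Return (A, B, C, D) using the cell labels of Table 1."""
    a = b = c = d = 0
    for obs, p in zip(observed, predicted_prob):
        predicted_presence = p >= threshold  # assumption: ties count as presence
        if predicted_presence and obs:
            a += 1  # true positive
        elif predicted_presence and not obs:
            b += 1  # false positive (commission)
        elif not predicted_presence and not obs:
            c += 1  # true negative
        else:
            d += 1  # false negative (omission)
    return a, b, c, d

def overall_accuracy(a, b, c, d):
    return (a + c) / (a + b + c + d)

def true_skill_statistic(a, b, c, d):
    sensitivity = a / (a + d)
    specificity = c / (b + c)
    return sensitivity + specificity - 1

# Hypothetical test set: 1 = presence, 0 = background.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.1, 0.05]

counts = confusion_counts(observed, probabilities)
print(counts, overall_accuracy(*counts), true_skill_statistic(*counts))
```

TSS ranges from -1 to +1, where values at or below zero indicate performance no better than random.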
The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background groups, each subdivided into 75% training and 25% test data. We included only the presence training subset and the predictor variables stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
Regarding the model evaluation and estimation, we selected the following estimators:
1) partial ROC, which evaluates the distance between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model's prediction performance for the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) ROC/AUC curve for model validation, where an optimal performance threshold is estimated to have an expected confidence of 75% to 99% probability (De Long et al., 1988).
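As a rough illustration of the second estimator, the area under the ROC curve equals the probability that a randomly chosen presence point receives a higher suitability score than a randomly chosen background point, with ties counted as one half. The Python sketch below computes the standard AUC in that way; it is not the partial ROC procedure, it is not taken from the original R scripts, and the scores are invented.

```python
def roc_auc(presence_scores, background_scores):
    """AUC as the fraction of presence/background pairs ranked correctly."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties contribute half a correct ranking
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical model scores for test presences and test background points.
presence_scores = [0.9, 0.8, 0.6, 0.3]
background_scores = [0.7, 0.4, 0.2, 0.1, 0.1, 0.05]

auc = roc_auc(presence_scores, background_scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presences from background points.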
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Type I errors for log-normally distributed data.
https://www.ine.es/aviso_legal
Statistics on Global Value Chains: Percentage distribution of companies that outsource or considered doing so, due to barriers encountered and degree of importance. Triennial. National.
The common tern is a Holarctic colonially breeding and migratory seabird (Becker and Ludwigs 2004). The data we present here come from a long-term study population located in the Banter See at Wilhelmshaven on the German North Sea coast (53°36’N, 08°06’E). In 1992, 101 adult birds of this population were caught and marked with transponders (TROVAN ID 100; TROVAN, Köln, Germany), and since 1992 all locally hatched birds have similarly been marked with a transponder shortly prior to fledging.
The colony site consists of a line of six concrete islands (denoted A to F, land to lakeward; Becker 2015), each of which measures 10.7 x 4.6 m, is homogeneously covered with gravel, and is surrounded by a 0.6 m wall. Despite the distance between adjacent islands only being 0.9 m, they can be considered functional sub-colonies (Dittmann et al. 2007, Becker 2015). Three-times-weekly checks of the six sub-colonies are used to mark each nest, to assess laying date and to record reproductive parameters....
Participation Metrics: Total comments, unique submitters, diversity of submitter types, geographic distribution
Substance Metrics: Proportion of comments with citations, data attachments, methodological detail
Channel Performance: Landing page traffic, referral sources, accessibility requests fulfilled, webinar attendance
Documentation: Archive of materials, distribution lists, event summaries, communications timeline
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of direct taxes paid by households as a percentage of their gross income by household type - experimental statistics
Hong Kong Innovation Activities Statistics - Table 710-86210 : Distribution of business establishments having collaboration arrangements on innovation activities (excluding R&D activities) with other organisations by type of collaborated organisation
https://data.gov.tw/license
Aggregate tax income data for various types of income, single-item distribution of the number and amount, statistics sheet of declaration by county and city. Unit: %
https://data.go.kr/ugs/selectPortalPolicyView.do
This public data captures the current status of manufacturing in Geumcheon-gu. It includes basic information such as each manufacturer's name, factory address, and industry type. This data allows for an understanding of the distribution and industry-specific characteristics of manufacturers in Geumcheon-gu, serving as a crucial foundation for industrial structure analysis, business support policy development, and regional economic revitalization initiatives. This data provides startup and investment information and serves as a reference for a variety of areas, including industrial complex development and job creation. It also provides a valuable overview of regional manufacturing trends.
In 2023, physical products were the most sold e-commerce products in Finland with ** percent, according to the respondents of the survey. Services made up only ** percent of e-commerce sales and digital products accounted for a further **** percent.
This web map shows the population distribution by quarter type in 2016 within the 18 districts of Hong Kong. It is a subset of the 2016 census data made available by the Census and Statistics Department under the Government of Hong Kong Special Administrative Region (the “Government”) at https://DATA.GOV.HK/ (“DATA.GOV.HK”). The source data is in XLSX format and has been processed and converted into Esri File Geodatabase format and then uploaded to Esri’s ArcGIS Online platform for sharing and reference purposes. The objective is to let Hong Kong ArcGIS Online users work with the data in a spatially ready format and save them the data conversion effort. For details about the data, source format and terms and conditions of usage, please refer to the website of DATA.GOV.HK at https://data.gov.hk.