Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The empirical type I error rates of eight tests for different sample sizes under nominal significance level 0.05 based on 10,000 replications.
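As an illustration of how an empirical type I error rate of this kind is typically estimated, here is a minimal Monte Carlo sketch. It does not reproduce the eight tests in the table. It assumes a single two-sided z-test with known variance, and the function name, sample sizes, and seed are hypothetical choices made for the example.

```python
import random
from statistics import NormalDist, mean


def empirical_type_i_error(n, replications=10_000, alpha=0.05, seed=0):
    """Estimate the rejection rate of a two-sided z-test when H0 is true."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(replications):
        # Draw a sample under H0: true mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5  # (xbar - 0) / (1 / sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / replications


# Hypothetical sample sizes; the source table uses its own.
rates = {n: empirical_type_i_error(n) for n in (10, 30, 100)}
for n, rate in rates.items():
    print(f"n={n:>3}: empirical type I error = {rate:.4f}")
```

A well-calibrated test should give rates close to the nominal 0.05, with Monte Carlo noise of roughly ±0.004 at 10,000 replications.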
The Country and Regional Analysis (CRA) presents statistical estimates for the allocation of identifiable expenditure between the regions and nations of the UK. This year’s dataset covers the outturn period 2017-18 to 2021-22.
Alongside the main CRA release, the Treasury has published further analysis tools in the form of “interactive tables” and the full CRA database. These tools will allow users to manipulate the data to create their own views. The database contains the underlying “segment” level data used to construct the published tables in CRA 2022. Figures are in nominal terms. The “interactive tables” include both nominal and real terms data, but exclude the “segment” level information.
For statistical enquiries, please contact Pesa.document@hmtreasury.gov.uk
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
By applying Differential Set Analysis (DSA) to sequence count data, researchers can determine whether groups of microbes or genes are differentially enriched. Yet sequence count data suffer from a scale limitation: these data lack information about the scale (i.e., size) of the biological system under study, leading some authors to call these data compositional (i.e., proportional). In this article, we show that commonly used DSA methods that rely on normalization make strong, implicit assumptions about the unmeasured system scale. We show that even small errors in these scale assumptions can lead to positive predictive values as low as 9%. To address this problem, we take three novel approaches. First, we introduce a sensitivity analysis framework to identify when modeling results are robust to such errors and when they are suspect. Unlike standard benchmarking studies, this framework does not require ground-truth knowledge and can therefore be applied to both simulated and real data. Second, we introduce a statistical test that provably controls Type-I error at a nominal rate despite errors in scale assumptions. Finally, we discuss how the impact of scale limitations depends on a researcher’s scientific goals and provide tools that researchers can use to evaluate whether their goals are at risk from erroneous scale assumptions. Overall, the goal of this article is to catalyze future research into the impact of scale limitations in analyses of sequence count data; to illustrate that scale limitations can lead to inferential errors in practice; yet to also show that rigorous and reproducible scale reliant inference is possible if done carefully.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
DATA EXPLORATION Understand the characteristics of the given fields in the underlying data, such as variable distributions, whether the dataset is skewed towards a certain demographic, and the validity of the fields. For example, a training dataset may be highly skewed towards the younger age bracket. If so, how will this affect your results when you use it to predict over the remaining customer base? Identify limitations surrounding the data and gather external data which may be useful for modelling purposes. This may include bringing in ABS data at different geographic levels and creating additional features for the model. For example, the geographic remoteness of different postcodes may be used as an indicator of proximity when considering whether a customer needs a bike to ride to work.
MODEL DEVELOPMENT Determine a hypothesis related to the business question that can be answered with the data. Perform statistical testing to determine whether the hypothesis is valid. Create calculated fields based on existing data, for example, convert the D.O.B into an age bracket. Other fields that may be engineered include ‘High Margin Product’, an indicator of whether a product purchased by the customer in the past three months is in a high margin category, based on the fields ‘list_price’ and ‘standard cost’. Another example is calculating the distance from office to home address as a factor in determining whether customers may purchase a bicycle for transportation purposes. Additionally, this may include deciding what the predicted variable actually is. For example, are results predicted in ordinal buckets, or are they nominal, binary or continuous? Test the performance of the model using measures relevant to the model chosen (e.g. residual deviance, AIC, ROC curves, R Squared). Appropriately document model performance, assumptions and limitations.
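The two calculated fields mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the ten-year bracket width, the 50% margin threshold, the reference date, and the sample rows are all hypothetical, the key `standard_cost` stands in for the ‘standard cost’ field, and the three-month purchase window is not modelled here.

```python
from datetime import date


def age_bracket(dob, today, width=10):
    """Map a date of birth to a bracket label such as '30-39'."""
    # Subtract one year if the birthday has not yet occurred this year.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def is_high_margin(list_price, standard_cost, threshold=0.5):
    """Flag a product whose margin, as a share of list price, meets the threshold.

    Assumes list_price is positive; the threshold is a hypothetical cut-off.
    """
    margin = (list_price - standard_cost) / list_price
    return margin >= threshold


# Hypothetical rows, not taken from the dataset.
customers = [
    {"dob": date(1985, 7, 14), "list_price": 1200.0, "standard_cost": 500.0},
    {"dob": date(2001, 1, 3), "list_price": 300.0, "standard_cost": 250.0},
]
as_of = date(2024, 1, 1)  # hypothetical reference date
for row in customers:
    row["age_bracket"] = age_bracket(row["dob"], as_of)
    row["high_margin"] = is_high_margin(row["list_price"], row["standard_cost"])
    print(row["age_bracket"], row["high_margin"])
```

Fixing the reference date keeps the age brackets reproducible between runs, which matters when the engineered field feeds a model.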
INTERPRETATION AND REPORTING Visualisation and presentation of findings. This may involve interpreting the significant variables and coefficients from a business perspective. These slides should tell a compelling story around the business issue and support your case with quantitative and qualitative observations. Please refer to the module below for further details.
The dataset is easy to understand and self-explanatory!
It is important to keep in mind the business context when presenting your findings: 1. What are the trends in the underlying data? 2. Which customer segment has the highest customer value? 3. What do you propose should be the marketing and growth strategy?
The country and regional analysis (CRA) presents statistical estimates for the allocation of identifiable expenditure between the UK countries and 9 English regions. This year’s dataset covers the outturn period 2014-15 to 2018-19.
Alongside the main CRA release, the Treasury has published further analysis tools in the form of “interactive tables” and the full CRA database. These tools will allow users to manipulate the data to create their own views. The database contains the underlying “segment” level data used to construct the published tables in CRA 2019. Figures are in nominal terms. The “interactive tables” include both nominal and real terms data, but exclude the “segment” level information.
For statistical enquiries, please contact: Pesa.document@hmtreasury.gov.uk
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models.
Methods: The four models (paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data) were compared using a simulation study. The assumed 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30 under the assumption of normally distributed data. The data were generated from a variance-covariance matrix under the assumption of (i) a compound symmetry structure or a first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error), and mean square error (MSE) of effect differences between two groups were used to evaluate the performance of the four models.
Results: The results from the 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded type I error close to the nominal level, higher power, comparable bias and small MSE, whether or not there was a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error and smaller bias, but lower power and larger MSE. The mixed effects model of difference and the meta-analysis of summary data yielded type I error far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect.
Conclusion: We recommend the paired t-test for normally distributed data from N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Czech Republic Services Sales Index: NACE 2: AE: Technical Testing and Analysis data was reported at 193.718 (2010=100) in Dec 2017. This records an increase from the previous figure of 157.409 (2010=100) for Nov 2017. The data is updated monthly, averaging 95.807 (median, 2010=100) from Jan 2000 to Dec 2017, with 216 observations. The data reached an all-time high of 201.385 (2010=100) in Dec 2008 and a record low of 39.512 (2010=100) in Aug 2000. The series remains in active status in CEIC and is reported by the Czech Statistical Office. The data is categorized under Global Database’s Czech Republic – Table CZ.H012: Services Sales Index: Nominal and Real: 2010=100. Rebased from 2010=100 to 2015=100. Replacement series ID: 401334557
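For readers unfamiliar with rebasing, the sketch below shows the usual arithmetic for moving an index from one base period to another: every value is divided by the index's average over the new base period and multiplied by 100. The function name and the index values are hypothetical and are not CEIC data; how the Czech Statistical Office actually rebased the series is not stated in the description.

```python
def rebase(series, new_base_periods):
    """Rescale an index so that its mean over new_base_periods equals 100."""
    base_mean = sum(series[p] for p in new_base_periods) / len(new_base_periods)
    return {period: value * 100 / base_mean for period, value in series.items()}


# Hypothetical annual index values on a 2010=100 basis.
index_2010 = {"2010": 100.0, "2015": 125.0, "2017": 150.0}

# The same series expressed on a 2015=100 basis.
index_2015 = rebase(index_2010, ["2015"])
print(index_2015)
```

Rebasing changes the level of every observation but leaves period-to-period growth rates unchanged.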
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Czech Republic Real Services Sales Index: NACE 2: AE: Technical Testing and Analysis data was reported at 190.547 (2010=100) in Dec 2017. This records an increase from the previous figure of 154.676 (2010=100) for Nov 2017. The data is updated monthly, averaging 100.363 (median, 2010=100) from Jan 2000 to Dec 2017, with 216 observations. The data reached an all-time high of 197.513 (2010=100) in Dec 2008 and a record low of 53.852 (2010=100) in Aug 2000. The series remains in active status in CEIC and is reported by the Czech Statistical Office. The data is categorized under Global Database’s Czech Republic – Table CZ.H012: Services Sales Index: Nominal and Real: 2010=100. Rebased from 2010=100 to 2015=100. Replacement series ID: 401334827
The country and regional analysis (CRA) presents statistical estimates for the allocation of identifiable expenditure between the UK countries and 9 English regions. This year’s dataset covers the outturn period 2015-16 to 2019-20.
Alongside the main CRA release, the Treasury has published further analysis tools in the form of “interactive tables” and the full CRA database. These tools will allow users to manipulate the data to create their own views. The database contains the underlying “segment” level data used to construct the published tables in CRA 2020. Figures are in nominal terms. The “interactive tables” include both nominal and real terms data, but exclude the “segment” level information.
For statistical enquiries, please contact: Pesa.document@hmtreasury.gov.uk
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Key Columns and Metrics:
- Country: The name of the country.
- Total in km2: Total area of the country.
- Land in km2: Land area excluding water bodies.
- Water in km2: Area covered by water bodies.
- Water %: Percentage of the total area covered by water.
- HDI: Human Development Index, a measure of a country's overall achievement in its social and economic dimensions.
- %HDI Growth: Percentage growth in HDI.
- IMF Forecast GDP(Nominal): International Monetary Fund's forecast for Gross Domestic Product in nominal terms.
- World Bank Forecast GDP(Nominal): World Bank's forecast for Gross Domestic Product in nominal terms.
- UN Forecast GDP(Nominal): United Nations' forecast for Gross Domestic Product in nominal terms.
- IMF Forecast GDP(PPP): IMF's forecast for Gross Domestic Product in purchasing power parity terms.
- World Bank Forecast GDP(PPP): World Bank's forecast for Gross Domestic Product in purchasing power parity terms.
- CIA Forecast GDP(PPP): Central Intelligence Agency's forecast for Gross Domestic Product in purchasing power parity terms.
- Internet Users: Number of internet users in the country.
- UN Continental Region: Continental region classification by the United Nations.
- UN Statistical Subregion: Statistical subregion classification by the United Nations.
- Population 2022: Population of the country in the year 2022.
- Population 2023: Population of the country in the year 2023.
- Population %Change: Percentage change in population from 2022 to 2023.
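Several of the columns above appear to be derived from others. A minimal sketch of those relationships follows, assuming that land area equals total area minus water area and that both percentage columns use the conventional formulas. The description does not confirm how the dataset computes them, and the sample row is hypothetical.

```python
def water_percent(total_km2, water_km2):
    """Share of the total area covered by water, in percent."""
    return water_km2 * 100 / total_km2


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old


# Hypothetical row; the figures are illustrative, not taken from the dataset.
row = {
    "Total in km2": 1000.0,
    "Water in km2": 50.0,
    "Population 2022": 2_000_000,
    "Population 2023": 2_030_000,
}
# Assumption: land area is total area minus water area.
row["Land in km2"] = row["Total in km2"] - row["Water in km2"]
row["Water %"] = water_percent(row["Total in km2"], row["Water in km2"])
row["Population %Change"] = percent_change(row["Population 2022"], row["Population 2023"])
print(row["Land in km2"], row["Water %"], row["Population %Change"])
```

Recomputing these values from the base columns is a quick way to check the published figures for rounding or entry errors.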
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18419/DARUS-4231
This dataset contains the supplementary materials to our publication "Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis", where we report on a study we conducted. Please refer to the publication for more details; the abstract can be found at the end of this description.
The dataset contains:
- The collection of graphs with layout used in the study
- The final, randomized experiment files used in the study
- The source code of the study prototype
- The collected, anonymized data in tabular form
- The code for the statistical analysis
- The Supplemental Materials PDF
- The documents used in the study procedure (English, Italian, German)
Paper abstract: Problem solving is a composite cognitive process, invoking a number of cognitive mechanisms, such as perception and memory. Individuals may form collectives to solve a given problem together, in collaboration, especially when complexity is thought to be high. To determine if and when collaborative problem solving is desired, we must quantify collaboration first. For this, we investigate the practical virtue of collaborative problem solving. Using visual graph analysis, we perform a study with 72 participants in two countries and three languages. We compare ad hoc pairs to individuals and nominal pairs, solving two different tasks on graphs in visuospatial mixed reality. The average collaborating pair does not outdo its nominal counterpart, but it does have a significant trade-off against the individual: an ad hoc pair uses 1.46 more time to achieve 4.6% higher accuracy. We also use the concept of task instance complexity to quantify differences in complexity. As task instance complexity increases, these differences largely scale, though with two notable exceptions. With this study we show the importance of using nominal groups as benchmark in collaborative virtual environments research. We conclude that a mixed reality environment does not automatically imply superior collaboration.
The Country and Regional Analysis (CRA) presents statistical estimates for the allocation of identifiable expenditure between the regions and nations of the UK. This year’s dataset covers the outturn period 2016-17 to 2020-21.
Alongside the main CRA release, the Treasury has published further analysis tools in the form of “interactive tables” and the full CRA database. These tools will allow users to manipulate the data to create their own views. The database contains the underlying “segment” level data used to construct the published tables in CRA 2021. Figures are in nominal terms. The “interactive tables” include both nominal and real terms data, but exclude the “segment” level information.
For statistical enquiries, please contact Pesa.document@hmtreasury.gov.uk
https://www.statsndata.org/how-to-order
The Analysis Scales market plays a crucial role across various industries, providing standardized measures that enable researchers, businesses, and educators to quantify complex phenomena reliably. Analysis scales, which include nominal, ordinal, interval, and ratio scales, are utilized to assess data in fields such
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Czech Republic Real Services Sales Index: NACE 2: Architectural and Engineering Activities, Technical Testing and Analysis (AE) data was reported at 144.675 (2010=100) in Dec 2017. This records an increase from the previous figure of 96.429 (2010=100) for Nov 2017. The data is updated monthly, averaging 89.526 (median, 2010=100) from Jan 2000 to Dec 2017, with 216 observations. The data reached an all-time high of 234.981 (2010=100) in Dec 2008 and a record low of 41.683 (2010=100) in Feb 2017. The series remains in active status in CEIC and is reported by the Czech Statistical Office. The data is categorized under Global Database’s Czech Republic – Table CZ.H012: Services Sales Index: Nominal and Real: 2010=100. Rebased from 2010=100 to 2015=100. Replacement series ID: 401334807
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Czech Republic Services Sales Index: NACE 2: Architectural and Engineering Activities, Technical Testing and Analysis (AE) data was reported at 147.081 (2010=100) in Dec 2017. This records an increase from the previous figure of 98.131 (2010=100) for Nov 2017. The data is updated monthly, averaging 80.919 (median, 2010=100) from Jan 2000 to Dec 2017, with 216 observations. The data reached an all-time high of 239.543 (2010=100) in Dec 2008 and a record low of 41.176 (2010=100) in Feb 2017. The series remains in active status in CEIC and is reported by the Czech Statistical Office. The data is categorized under Global Database’s Czech Republic – Table CZ.H012: Services Sales Index: Nominal and Real: 2010=100. Rebased from 2010=100 to 2015=100. Replacement series ID: 401334537
https://creativecommons.org/publicdomain/zero/1.0/
🌍 Global GDP by Country — 2024 Edition
The Global GDP by Country (2024) dataset provides an up-to-date snapshot of worldwide economic performance, summarizing each country’s nominal GDP, growth rate, population, and global economic contribution.
This dataset is ideal for economic analysis, data visualization, policy modeling, and machine learning applications related to global development and financial forecasting.
🎯 Target Use-Cases:
- Economic growth trend analysis
- GDP-based country clustering
- Per capita wealth comparison
- Share of world economy visualization
| Feature Name | Description |
|---|---|
| Country | Official country name |
| GDP (nominal, 2023) | Total nominal GDP in USD |
| GDP (abbrev.) | Simplified GDP format (e.g., “$25.46 Trillion”) |
| GDP Growth | Annual GDP growth rate (%) |
| Population 2023 | Estimated population for 2023 |
| GDP per capita | Average income per person (USD) |
| Share of World GDP | Percentage contribution to global GDP |
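The abbreviated GDP column is a string, so it has to be parsed before any arithmetic. The sketch below shows one way to do that and to derive per-capita GDP and share of total from the parsed values. The unit table, function name, and sample rows are assumptions made for illustration, and the "world" total here is simply the sum of the rows supplied.

```python
UNITS = {"thousand": 1e3, "million": 1e6, "billion": 1e9, "trillion": 1e12}


def parse_abbrev_gdp(text):
    """Convert a string such as '$25.46 Trillion' to a float in USD."""
    number, unit = text.replace("$", "").replace(",", "").split()
    return float(number) * UNITS[unit.lower()]


# Hypothetical rows; values are illustrative, not taken from the dataset.
rows = [
    {"Country": "A", "GDP (abbrev.)": "$3 Trillion", "Population 2023": 100_000_000},
    {"Country": "B", "GDP (abbrev.)": "$1 Trillion", "Population 2023": 50_000_000},
]

for r in rows:
    r["gdp_usd"] = parse_abbrev_gdp(r["GDP (abbrev.)"])

total_gdp = sum(r["gdp_usd"] for r in rows)
for r in rows:
    r["gdp_per_capita"] = r["gdp_usd"] / r["Population 2023"]
    r["share_pct"] = r["gdp_usd"] * 100 / total_gdp
    print(r["Country"], r["gdp_per_capita"], r["share_pct"])
```

When the full nominal GDP column is available it is the better input, since the abbreviated form is rounded.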
💰 Top Economies (Nominal GDP):
United States, China, Japan, Germany, India
📈 Fastest Growing Economies:
India, Bangladesh, Vietnam, and Rwanda
🌐 Global Insights:
- The dataset covers 181 countries representing 100% of global GDP.
- Suitable for data visualization dashboards, AI-driven economic forecasting, and educational research.
Source: Worldometers — GDP by Country (2024)
Dataset compiled and cleaned by: Asadullah Shehbaz
For open research and data analysis.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data were collected to test whether English proficiency levels in a country are positively associated with higher democratic values in that country. English proficiency is sourced from Education First’s "EF English Proficiency Index", which covers countries' scores for the calendar years 2022 and 2021. The EF English Proficiency Index ranks 111 countries in five different categories based on English proficiency scores that were calculated from the test results of 2.1 million adults. Democratic values are operationalized through the liberal democracy index from the V-Dem Institute annual report for 2022 and 2021. Additionally, the data are used to test whether English language media consumption acts as a mediating variable between English proficiency and democracy levels in a country, while also looking at other possible regression variables. The linear regression analyses for the data were conducted in Microsoft Excel. The raw data set consists of 90 nation states in two years, 2022 and 2021. The raw data are used for two separate data sets. The first is democracy indicators, which has the regression variables EPI, HDI, and GDP; this table set has a total of 360 data entries. HDI is a statistical summary measure developed by the United Nations Development Programme (UNDP) that measures the levels of human development in 190 countries. The data for nominal gross domestic product (GDP) are sourced from the World Bank. Including regression variables that have been shown to have a positive link with democracy, such as GDP and HDI, allows the regression analysis to identify whether there is a true relationship between English proficiency and democracy levels in a country.
The second data set has a total of 720 data entries and aims to identify English proficiency indicators. It has 7 regression variables: LDI scores, Years of Mandatory English Education, Heads of States Publicly speaking English, GDP PPP (2021USD), Common Wealth, BBC web traffic and CNN web traffic. The data for years of mandatory English education are sourced from research at the University of Winnipeg and are coded based on the number of years a country has English as a mandatory subject. The range of this data is from 0 to 13 years of English being mandatory. It is important to note that this data only concerns public schools and does not extend to the private school systems in each country. The data for heads of state publicly speaking English were obtained through a video data analysis of all heads of state. To ensure the accuracy of the data collected, data were only used for heads of state who had been in their position for at least a year; for heads of state who had not been in their position for a year, data were taken from the previous head of state. This data only takes into account speeches and interviews that were conducted during their incumbency. The data for each country’s GDP PPP scores are sourced from the World Bank; they were last updated for a majority of the countries in 2021 and are tied to the US dollar. Data for the Commonwealth only include members of the Commonwealth that were historically colonized by the United Kingdom. Any country that falls under that category is coded as 1 and any country that does not is coded as 0. BBC and CNN web traffic data are sourced using tools in Semrush, which provide a rough estimate of how much web traffic each news site generates in each country. These estimates are used to identify the average web traffic for BBC News and CNN World News for both the 2021 and 2022 calendar years.
The traffic for each country is also measured per capita, per 10 thousand people, so that the population size of a country does not influence the results. The population of each country for 2021 and 2022 is sourced from the United Nations revision of World Population Prospects for 2021 and 2022 respectively.
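The per-capita normalization described above amounts to a one-line calculation. A minimal sketch follows, with a hypothetical function name and made-up visit and population figures. The actual values come from Semrush and the UN World Population Prospects and are not reproduced here.

```python
def traffic_per_10k(visits, population):
    """Visits per 10,000 inhabitants."""
    return visits * 10_000 / population


# Hypothetical records: equal raw traffic, very different populations.
records = [
    {"country": "X", "year": 2021, "bbc_visits": 500_000, "population": 10_000_000},
    {"country": "Y", "year": 2021, "bbc_visits": 500_000, "population": 50_000_000},
]

for rec in records:
    rec["bbc_per_10k"] = traffic_per_10k(rec["bbc_visits"], rec["population"])
    print(rec["country"], rec["bbc_per_10k"])
```

The two hypothetical countries have identical raw traffic, yet the normalized figure for the smaller one is five times higher, which is the effect the normalization is meant to expose.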
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article considers the problem of inference in observational studies with time-varying adoption of treatment. In addition to an unconfoundedness assumption that the potential outcomes are independent of the times at which units adopt treatment conditional on the units’ observed characteristics, our analysis assumes that the time at which each unit adopts treatment follows a Cox proportional hazards model. This assumption permits the time at which each unit adopts treatment to depend on the observed characteristics of the unit, but imposes the restriction that the probability of multiple units adopting treatment at the same time is zero. In this context, we study randomization tests of a null hypothesis that specifies that there is no treatment effect for all units and all time periods in a distributional sense. We first show that an infeasible test that treats the parameters of the Cox model as known has rejection probability under the null hypothesis no greater than the nominal level in finite samples. Since these parameters are unknown in practice, this result motivates a feasible test that replaces these parameters with consistent estimators. While the resulting test does not need to have the same finite-sample validity as the infeasible test, we show that it has limiting rejection probability under the null hypothesis no greater than the nominal level. In a simulation study, we examine the practical relevance of our theoretical results, including robustness to misspecification of the model for the time at which each unit adopts treatment. Finally, we provide an empirical application of our methodology using the synthetic control-based test statistic and tobacco legislation data found in Abadie, Diamond and Hainmueller. Supplementary materials for this article are available online.
The Country and Regional Analysis (CRA) presents statistical estimates for the allocation of identifiable expenditure between the regions and nations of the UK. This year’s dataset covers the outturn period 2019-20 to 2023-24.
Alongside the main CRA release, the Treasury has published further analysis tools in the form of “interactive tables” and the full CRA database. These tools will allow users to manipulate the data to create their own views. The database contains the underlying “segment” level data used to construct the published tables in CRA 2024. Figures are in nominal terms. The “interactive tables” include both nominal and real terms data, but exclude the “segment” level information.
For statistical enquiries, please contact: Pesa.document@hmtreasury.gov.uk
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study was approved by the local ethics committee of Ben Arous hospital, Tunisia. Written consent for participation was obtained from each patient. Data collection was conducted in compliance with Tunisian laws regarding personal data protection. Statistical analysis was performed according to the intention-to-treat principle. Differences between the groups in baseline characteristics and in the primary and secondary outcome measures were compared with the chi-squared test for nominal variables, the independent Student’s t-test for continuous variables with normal distributions, and the nonparametric Mann-Whitney U test for continuous variables with skewed distributions. To address risk factors involved in SSI development other than suture type, a bivariate analysis was first conducted on all independent variables. Any variable with a statistically significant response, or very close to it (p