Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The age of big data has fueled expectations for accelerating learning. The availability of large data sets enables researchers to achieve more powerful statistical analyses and enhances the reliability of conclusions, which can be based on a broad collection of subjects. Often such data sets can be assembled only with access to diverse sources; for example, medical research that combines data from multiple centers in a federated analysis. However, these hopes must be balanced against data privacy concerns, which hinder sharing raw data among centers. Consequently, federated analyses typically resort to sharing data summaries from each center. Restricting the analysis to summaries risks impairing the efficiency of statistical procedures. In this work, we take a close look at the effects of federated analysis on two very basic problems: non-parametric comparison of two groups, and quantile estimation to describe the corresponding distributions. We also propose a specific privacy-preserving data release policy for federated analysis with the K-anonymity criterion, which has been adopted by the Medical Informatics Platform of the European Human Brain Project. Our results show that, for our tasks, there is only a modest loss of statistical efficiency.
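To make the idea of estimating quantiles from privacy-preserving summaries concrete, here is a minimal sketch. It is not the release policy proposed in the work: the binning scheme, the suppression rule (non-empty bins with fewer than K subjects are reported as zero), and the function names `release_counts` and `pooled_quantile` are all assumptions made for illustration.

```python
from bisect import bisect_right


def release_counts(values, edges, k):
    """Bin one center's values; report a bin as 0 if it holds fewer than k subjects.

    Hypothetical release rule for illustration only. Values are assumed to lie
    within [edges[0], edges[-1]].
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        i = max(0, min(i, len(counts) - 1))  # clamp boundary values into a bin
        counts[i] += 1
    return [c if c >= k else 0 for c in counts]


def pooled_quantile(released, edges, q):
    """Estimate the q-quantile from the pooled released counts.

    Uses linear interpolation inside the bin where the cumulative count
    first reaches q times the pooled total.
    """
    pooled = [sum(col) for col in zip(*released)]
    target = q * sum(pooled)
    cum = 0
    for i, c in enumerate(pooled):
        if c > 0 and cum + c >= target:
            frac = (target - cum) / c
            return edges[i] + frac * (edges[i + 1] - edges[i])
        cum += c
    return edges[-1]


edges = [0, 5, 10]
center_a = release_counts([1, 2, 3, 4, 5, 6, 7, 8], edges, k=3)  # [4, 4]
center_b = release_counts([1, 2, 6, 7, 8, 9], edges, k=3)        # [0, 4]
median_estimate = pooled_quantile([center_a, center_b], edges, 0.5)
```

In this toy example the second center's small bin is suppressed, so the pooled estimate is based on 12 of the 14 subjects, which illustrates how a release rule can cost some statistical efficiency.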
In 2022, amid a health crisis accompanied by a major economic one, equal access to work remained a major objective in France. Because several criteria could jeopardize it, one possible measure would be to anonymize the applications examined by employers, so that selection for job interviews is based solely on qualifications and experience. This measure had never been as popular with the French as it was in 2021, when ** percent of them were in favor of it, five percentage points more than in 2015. In March 2022, ** percent of them were in favor of this measure.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Replication Package for A Study on the Pythonic Functional Constructs' Understandability
This package contains several folders and files with code and data used in the study.
examples/
Contains the code snippets used as objects of the study, named as reported in Table 1, summarizing the experiment design.
RQ1-RQ2-files-for-statistical-analysis/
Contains three .csv files used as input for conducting the statistical analysis and drawing the graphs for addressing the first two research questions of the study. Specifically:
- ConstructUsage.csv contains the declared usage frequency of the three functional constructs that are the object of the study. This file is used to draw Figure 4.
- RQ1.csv contains the collected data used for the mixed-effect logistic regression relating the use of functional constructs with the correctness of the change task, and the logistic regression relating the use of map/reduce/filter functions with the correctness of the change task.
- RQ1Paired-RQ2.csv contains the collected data used for the ordinal logistic regression of the relationship between the perceived ease of understanding of the functional constructs and (i) participants' usage frequency, and (ii) constructs' complexity (except for map/reduce/filter).
inter-rater-RQ3-files/
Contains four .csv files used as input for computing the inter-rater agreement for the manual labeling used for addressing RQ3. Specifically, you will find one file for each functional construct, i.e., comprehension.csv, lambda.csv, and mrf.csv, and a different file used for highlighting the reasons why participants prefer to use the procedural paradigm, i.e., procedural.csv.
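As a rough illustration of how inter-rater agreement can be computed from two raters' labels, the sketch below implements Cohen's kappa in plain Python. The package description does not state which agreement statistic the study used or how the .csv files are laid out, so the choice of Cohen's kappa, the function name `cohen_kappa`, and the example labels are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items (nominal categories)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels, not taken from the replication package.
rater_1 = ["readable", "readable", "concise", "concise"]
rater_2 = ["readable", "concise", "concise", "concise"]
kappa = cohen_kappa(rater_1, rater_2)
```

The released R script and notebook remain the reference for the statistic actually reported in the paper.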
Questionnaire-Example.pdf
This file contains the questionnaire submitted to one of the ten experimental groups within our controlled experiment. Other questionnaires are similar, except for the code snippets used for the first section, i.e., change tasks, and the second section, i.e., comparison tasks.
RQ2ManualValidation.csv
This file contains the results of the manual validation performed to sanitize the answers provided by our participants and used for addressing RQ2. Specifically, we coded the behavior description using four different levels: (i) correct, (ii) somewhat correct, (iii) wrong, and (iv) automatically generated.
RQ3ManualValidation.xlsx
This file contains the results of the open coding applied to address our third research question. Specifically, you will find four sheets, one for each functional construct and one for the procedural paradigm. For each sheet, you will find the provided answers together with the categories assigned to them.
Appendix.pdf
This file contains the results of the logistic regression relating the use of map, filter, and reduce functions with the correctness of the change task, not shown in the paper.
FuncConstructs-Statistics.r
This file contains an R script that you can reuse to re-run all the analyses conducted and discussed in the paper.
FuncConstructs-Statistics.ipynb
This file contains the code to re-execute all the analyses conducted in the paper as a notebook.
NGL, which stands for 'Not Gonna Lie', is an anonymous Q&A social media app launched at the end of 2021. During the first half of 2022, only about six months after launch, the app recorded over 12.5 million downloads from users worldwide. NGL allows users to create questions to post on their mainstream social media content for their friends to answer anonymously.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The file consists of 786 014 material access records. Each record consists of the following items: material id (depersonalized), user id (depersonalized), material type name, short material type name, access type, access datetime.
Material id and user id are 36-digit alphanumerical identifiers. These identifiers cannot be used to track original data and were created specifically for the purpose of publishing the statistics. When anonymous access is recorded, user id is blank.
Material type name (short material type name) can have one of the following values:
- "Учебно-методическое пособие" (teaching_aid),
- "Учебное пособие" (tutorial),
- "Учебник" (textbook),
- "Выпускная квалификационная работа" (degree_work),
- "Монография" (monograph),
- "Студенческая работа" (student_paper),
- "Дополнительный материал" (auxiliary_material),
- "Диссертация" (dissertation),
- "Препринт" (preprint),
- "Автореферат" (diss_abstract),
- "Патент" (patent),
- "Научная статья" (scientific_article),
- "Презентация" (presentation).
Access type can have one of the following values:
- "read" if user accessed material via built-in reader,
- "download" if user has downloaded the material,
- "get_description" if user has accessed material description page.
Access datetime has the following format: YYYY-MM-DD HH:MM:SS.ZZZZZZ
Records are delimited with newline characters, record fields with semicolons (;).
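A record layout like the one described above could be read as follows. This is a sketch under assumptions: the field order is taken from the item list, the file is assumed to have no header row, ZZZZZZ is assumed to be a six-digit fractional second, and the identifiers in `SAMPLE` are invented placeholders rather than values from the file.

```python
import csv
import io
from datetime import datetime

# Field names are our own labels for the six items listed in the description.
FIELDS = [
    "material_id",
    "user_id",
    "material_type",
    "material_type_short",
    "access_type",
    "access_datetime",
]


def parse_records(text):
    """Yield one dict per semicolon-delimited record; a blank user id becomes None."""
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip empty lines
        record = dict(zip(FIELDS, row))
        record["user_id"] = record["user_id"] or None  # anonymous access
        record["access_datetime"] = datetime.strptime(
            record["access_datetime"], "%Y-%m-%d %H:%M:%S.%f"
        )
        yield record


# Two made-up records with 36-character placeholder identifiers.
MATERIAL = "a1" * 18
USER = "b2" * 18
SAMPLE = "\n".join([
    f"{MATERIAL};;Учебник;textbook;read;2020-01-02 03:04:05.000006",
    f"{MATERIAL};{USER};Учебник;textbook;download;2020-01-02 03:05:00.123456",
])

records = list(parse_records(SAMPLE))
anonymous_reads = sum(1 for r in records if r["user_id"] is None)
```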
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Methods
The objective of this project was to determine whether a federated analysis approach using DataSHIELD can maintain the level of results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly split into three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data were transferred: statistics were calculated in parallel within each healthcare organization, and only summary statistics (aggregates) were returned to the federated data analyst. Descriptive statistics, survival analysis, regression models and correlation were first performed with the centralized approach and then reproduced with the federated approach. The results were then compared between the two approaches.
Results
The cohort was split into three samples (N1 = 157 patients, N2 = 94 and N3 = 64); 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable because of data disclosure limitations in the federated environment, which shows the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, apart from some differences for position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.
Conclusion
Our project showed a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfactory, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support.
In order to find the right balance between privacy and accuracy of the analysis, set-up of privacy requirements should be established prior to the start of the analysis, as well as a data quality review of the participating healthcare organization.
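The principle of returning only aggregates to the analyst can be sketched as follows. This is plain Python, not the DataSHIELD API; `site_summary` and `pooled_mean_sd` are hypothetical helper names and the site values are made up. Each site shares its count, sum and sum of squares, from which the pooled mean and standard deviation are recovered without any individual record leaving the site.

```python
import math
import statistics


def site_summary(values):
    """Aggregates a site could share without exposing individual values."""
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def pooled_mean_sd(summaries):
    """Combine per-site aggregates into the overall mean and sample SD."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(max(variance, 0.0))


# Made-up values standing in for three healthcare organizations.
sites = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean, sd = pooled_mean_sd([site_summary(s) for s in sites])

# A centralized analysis of the same values gives the same result.
pooled = [v for s in sites for v in s]
assert abs(mean - statistics.mean(pooled)) < 1e-12
assert abs(sd - statistics.stdev(pooled)) < 1e-12
```

Moments pool exactly in this way, whereas a pooled median is not determined by site-level medians in general, so position measures typically need a different, approximate procedure in a federated setting.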
https://scoop.market.us/privacy-policy
Dark Web Statistics: The Dark Web refers to the encrypted portion of the internet that is not indexed by traditional search engines.
It exists as a hidden network that can only be accessed through specific software, configurations, and authorization protocols.
The primary technology used to access the Dark Web is the Tor network, which allows users to maintain anonymity and privacy while accessing websites and services.
https://www.sci-tech-today.com/privacy-policy
Tor Statistics: The most popular browser is Tor. Data is a priceless resource, and people go to great lengths to get it. The term “onion router” refers to a tool used globally to guarantee anonymity on the net by the use of an onion routing protocol. The use of Tor rose substantially in 2024, attracting attention from several sectors, such as businesses, agencies, governments, and ordinary citizens. With internet privacy being a hot topic in the world today, it is no wonder that we have seen the emergence of Tor as a means for safe browsing and secure communication online.
Tor remains one of the best-known programs for clandestine web exploration. Unlike usual web browsers, this type of software does not allow access to data by any third parties who might want to track someone’s online activity. However, there is more to this program than just its reputation for accessing deep web resources associated with pornography or drugs. A detailed, eye-opening analysis of Tor statistics is presented here, covering current trends in user demographics, financial impacts, and future projections.
The UK Innovation Survey (UKIS) provides the main source of information on business innovation in the UK. The survey data is a major resource for research into the nature and functioning of the innovation system and for policy formation. It is used widely across government, regions and by the research community. The UKIS also represents the UK's contribution to the Europe-wide Community Innovation Survey (CIS). Like many innovation surveys across Europe, the UKIS follows general guidelines set out in the Organisation for Economic Co-operation and Development (OECD) publication known as the Oslo Manual (OECD 2005). This manual provides guidelines on the conduct of innovation surveys, including statistical procedures and a review of the range of concepts that fall together under the umbrella term "innovation".
Geographical references: postcodes
The postcodes included in the first edition of these data (i.e. data files prior to 2008-2010) are pseudo-anonymised postcodes. The real postcodes were not available due to the potential risk of identification of the observations. However, these replacement postcodes retain the inherent nested characteristics of real postcodes. In the dataset, the variable of the replacement postcode is 'new_PC'.
The first two editions only include the first half of an observation's anonymised (or real) postcode (sometimes referred to as the outward code). Researchers who are interested in analysing data by more disaggregated geographies (e.g. ward, output area) are advised that this is not possible using the first half of the postcode. Full, real postcodes are available from the third edition onwards, with the exception of UKIS 12, for which only the first half of the postcodes (outward codes) is available.
For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.
Linking to other business studies
These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.
Latest edition information
For the ninth edition (September 2024) data and documentation for UKIS 2023 (also known as UKIS 13), covering the period 2020 to 2022, were added to the study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
METHODS
Topic determination
The project was developed as a team science exercise during a course on Nutrient Biology (New Mexico Institute of Mining and Technology, New Mexico, USA; BIOL 4089/5089). Students were all women pursuing degrees in Biology and Earth Science, with extensive internet search acumen developed from coursework and personal experience. We (students and professor) devoted ~5 hours to discussing women’s health topics prior to searching, defining search criteria, and developing a scoring system. These discussions led to a list of 12 non-cancer health topics particular to women’s health associated with human cis-gender female biology. Considerations of transgender health were discussed, with the consensus decision that those issues are scientifically relevant but deserving of a separate analysis not included here.
Search protocol
After agreeing on search terms, we experimented with settings in the Advanced Search feature in Google (www.google.com), and collectively agreed to the following settings: Language (English); search terms appearing in the “text” of the page; ANY of the terms “woman”, “women”, “female”; ALL terms when using a single topic from the list above with the addition of the word “nutrient”. Figure 1 shows a screenshot of how a search was conducted, using endometriosis as an example. To standardize data collection among investigators, all results from the first 5 pages of results were collected. Search result URLs were followed, where a suite of data were gathered (variables in Table 2) and entered into a shared database (Appendix 1). Definitions for each variable (Table 2) were articulated following a 1-week trial period and further group discussion. Variables were defined to minimize subjectivity across investigators, clarify the reporting of results, and standardize data collection.
Scoring metric
The scoring metric was developed so that the mean and variation (standard deviation, SD; standard error, SE) could be calculated for each topic and compared among topics, and to answer how much variation in quality is likely to be encountered across categories of women’s health issues. We report both variation metrics because SD captures the variation of the data set, while SE scales for sample-size variation among categorical variables. When searching topics using the same criteria:
Are some topics more likely to return pages with scientifically verifiable information?
Does the variation in quality differ between topics?
Peer-reviewed journal articles were included in the database if encountered in the searches but were removed before statistical analysis. The justification for removing those sources was that it is possible the Google algorithm included those sources disproportionately for our group of college students and a professor who regularly searches for academic articles. We also assume those sources are consulted less frequently by lay audiences searching for health information.
Scores were based on six binary (presence/absence) attributes of each web page evaluated. These were: Author (name present/absent), author credentials given, reviewer, reviewer credentials, sources listed, peer-reviewed sources listed. A score of 1 was given if the attribute was present, and 0 if absent. The total number of references cited on a webpage, as well as the number of those that were peer-reviewed (Table 2) were recorded, but for scoring purposes, a 1 or 0 was assigned if there were or were not references and peer-reviewed references, respectively. Potential scores thus ranged from 0 to 6.
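The scoring rule described above can be sketched in a few lines. The dictionary keys, the function names `page_score` and `topic_summary`, and the example page are hypothetical; only the six attributes, the 0 to 6 range, and the conversion of reference counts to presence/absence come from the text.

```python
import math
import statistics


def page_score(page):
    """Sum six presence/absence attributes into a score between 0 and 6.

    Reference counts are reduced to presence/absence for scoring purposes.
    """
    flags = [
        page["author"],
        page["author_credentials"],
        page["reviewer"],
        page["reviewer_credentials"],
        page["n_sources"] > 0,
        page["n_peer_reviewed"] > 0,
    ]
    return sum(1 for flag in flags if flag)


def topic_summary(scores):
    """Mean, standard deviation (SD) and standard error (SE) of a topic's scores."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    se = sd / math.sqrt(len(scores))
    return mean, sd, se


# Invented example page: named author, 12 sources of which 3 are peer-reviewed.
example_page = {
    "author": True,
    "author_credentials": False,
    "reviewer": False,
    "reviewer_credentials": False,
    "n_sources": 12,
    "n_peer_reviewed": 3,
}
example_score = page_score(example_page)
mean, sd, se = topic_summary([2, 4, 4, 6])
```

Because the number of sources enters the score only as present or absent, a page citing 12 sources scores the same on that attribute as a page citing one.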
We performed a simple validation experiment via anonymous surveys sent to students at our institution (New Mexico Tech), a predominantly STEM-focused public university. Using the final scores from the search result webpages, a single website from each score was selected at random using the RAND() function in Microsoft Excel to assign a random variable as an identifier to each URL, then sorting by that variable and selecting the first article in a given score category. Webpages with scores of 0 or 6 were excluded from the validation experiment. Following institutional review, a survey was sent to the “all student” email list, and recipients were directed to a web survey that asked participants to give a score of 1-5 to each of the 5 random (but previously scored) web pages, without repeating a score. Participants were given minimal information about the project and had no indication the pages had already been assigned scores. Survey results were collected anonymously by having responses routed to a spreadsheet, and no personally identifiable data were collected from participants.
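The random selection step (assign a random number to each URL, sort by it, and take the first page in each score category) was done in Microsoft Excel; the sketch below mirrors that procedure in Python as an illustration only. The function name `pick_validation_pages`, the seed, and the URLs are hypothetical.

```python
import random


def pick_validation_pages(scored_urls, rng):
    """Pick one page per score category by sorting on a random key.

    Pages scoring 0 or 6 are excluded, which leaves categories 1 to 5.
    """
    keyed = [(rng.random(), url, score)
             for url, score in scored_urls if 0 < score < 6]
    keyed.sort()  # order pages by their random identifier
    chosen = {}
    for _, url, score in keyed:
        chosen.setdefault(score, url)  # keep the first page seen per score
    return chosen


# Invented URLs: three pages for every score from 0 to 6.
pages = [(f"https://example.org/page-{s}-{i}", s)
         for s in range(7) for i in range(3)]
selection = pick_validation_pages(pages, random.Random(2024))
```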
Statistical analysis
Differences in mean scores within each health topic and the mean number of sources per evaluated webpage were evaluated by calculating Bayes Factors; response variables (mean score, number of sources) for each topic were compared to a null model of no difference across topics (y ~ category + error). Equal prior weight was given to each potential model. Variance inequality was tested via Levene’s test, and normality was assessed using quantile-quantile plots. Correlation analysis was used to test the strength of the association between individual scores per website and the number of sources cited per website. Because only the presence or absence of sources was considered in the score calculation, the number of sources is independent of score, which justifies correlation analysis. Statistical analyses were conducted in the open-source software package JASP version 0.19.2 (JASP, 2024).
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
Three survey tools used to conduct surveys in the Amazon during 2018:
(1) Household survey with direct questions only. This questionnaire includes questions using direct questioning methods (e.g. did you consume wild meat in your house in 2018?). It contains questions about household food consumption, meat preference, and socioeconomic data. Data collected with this questionnaire were also used as part of a randomized response technique (Unrelated Question Design; Greenberg et al. 1971; Blair et al. 2015). Surveys were conducted with heads of households (men or women).
(2) Household survey with indirect and direct questions. This questionnaire includes questions using both indirect questioning methods (i.e. randomized response technique: Unrelated Question Design) and direct questioning methods. It contains questions about meat consumption, meat preference, and socioeconomic data. Surveys were conducted with heads of households (men or women).
(3) Anonymous survey of school children. This questionnaire included indirect questioning methods (non-randomized response technique: Triangular Model; Yu et al. 2008). It contains questions about meat consumption, meat preference, and socioeconomic data. Surveys were conducted with children between 12 and 18 years old.
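For readers unfamiliar with the Unrelated Question Design, the sketch below shows the textbook prevalence estimator for a yes/no sensitive question when the prevalence of the innocuous question is known. The design parameters (`p_sensitive`, `pi_unrelated`) and the counts are invented; they are not the values used in these surveys, and the function name `uqd_prevalence` is hypothetical.

```python
def uqd_prevalence(yes_count, n, p_sensitive, pi_unrelated):
    """Estimate the prevalence of a sensitive trait under the Unrelated Question Design.

    Each respondent answers the sensitive question with probability
    p_sensitive and an innocuous question (known "yes" rate pi_unrelated)
    otherwise, so P(yes) = p * pi_sensitive + (1 - p) * pi_unrelated.
    Returns the estimate and its standard error.
    """
    lam = yes_count / n  # observed share of "yes" answers
    estimate = (lam - (1 - p_sensitive) * pi_unrelated) / p_sensitive
    std_error = (lam * (1 - lam) / n) ** 0.5 / p_sensitive
    return estimate, std_error


# Invented numbers: 400 "yes" answers among 1000 respondents.
estimate, std_error = uqd_prevalence(400, 1000, p_sensitive=0.7, pi_unrelated=0.5)
```

The standard error is larger than for a direct question with the same sample size, which is the price paid for protecting individual answers.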
References:
Blair G, Imai K, Zhou Y-Y. 2015. Design and Analysis of the Randomized Response Technique. Journal of the American Statistical Association 110:1304-1319.
Greenberg BG, Kuebler RR, Abernathy JR, Horvitz DG. 1971. Application of the Randomized Response Technique in Obtaining Quantitative Data. Journal of the American Statistical Association 66:243-250.
Yu J-W, Tian G-L, Tang M-L. 2008. Two new models for survey sampling with sensitive characteristic: design and analysis. Metrika 67:251-263.
For any questions, please contact Willandia Chaves at wchaves@vt.edu.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This repository contains the supplementary materials (Supplementary_figures.docx, Supplementary_tables.docx) of the manuscript: "Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France". This repository also provides the R codes and datasets necessary to run the analyses described in the manuscript.
The R datasets with suffix "_a" have anonymous spatial coordinates to respect confidentiality. Therefore, the preliminary preparation of the data is not provided in the public codes. These datasets, all geolocated and necessary to the analyses, are:
Attack_sf_a.RData: 19,302 analyzed wolf attacks on sheep
ID: unique ID of the attack
DATE: date of the attack
PASTURE: the related pasture ID from "Pasture_sf_a" where the attack is located
STATUS: column resulting from the preparation and the attribution of attacks to pastures (part 2.2.4 of the manuscript); not shown here to respect confidentiality
Pasture_sf_a.RData: 4987 analyzed pastures grazed by sheep
ID: unique ID of the pasture
CODE: Official code in the pastoral census
FLOCK_SIZE: maximum annual number of sheep grazing in the pasture
USED_MONTHS: months for which the pasture is grazed by sheep
Removal_sf_a.RData: 232 analyzed single wolf removals or groups of wolf removals
ID: unique ID of the removal
OVERLAP: whether the removal is single ("non-interacting" in the manuscript, "NO" here) or not ("interacting" in the manuscript; here "SIMULTANEOUS" for removals occurring during the same operation, or "NON-SIMULTANEOUS" otherwise).
DATE_MIN: date of the single removal or date of the first removal of a group
DATE_MAX: date of the single removal or date of the last removal of a group
CLASS: administrative type of the removal according to definitions from 2.1 part of the manuscript
SEX: sex or sexes of the removed wolves if known
AGE: age class of the removed wolves if known
BREEDER: breeding status of the removed female wolves, "Yes" for female breeder, "No" for female non-breeder. Necropsied males are "No" by default; NA marks dead individuals that were not found.
SEASON: season of the removal, as defined in part 2.3.4 of the manuscript
MASSIF: mountain range attributed to the removal, as defined in part 2.3.4 of the manuscript
Area_to_exclude_sf_a.RData: one row for each mountain range, corresponding to the area where removal controls of the mountain range could not be sampled, as defined in part 2.3.6 of the manuscript
These datasets were used to run the following analysis codes:
Code 1 : The file Kernel_wolf_culling_attacks_p.R contains the before-after analyses.
We start by delimiting the spatio-temporal buffer for each row of the "Removal_sf_a.RData" dataset.
We identify the attacks from "Attack_sf_a.RData" within each buffer, giving the data frame "Buffer_df" (one row per attack)
We select the pastures from "Pasture_sf_a.RData" within each buffer, giving the data frame "Buffer_sf" (one row per removal)
We calculate the spatial correction
We spatially slice each buffer into 200 rings, giving the data frame "Ring_sf" (one row per ring)
We add the total pastoral area of the ring of the attack ("SPATIAL_WEIGHT"), for each attack of each buffer, within Buffer_df ("Buffer_df.RData")
We calculate the pastoral correction
We create the pastoral matrix for each removal, giving a matrix of 200 rows (one for each ring) and 180 columns (one for each day, 90 days before the removal date and 90 days after the removal date), with the total pastoral area in use by sheep for each corresponding cell of the matrix (one element per removal, "Pastoral_matrix_lt.RData")
We simulate, for each removal, the random distribution of the attacks from "Buffer_df.RData" according to "Pastoral_matrix_lt.RData". The process is done 100 times (one element per simulation, "Buffer_simulation_lt.RData").
We estimate the attack intensities
We classify the removals into 20 subsets, according to part 2.3.4 of the manuscript ("Variables_lt.RData") (one element per subset)
We perform, for each subset, the kernel estimations with the observed attacks ("Kernel_lt.RData"), with the simulated attacks ("Kernel_simulation_lt.RData") and we correct the first kernel computations with the second ("Kernel_controlled_lt.RData") (one element per subset).
We calculate the trend of attack intensities, for each subset, that compares the total attack intensity before and after the removals (part 2.3.5 of the manuscript), giving "Trends_intensities_df.RData". (one row per subset)
We calculate the trend of attack intensities, for each subset, along the spatial axis, three times, once for each time analysis scale. This gives "Shift_df" (one row per ring and per time analysis scale).
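The pastoral correction step in the list above (a 200 x 180 pastoral matrix per removal, then 100 random redistributions of the attacks) can be illustrated with the following sketch. The analyses themselves are in R; this Python version is only a conceptual stand-in. It assumes that simulated attacks fall in a ring/day cell with probability proportional to the pastoral area in use in that cell, which the description does not state explicitly, and `simulate_attacks` is a hypothetical name.

```python
import random

N_RINGS, N_DAYS, N_SIMULATIONS = 200, 180, 100


def simulate_attacks(pastoral_matrix, n_attacks, rng):
    """Draw (ring, day) cells for n_attacks, weighted by pastoral area in use.

    Assumption for illustration: the probability of a cell is proportional
    to its pastoral area.
    """
    cells = [(ring, day)
             for ring in range(len(pastoral_matrix))
             for day in range(len(pastoral_matrix[ring]))]
    weights = [pastoral_matrix[ring][day] for ring, day in cells]
    return rng.choices(cells, weights=weights, k=n_attacks)


# Toy matrix: pastoral area in use only in rings 0-9 during the last 90 days.
matrix = [[1.0 if ring < 10 and day >= 90 else 0.0 for day in range(N_DAYS)]
          for ring in range(N_RINGS)]
rng = random.Random(42)
simulations = [simulate_attacks(matrix, n_attacks=25, rng=rng)
               for _ in range(N_SIMULATIONS)]
```

Each simulated set plays the role of a null distribution of attacks driven by pastoral use alone, against which the observed kernel estimates can be corrected.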
Code 2 : The file Control_removals_p.R contains the control-impact analyses.
It starts with the simulation of 100 removal control sets ("Control_sf_lt_a.RData") from the real set of removals ("Removal_sf_a.RData"), that is done with the function "Control_fn" (l. 92).
The rest of the analyses follows the same process as in the first code "Kernel_wolf_culling_attacks_p.R", in order to apply the before-after analyses to each control set. All objects have the same structure as before, except that they are now lists, with one resulting element per control set. These objects have "control" in their names (not to be confused with "controlled", which refers to the pastoral correction already applied in the first code).
The code is also applied again, from l. 92 to l. 433, this time for the real set of removals (l. 121) - with "Simulated = FALSE" (l. 119). We could not simply use the results from the first code because the set of removals is restricted to removals attributed to mountain ranges only. There are 2 resulting objects: "Kernel_real_lt.RData" (observed real trends) and "Kernel_controlled_real_lt.RData" (real trends corrected for pastoral use).
The part of the code from line 439 to 524 relates to the calculations of the trends (for the real set and the control sets), as in the first code, giving "Trends_intensities_real_df.RData" and "Trends_intensities_control_lt.RData".
The part of the code from line 530 to 588 relates to the calculation of the 95% confidence intervals and the means of the intensity trends for each subset based on the results of the 100 control sets (Trends_intensities_mean_control_df.RData, Trends_intensities_CImin_control_df.RData and Trends_intensities_CImax_control_df.RData). This will be used to test the significance of the real trends. This comparison is done right after, l. 595-627, and gives the data frame "Trends_comparison_df.RData".
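The comparison of a real trend against the control sets can be sketched as below. How the R code computes the 95% confidence interval is not stated here, so the percentile interval used in this sketch is an assumption, as are the function names and the numbers.

```python
import statistics


def control_interval(control_trends):
    """Mean and 95% percentile interval of the trends from the control sets.

    Assumption: the interval is taken as the 2.5th and 97.5th percentiles.
    """
    cuts = statistics.quantiles(control_trends, n=40, method="inclusive")
    return statistics.mean(control_trends), cuts[0], cuts[-1]


def is_significant(real_trend, ci_min, ci_max):
    """A real trend is flagged when it falls outside the control interval."""
    return real_trend < ci_min or real_trend > ci_max


# Invented control trends for one subset (evenly spread between 0 and 100).
controls = [float(i) for i in range(101)]
mean, ci_min, ci_max = control_interval(controls)
flagged = is_significant(-12.0, ci_min, ci_max)
```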
Code 3 : The file Figures.R produces part of the figures from the manuscript:
"Dataset map": figure 1
"Buffer": figure 2 (then pasted in powerpoint)
"Kernel construction": figure 5 (then pasted in powerpoint)
"Trend distributions": figure 7
"Kernels": part of figures 10 and S2
"Attack shifts": figure 9 and S1
"Significant": figure 8
The English Business Survey (EBS) is commissioned by the Department for Business, Innovation and Skills (BIS) to provide a monthly assessment of business perceptions of current, past and expected economic and business conditions in each English region. A detailed understanding of businesses' perceptions and plans across England will inform the Government's economic growth and rebalancing agenda.
The sample for the EBS is drawn from the Inter-departmental Business Register (IDBR). The primary objective is to achieve a sample that is as close as possible to being proportionate to the employment distribution within England. The EBS is conducted at the level of the workplace (i.e. individual sites within an enterprise, such as a factory, shop or office) rather than at the level of the business or enterprise. The sample is therefore selected at this level as well. The sample of workplaces is selected from across all industry sectors, including public sector and not-for-profit organisations.
Further information on the EBS and monthly statistical releases derived from the survey can be found on the BIS English Business Survey website.
Linking to other business studies
These data contain IDBR reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.
The majority of observations only contain IDBR plant identifiers (LUref), so it may not be possible to ascertain which plants belong to the same enterprise. We therefore recommend that users also apply for access to the Business Structure Database (SN 6697), which will allow users to link observations to their parent enterprises.
In France, farmers commission about 250,000 soil-testing analyses per year to assist them in managing soil fertility. The number and diversity of origin of the samples make these analyses an interesting and original information source regarding cultivated topsoil variability. Moreover, these analyses relate to several parameters strongly influenced by human activity (macronutrient contents, pH...), for which existing cartographic information is not very relevant. Compiling the results of these analyses into a database makes it possible to re-use these data within both a national and temporal framework. A database compiling data collected over the period 1990-2009 has recently been completed. So far, commercial soil-testing laboratories approved by the Ministry of Agriculture have provided analytical results from more than 2,000,000 samples. After the initial quality control stage, analytical results from more than 1,900,000 samples were available in the database. The anonymity of the landholders seeking soil analyses is perfectly preserved, as the only identifying information stored is the location of the nearest administrative city to the sample site. We present in this dataset a set of statistical parameters of the spatial distributions for several agronomic soil properties. These statistical parameters are calculated for 4 different nested spatial entities (administrative areas: e.g. regions, departments, counties and agricultural areas) and for 4 time periods (1990-1994, 1995-1999, 2000-2004, 2005-2009). Two kinds of agronomic soil properties are available: the first corresponds to quantitative variables such as the organic carbon content, and the second to qualitative variables such as the texture class.
For each spatial unit and temporal period, we calculated the following sets of statistics: the first set is calculated for the quantitative variables and comprises the number of samples, the mean, the standard deviation, and the 2-, 4- and 10-quantiles; the second set is calculated for the qualitative variables and comprises the number of samples, the value of the dominant class, the number of samples of the dominant class, the second dominant class, and the number of samples of the second dominant class.
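The two sets of statistics can be sketched for a single spatial unit and period as follows. The function names, the example values, and the quantile interpolation method (the default of Python's `statistics.quantiles`) are assumptions; the dataset documentation does not say which quantile definition was used.

```python
import statistics
from collections import Counter


def quantitative_summary(values):
    """Number of samples, mean, SD, and the 2-, 4- and 10-quantiles."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.quantiles(values, n=2),     # 1 cut point
        "quartiles": statistics.quantiles(values, n=4),  # 3 cut points
        "deciles": statistics.quantiles(values, n=10),   # 9 cut points
    }


def qualitative_summary(classes):
    """Number of samples plus the two most frequent classes and their counts."""
    ranked = Counter(classes).most_common(2)
    ranked += [(None, 0)] * (2 - len(ranked))  # pad when fewer than 2 classes
    (first, n_first), (second, n_second) = ranked
    return {
        "n": len(classes),
        "dominant": first,
        "n_dominant": n_first,
        "second": second,
        "n_second": n_second,
    }


# Invented values for one spatial unit and one period.
carbon = quantitative_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
texture = qualitative_summary(["silt", "clay", "silt", "sand", "silt", "clay"])
```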
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This publication provides the timeliest picture available of people using NHS funded secondary mental health, learning disabilities and autism services in England, excluding those who are solely in contact with Talking Therapies. This information will be of use to people needing access to information quickly for operational decision making and other purposes. More detailed information on the quality and completeness of these statistics is available in the Data Quality section, as well as within the Data Coverage and Data Quality VODIM and Integrity files available under 'Resources'. Some amendments to methodologies have been made in this publication. Previously, in some metrics, data for Kooth Digital Health Limited was handled differently because it provides online anonymous services. This methodology has been extended to a second provider, MeeToo Education. Additionally, there have been some amendments to the referral spells methodology (metrics starting MRS). These include extending the exclusion and inclusion lists of teams to cover deprecated codes, and introducing the use of the referral rejection and closure dates where the service discharge date is not available.
According to a 2023 survey of parents in the United Kingdom, around 30 percent of parents reported restricting access to inappropriate online content using platforms' safety modes. Approximately the same number of respondents reported using parental control built into the device by the manufacturer.
UK parents vs. apps
As an estimated 96 percent of children's apps available to UK users transmit GPS location or IP address to advertisers and/or data brokers, parents have reason to concern themselves with monitoring their kids' mobile usage. According to research conducted at the beginning of 2023, 70 percent of apps in the Apple App Store and the Google Play Store collected persistent identifiers. Additionally, 15 percent of TikTok users aged between 13 and 17 years had experienced anonymous trolling in the past month, as well as being exposed to sexualized images on the popular short-video platform, according to a survey conducted in the United Kingdom in 2022.
Not just monitoring: apps for parents cover several needs
In the first half of 2022, there were around 900 apps available to UK children, but only approximately 300 apps designed for parents. Approximately 78 of these were apps to help babies sleep, 72 were apps designed to track babies' rhythms, and only 20 apps available to UK parents were designed to keep and edit babies' photos. Despite their limited availability, baby photo apps for parents generated 674 thousand downloads in the first half of 2022, while fertility tracking apps generated five million downloads among UK users in the same period.
The Annual Survey of Hours and Earnings (ASHE) is one of the largest surveys of the earnings of individuals in the UK. Data on the wages, paid hours of work, and pension arrangements of nearly one per cent of the working population are collected. Other variables relating to age, occupation and industrial classification are also available. The ASHE sample is drawn from National Insurance records for working individuals, and the survey forms are sent to their respective employers to complete.
While limited in terms of personal characteristics compared to surveys such as the Labour Force Survey, the ASHE is useful not only because of its larger sample size, but also because the responses regarding wages and hours are considered to be more accurate, since they are provided by employers rather than by the employees themselves. A further advantage of the ASHE is that data for the same individuals are collected year after year. It is therefore possible to construct a panel dataset of responses for each individual running back as far as 1997, and to track how occupations, earnings and working hours change for individuals over time. Furthermore, using the unique business identifiers, it is possible to combine ASHE data with data from other business surveys, such as the Annual Business Survey (UK Data Archive SN 7451).
The ASHE replaced the New Earnings Survey (NES, SN 6704) in 2004. NES was developed in the 1970s in response to the policy needs of the time. The survey had changed very little in its thirty-year history. ASHE datasets for the years 1997-2003 were derived using ASHE methodologies applied to NES data.
The ASHE improves on the NES in the following ways:
For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.
Latest Edition Information
For the twenty-sixth edition (February 2025), the data file 'ashegb_2023r_2024p_pc' has been added, along with the accompanying data dictionary.
According to a study conducted in New Zealand in December 2020, the United States had the highest number of 4chan posts per 100 thousand internet users. Overall, users in the United States made 2,810 posts per 100 thousand internet users on the anonymous English-language website, and Canadians made 2,728 per 100 thousand users. Additionally, New Zealand and the United Kingdom each saw just over 1,500 4chan posts per 100 thousand internet users.
The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate national accounts (NA), such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).
The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.
The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.
Outputs will be produced in a number of formats, including a printed report containing descriptive information on the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in ".pdf" format, and the tables will be available on the SBS website as Excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be given to providing confidentialised unit record files (CURFs) to selected analytical users.
A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey would be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.
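The initially planned design (full enumeration of large establishments plus a sample of the remainder) can be sketched as below. This is an illustrative sketch only: the field names `employees` and `industry`, the size threshold, the sampling fraction and the use of industry as the stratification variable are all assumptions, since the source does not specify them.

```python
import random
from collections import defaultdict


def select_sample(register, large_threshold, fraction, seed=0):
    """Take every large unit, plus a simple random sample within each stratum.

    register        -- list of dicts with hypothetical keys
                       'id', 'industry' and 'employees'
    large_threshold -- units at or above this size are fully enumerated
    fraction        -- sampling fraction applied to the remaining units
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Full enumeration of large establishments.
    selected = [u for u in register if u["employees"] >= large_threshold]

    # Group the remaining units into strata (here: by industry).
    strata = defaultdict(list)
    for unit in register:
        if unit["employees"] < large_threshold:
            strata[unit["industry"]].append(unit)

    # Draw a simple random sample without replacement in each stratum,
    # taking at least one unit so that no stratum is left unrepresented.
    for industry in sorted(strata):
        units = strata[industry]
        k = max(1, round(fraction * len(units)))
        selected.extend(rng.sample(units, k))
    return selected


if __name__ == "__main__":
    register = (
        [{"id": f"L{i}", "industry": "retail", "employees": 80} for i in range(2)]
        + [{"id": f"R{i}", "industry": "retail", "employees": 5} for i in range(10)]
        + [{"id": f"H{i}", "industry": "hotels", "employees": 8} for i in range(4)]
    )
    sample = select_sample(register, large_threshold=50, fraction=0.5)
    print(len(sample), "units selected")
```

Sampled units would normally carry a weight equal to the inverse of their selection probability, while fully enumerated units carry a weight of one.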
National Coverage
The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in a way that records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.
SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.
It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.
The BASs covered all employing units, and excluded small non-employing units such as market sellers. The surveys also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). They cover only businesses that pay VAGST (threshold: SAT$75,000 and upwards).
Sample survey data [ssd]
- Total sample size was 1,240.
- Of the 1,240, 902 successfully completed the questionnaire.
- The remaining 338 either never responded or were omitted (some businesses were omitted from the sample because they did not meet the requirements to be surveyed).
- Selection covered all employing units paying VAGST (threshold: SAT$75,000 and upwards).
Mail Questionnaire [mail]
Supplementary Pages
Additional pages have been prepared to collect data for a limited range of industries.
1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.
2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism would provide valuable indicators of the size of the direct impact of tourism.
Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires had been followed. Imputation applied ratios from responding units in the imputation cell to the partial data that were supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. For example, if SBS staff wrote on a form, this was to be done only in red pen, to distinguish the alterations from the original information.
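The ratio imputation described above can be illustrated with a short sketch. It assumes one common form of the technique, in which the ratio is the cell total of the missing item divided by the cell total of an item the partial respondent did supply; the source does not state which ratio SBS used, and the function name, the field names `turnover` and `purchases`, and the figures are hypothetical.

```python
def impute_by_ratio(partial, responders, known_item, missing_item):
    """Fill one missing item for a partial respondent.

    The ratio is taken over fully responding units in the same
    imputation cell and applied to the value the partial respondent
    supplied for known_item.
    """
    total_known = sum(r[known_item] for r in responders)
    total_missing = sum(r[missing_item] for r in responders)
    if total_known == 0:
        raise ValueError("cannot form a ratio: responders report zero " + known_item)

    ratio = total_missing / total_known

    # Work on a copy so the record as supplied by the respondent is preserved,
    # and flag the imputed item so the change is recorded.
    completed = dict(partial)
    completed[missing_item] = partial[known_item] * ratio
    completed["imputed_items"] = [missing_item]
    return completed


if __name__ == "__main__":
    cell_responders = [
        {"turnover": 100.0, "purchases": 60.0},
        {"turnover": 300.0, "purchases": 180.0},
    ]
    partial_return = {"turnover": 200.0}
    print(impute_by_ratio(partial_return, cell_responders, "turnover", "purchases"))
```

Keeping the original record untouched and listing the imputed items mirrors the two editing principles stated above: preserve the questionnaire as supplied, and record every change.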
Additional edit checks were developed, including checks against external data at enterprise/establishment level. The external data used were VAGST records (for turnover and purchases) and SNPF records (for salaries and wages and for employment). Editing and imputation processes were undertaken by FSD using Excel.
The Annual Respondents Database (ARD) is constructed from a compulsory business survey. Until 1997 it was created out of the Annual Censuses of Production and Construction (ACOP and ACOC); these were combined into the Annual Business Inquiry (ABI) in 1998. The ARD is a census of large businesses, and a sample of smaller ones. Smaller firms may receive a "short form". These do not require detailed breakdowns of totals. Hence for certain variables the values may be imputed from third party sources or estimated rather than returned by respondents.
This dataset is created for the Economic Analysis and Satellite Accounts Division for research purposes. To create the ARD, the other surveys are converted into a single consistent format linked by the Inter-Departmental Business Register references over time. Northern Ireland data is held up to 2001. From 2002, the ABI is collected and stored separately in Northern Ireland. Special permission is required to use new NI ABI data.
ABI background
The ABI is the financial information survey conducted by the Office for National Statistics (ONS). This is a statutory survey conducted under the Statistics of Trade Act 1947. Organisations are obliged under this legislation to provide a response. Businesses are sampled from the ONS business register current at the time of drawing the sample: first the CSO Business Register, which ran until 1993; then the Inter-Departmental Business Register, which has run from 1994 onwards. The ONS holds firms' responses to the ABI in the Annual Respondents Database (ARD).
The ABI replaced the following annual survey systems in 1998: