25 datasets found

Data_Sheet_1_Federated statistical analysis: non-parametric testing and...
frontiersin.figshare.com
pdf
Updated Nov 13, 2023
Cite
Ori Becher; Mira Marcus-Kalish; David M. Steinberg (2023). Data_Sheet_1_Federated statistical analysis: non-parametric testing and quantile estimation.pdf [Dataset]. http://doi.org/10.3389/fams.2023.1267034.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fams.2023.1267034.s001
Dataset updated
Nov 13, 2023
Dataset provided by
Frontiers
Authors
Ori Becher; Mira Marcus-Kalish; David M. Steinberg
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The age of big data has fueled expectations for accelerating learning. The availability of large data sets enables researchers to achieve more powerful statistical analyses and enhances the reliability of conclusions, which can be based on a broad collection of subjects. Often such data sets can be assembled only with access to diverse sources; for example, medical research that combines data from multiple centers in a federated analysis. However, these hopes must be balanced against data privacy concerns, which hinder sharing raw data among centers. Consequently, federated analyses typically resort to sharing data summaries from each center. The limitation to summaries carries the risk that it will impair the efficiency of statistical analysis procedures. In this work, we take a close look at the effects of federated analysis on two very basic problems: non-parametric comparison of two groups, and quantile estimation to describe the corresponding distributions. We also propose a specific privacy-preserving data release policy for federated analysis with the K-anonymity criterion, which has been adopted by the Medical Informatics Platform of the European Human Brain Project. Our results show that, for our tasks, there is only a modest loss of statistical efficiency.
Share of people supporting the anonymization of applications in France...
statista.com
Updated Jun 17, 2025
Cite
Statista (2025). Share of people supporting the anonymization of applications in France 2015-2022 [Dataset]. https://www.statista.com/statistics/1233206/share-people-supporting-anonymization-applications/
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 16, 2022 - Mar 17, 2022
Area covered
France
Description
In 2022, amid a health crisis accompanied by a major economic one, equal access to work remained a major objective in France. Because several criteria could jeopardize it, one possible measure would be to anonymize the applications examined by employers, so that selection for job interviews is based solely on qualifications and experience. This measure has never been as popular with the French as it was in 2021, when ** percent of them were in favor of it, five points more than in 2015. In March 2022, ** percent of them were in favor of this measure.
Data from: Replication package for the paper: "A Study on the Pythonic...
zenodo.org
zip
Updated Nov 10, 2023
Cite
Anonymous; Anonymous (2023). Replication package for the paper: "A Study on the Pythonic Functional Constructs' Understandability" [Dataset]. http://doi.org/10.5281/zenodo.10101383
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10101383
Dataset updated
Nov 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anonymous; Anonymous
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package for A Study on the Pythonic Functional Constructs' Understandability
This package contains several folders and files with code and data used in the study.

examples/
Contains the code snippets used as objects of the study, named as reported in Table 1, summarizing the experiment design.
RQ1-RQ2-files-for-statistical-analysis/
Contains three .csv files used as input for conducting the statistical analysis and drawing the graphs for addressing the first two research questions of the study. Specifically:
- ConstructUsage.csv contains the declared usage frequency of the three functional constructs that are the object of the study. This file is used to draw Figure 4.
- RQ1.csv contains the collected data used for the mixed-effect logistic regression relating the use of functional constructs with the correctness of the change task, and the logistic regression relating the use of map/reduce/filter functions with the correctness of the change task.
- RQ1Paired-RQ2.csv contains the collected data used for the ordinal logistic regression of the relationship between the perceived ease of understanding of the functional constructs and (i) participants' usage frequency, and (ii) constructs' complexity (except for map/reduce/filter).
inter-rater-RQ3-files/
Contains four .csv files used as input for computing the inter-rater agreement for the manual labeling used for addressing RQ3. Specifically, you will find one file for each functional construct, i.e., comprehension.csv, lambda.csv, and mrf.csv, and a different file used for highlighting the reasons why participants prefer to use the procedural paradigm, i.e., procedural.csv.
Questionnaire-Example.pdf
This file contains the questionnaire submitted to one of the ten experimental groups within our controlled experiment. Other questionnaires are similar, except for the code snippets used for the first section, i.e., change tasks, and the second section, i.e., comparison tasks.
RQ2ManualValidation.csv
This file contains the results of the manual validation performed to sanitize the answers provided by our participants and used for addressing RQ2. Specifically, we coded the behavior description using four different levels: (i) correct, (ii) somewhat correct, (iii) wrong, and (iv) automatically generated.
RQ3ManualValidation.xlsx
This file contains the results of the open coding applied to address our third research question. Specifically, you will find four sheets, one for each functional construct and one for the procedural paradigm. For each sheet, you will find the provided answers together with the categories assigned to them.
Appendix.pdf
This file contains the results of the logistic regression relating the use of map, filter, and reduce functions with the correctness of the change task, not shown in the paper.
FuncConstructs-Statistics.r
This file contains an R script that you can reuse to re-run all the analyses conducted and discussed in the paper.
FuncConstructs-Statistics.ipynb
This file contains the code to re-execute all the analyses conducted in the paper as a notebook.
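The inter-rater agreement computed from the files in inter-rater-RQ3-files/ can be illustrated with Cohen's kappa for two raters. This is only a minimal sketch under assumptions: the package description does not say which agreement statistic the authors used, and the function name and the labels below are hypothetical, not taken from the .csv files.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Assumption: each list holds one categorical label per answer,
    in the same order for both raters.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four answers, for illustration only.
labels_a = ["readability", "readability", "performance", "habit"]
labels_b = ["readability", "performance", "performance", "habit"]
kappa = cohen_kappa(labels_a, labels_b)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.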
NGL anonymous Q&A app global downloads 2022
statista.com
Updated Mar 4, 2024
Cite
Statista (2024). NGL anonymous Q&A app global downloads 2022 [Dataset]. https://www.statista.com/statistics/1323275/ngl-anonymous-social-app-downloads/
Dataset updated
Mar 4, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
NGL, which stands for 'Not Gonna Lie', is an anonymous Q&A social media app launched at the end of 2021. During the first half of 2022, only about six months after launch, the app recorded over 12.5 million downloads from users worldwide. NGL allows users to create questions to post on their mainstream social media content for their friends to answer anonymously.
Statistics of Access to Materials Posted in the Digital Repository of the...
figshare.com
png
Updated Jun 5, 2023
Cite
Alexander Bogomolov (2023). Statistics of Access to Materials Posted in the Digital Repository of the Southern Federal University (https://hub.sfedu.ru/repository/) [Dataset]. http://doi.org/10.6084/m9.figshare.12129066.v3
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.12129066.v3
Dataset updated
Jun 5, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Alexander Bogomolov
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The file consists of 786 014 material access records. Each record consists of the following items: material id (depersonalized), user id (depersonalized), material type name, short material type name, access type, access datetime.
Material id and user id are 36-digit alphanumerical identifiers. These identifiers cannot be used to track original data and were created specifically for the purpose of publishing the statistics. When anonymous access is recorded, user id is blank.
Material type name (short material type name) can have one of the following values:
- "Учебно-методическое пособие" (teaching_aid),
- "Учебное пособие" (tutorial),
- "Учебник" (textbook),
- "Выпускная квалификационная работа" (degree_work),
- "Монография" (monograph),
- "Студенческая работа" (student_paper),
- "Дополнительный материал" (auxiliary_material),
- "Диссертация" (dissertation),
- "Препринт" (preprint),
- "Автореферат" (diss_abstract),
- "Патент" (patent),
- "Научная статья" (scientific_article),
- "Презентация" (presentation).
Access type can have one of the following values:
- "read" if the user accessed the material via the built-in reader,
- "download" if the user downloaded the material,
- "get_description" if the user accessed the material description page.
Access datetime has the following format: YYYY-MM-DD HH:MM:SS.ZZZZZZ
Records are delimited with newline characters, record fields with semicolons (;).
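The record layout described above can be read with a short parser. The following is a minimal sketch, assuming the six fields appear in the listed order, that there is no header row, and that ZZZZZZ denotes microseconds. The field names in FIELDS and the sample identifiers are hypothetical and do not come from the file itself.

```python
import csv
import io
from datetime import datetime

# Hypothetical field names, in the order the description lists the items.
FIELDS = [
    "material_id",
    "user_id",
    "material_type",
    "material_type_short",
    "access_type",
    "accessed_at",
]


def parse_records(text):
    """Parse semicolon-delimited, newline-separated access records."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
        record = dict(zip(FIELDS, row))
        # A blank user id marks an anonymous access.
        record["user_id"] = record["user_id"] or None
        # Assumption: ZZZZZZ is a six-digit microsecond component.
        record["accessed_at"] = datetime.strptime(
            record["accessed_at"], "%Y-%m-%d %H:%M:%S.%f"
        )
        records.append(record)
    return records


# Two made-up records for illustration; the ids are placeholders.
SAMPLE = (
    "00000000-0000-0000-0000-000000000001;00000000-0000-0000-0000-00000000000a;"
    "Учебник;textbook;read;2020-01-15 10:30:00.123456\n"
    "00000000-0000-0000-0000-000000000002;;"
    "Монография;monograph;download;2020-01-16 08:00:00.000000\n"
)
records = parse_records(SAMPLE)
```

For the full file, passing an open file object to csv.reader directly would avoid holding all 786 014 records' raw text in memory.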
Rmd code logistic federated.
plos.figshare.com
txt
Updated Nov 14, 2024
Cite
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau (2024). Rmd code logistic federated. [Dataset]. http://doi.org/10.1371/journal.pone.0312697.s010
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0312697.s010
Dataset updated
Nov 14, 2024
Dataset provided by
PLOS ONE
Authors
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Methods
The objective of this project was to determine whether a federated analysis approach using DataSHIELD can maintain the level of results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly split into three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data were transferred: statistics were calculated in parallel within each healthcare organization, and only summary statistics (aggregates) were provided back to the federated data analyst.
Descriptive statistics, survival analysis, regression models and correlation were first performed with the centralized approach and then reproduced with the federated approach. The results were then compared between the two approaches.
Results
The cohort was split into three samples (N1 = 157 patients, N2 = 94 and N3 = 64); 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable due to a data disclosure limitation in the federated environment, showing the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, except for some differences in position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.
Conclusion
Our project showed a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfactory, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support.
To find the right balance between privacy and accuracy of the analysis, privacy requirements should be established before the analysis starts, together with a data quality review of the participating healthcare organizations.
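The idea that only aggregates leave each organization can be illustrated by pooling a mean and variance from per-site summaries. This is a minimal sketch in plain Python, not DataSHIELD code and not the authors' Rmd script; the function names and the example values are hypothetical.

```python
import statistics


def summarise(values):
    """Aggregate computed locally at one site: (n, mean, sample variance)."""
    n = len(values)
    variance = statistics.variance(values) if n > 1 else 0.0
    return n, statistics.mean(values), variance


def pool(summaries):
    """Combine per-site (n, mean, variance) into an overall mean and variance."""
    total_n = sum(n for n, _, _ in summaries)
    pooled_mean = sum(n * m for n, m, _ in summaries) / total_n
    # Within-site sum of squares plus between-site sum of squares.
    within = sum((n - 1) * v for n, _, v in summaries)
    between = sum(n * (m - pooled_mean) ** 2 for n, m, _ in summaries)
    pooled_variance = (within + between) / (total_n - 1)
    return total_n, pooled_mean, pooled_variance


# Hypothetical raw values held at three sites; only summaries are shared.
site_values = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
site_summaries = [summarise(v) for v in site_values]
total_n, pooled_mean, pooled_variance = pool(site_summaries)
```

Means and variances pool exactly in this way, whereas medians and other quantiles generally cannot be recovered from such aggregates, which is consistent with the differences reported above for position measures.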
Dark Web Statistics 2025 By Security, Network, Privacy
scoop.market.us
Updated Jan 14, 2025
Cite
Market.us Scoop (2025). Dark Web Statistics 2025 By Security, Network, Privacy [Dataset]. https://scoop.market.us/dark-web-statistics/
Dataset updated
Jan 14, 2025
Dataset authored and provided by
Market.us Scoop
License
https://scoop.market.us/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

Dark Web Statistics: The Dark Web refers to the encrypted portion of the internet that is not indexed by traditional search engines.

It exists as a hidden network that can only be accessed through specific software, configurations, and authorization protocols.

The primary technology used to access the Dark Web is the Tor network, which allows users to maintain anonymity and privacy while accessing websites and services.
Eye-Opening Tor Statistics And Facts (2025)
sci-tech-today.com
Updated Apr 15, 2025
Cite
Sci-Tech Today (2025). Eye-Opening Tor Statistics And Facts (2025) [Dataset]. https://www.sci-tech-today.com/stats/tor-statistics/
Dataset updated
Apr 15, 2025
Dataset authored and provided by
Sci-Tech Today
License
https://www.sci-tech-today.com/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

Tor Statistics: Tor is one of the most popular browsers for anonymous browsing. Data is a priceless resource, and people go to great lengths to get it. The term “onion router” refers to a tool used globally to provide anonymity on the internet through an onion routing protocol. The use of Tor rose substantially in 2024, attracting attention from several sectors, such as businesses, agencies, governments, and ordinary citizens. With internet privacy being a hot topic in the world today, it is no wonder that Tor has emerged as a means for safe browsing and secure communication online.

Tor remains one of the best-known programs for clandestine web exploration. Unlike usual web browsers, this type of software does not allow access to data by any third parties who might want to track someone’s online activity. However, there is more to this program than just its reputation for accessing deep web resources associated with pornography or drugs. A detailed, eye-opening analysis of Tor statistics will be presented here regarding current trends based on user demographics, financial impacts, and future projections.
UK Innovation Survey, 1994-2023: Secure Access
beta.ukdataservice.ac.uk
Updated 2025
Cite
Trade Northern Ireland. Department Of Enterprise (2025). UK Innovation Survey, 1994-2023: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-6699-9
Unique identifier
https://doi.org/10.5255/ukda-sn-6699-9
Dataset updated
2025
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
datacite
Authors
Trade Northern Ireland. Department Of Enterprise
Area covered
United Kingdom
Description
The UK Innovation Survey (UKIS) provides the main source of information on business innovation in the UK. The survey data is a major resource for research into the nature and functioning of the innovation system and for policy formation. It is used widely across government, regions and by the research community. The UKIS also represents the UK's contribution to the Europe-wide Community Innovation Survey (CIS). Like many innovation surveys across Europe, the UKIS follows general guidelines set out in the Organisation for Economic Co-operation and Development (OECD) publication known as the Oslo Manual (OECD 2005). This manual provides guidelines on the conduct of innovation surveys, including statistical procedures and a review of the range of concepts that fall together under the umbrella term "innovation".

Geographical references: postcodes
The postcodes included in the first edition of these data (i.e. data files prior to 2008-2010) are pseudo-anonymised postcodes. The real postcodes were not available due to the potential risk of identification of the observations. However, these replacement postcodes retain the inherent nested characteristics of real postcodes. In the dataset, the variable of the replacement postcode is 'new_PC'.

The first two editions only include the first half of an observation's anonymised (or real) postcode (sometimes referred to as the outward code). Researchers who are interested in analysing data by more disaggregated geographies (e.g. ward, output area) are advised that this is not possible using the first half of the postcode. Full, real postcodes are available from the third edition onwards, with the exception of UKIS 12, for which only the first half of the postcodes (outward codes) is available.

For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.

Linking to other business studies
These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.

Latest edition information
For the ninth edition (September 2024) data and documentation for UKIS 2023 (also known as UKIS 13), covering the period 2020 to 2022, were added to the study.
Data from: Variation in quality of women's health topic information from...
zenodo.org
Updated Jul 8, 2025
Cite
Benjamin Duval; Benjamin Duval (2025). Variation in quality of women's health topic information from systematic internet searches [Dataset]. http://doi.org/10.5281/zenodo.15839790
Unique identifier
https://doi.org/10.5281/zenodo.15839790
Dataset updated
Jul 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benjamin Duval; Benjamin Duval
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
METHODS

Topic determination

The project was developed as a team science exercise during a course on Nutrient Biology (New Mexico Institute of Mining and Technology, New Mexico, USA; BIOL 4089/5089). Students were all women pursuing degrees in Biology and Earth Science, with extensive internet search acumen developed from coursework and personal experience. We (students and professor) devoted ~5 hours to discussing women’s health topics prior to searching, defining search criteria, and developing a scoring system. These discussions led to a list of 12 non-cancer health topics particular to women’s health and associated with human cis-gender female biology. Considerations of transgender health were discussed, with the consensus decision that those issues are scientifically relevant but deserve a separate analysis not included here.

Search protocol

After agreeing on search terms, we experimented with settings in the Advanced Search feature in Google (www.google.com), and collectively agreed to the following settings: Language (English); search terms appearing in the “text” of the page; ANY of the terms “woman”, “women”, “female”; ALL terms when using a single topic from list above with the addition of the word “nutrient”. Figure 1 shows a screenshot of how a search was conducted, using endometriosis as an example. To standardize data collection among investigators, all results from the first 5 pages of results were collected. Search result URLs were followed, and a suite of data was gathered from each page (variables in Table 2) and entered into a shared database (Appendix 1). Definitions for each variable (Table 2) were articulated following a 1-week trial period and further group discussion. Variables were defined to minimize subjectivity across investigators, clarify the reporting of results, and standardize data collection.

Scoring metric

The scoring metric was developed so that the mean and variation (standard deviation, SD; standard error, SE) could be calculated for each topic and compared among topics, and to answer how much variation in quality is likely to be encountered across categories of women’s health issues. We report both variation metrics because SD captures the variation within the data set, while SE scales for sample size variation among categorical variables. When searching topics using the same criteria:

Are some topics more likely to return pages with scientifically verifiable information?

Does the variation of quality vary between topics?

Peer-reviewed journal articles were included in the database if encountered in the searches but were removed before statistical analysis. The justification for removing those sources was that it is possible the Google algorithm included those sources disproportionately for our group of college students and a professor who regularly searches for academic articles. We also assume those sources are consulted less frequently by lay audiences searching for health information.

Scores were based on six binary (presence/absence) attributes of each web page evaluated. These were: Author (name present/absent), author credentials given, reviewer, reviewer credentials, sources listed, peer-reviewed sources listed. A score of 1 was given if the attribute was present, and 0 if absent. The total number of references cited on a webpage, as well as the number of those that were peer-reviewed (Table 2) were recorded, but for scoring purposes, a 1 or 0 was assigned if there were or were not references and peer-reviewed references, respectively. Potential scores thus ranged from 0 to 6.
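The scoring rule and the two variation metrics can be sketched as follows. This is an illustration rather than the authors' workflow (the analyses were run in JASP), and the function names, argument names and example scores are hypothetical.

```python
import math
import statistics


def page_score(author, author_credentials, reviewer, reviewer_credentials,
               n_sources, n_peer_reviewed):
    """Score a web page from 0 to 6 using six presence/absence attributes.

    Reference counts are reduced to 1 (any present) or 0 (none),
    as described for the scoring metric.
    """
    flags = [
        author,
        author_credentials,
        reviewer,
        reviewer_credentials,
        n_sources > 0,
        n_peer_reviewed > 0,
    ]
    return sum(bool(flag) for flag in flags)


def summarise(scores):
    """Mean, standard deviation (SD) and standard error (SE) for one topic."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)        # sample standard deviation
    se = sd / math.sqrt(len(scores))     # SE scales SD by sample size
    return mean, sd, se


# Hypothetical scores for one topic, for illustration only.
topic_scores = [2, 4, 4, 6]
mean, sd, se = summarise(topic_scores)
```

Because a page with 12 sources and a page with one source both contribute a single point for "sources listed", the raw source count stays a separate variable from the score.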

We performed a simple validation experiment via anonymous surveys sent to students at our institution (New Mexico Tech), a predominantly STEM-focused public university. Using the final scores from the search result webpages, a single website from each score was selected at random using the RAND() function in Microsoft Excel to assign a random variable as an identifier to each URL, then sorting by that variable and selecting the first article in a given score category. Webpages with scores of 0 or 6 were excluded from the validation experiment. Following institutional review, a survey was sent to the “all student” email list, and recipients were directed to a web survey that asked participants to give a score of 1-5 to each of the 5 random (but previously scored) web pages, without repeating a score. Participants were given minimal information about the project and had no indication the pages had already been assigned scores. Survey results were collected anonymously by having responses routed to a spreadsheet, and no personally identifiable data were collected from participants.

Statistical analysis

Differences in mean scores within each health topic and the mean number of sources per evaluated webpage were evaluated by calculating Bayes Factors; response variables (mean score, number of sources) for each topic were compared to a null model of no difference across topics (y ~ category + error). Equal prior weight was given to each potential model. Variance inequality was tested via Levene’s test, and normality was assessed using quantile-quantile (Q-Q) plots. Correlation analysis was used to test the strength of the association between individual scores per website and the number of sources cited per website. Because only the presence or absence of sources was considered in the score calculation, the number of sources is independent of the score, which justifies the correlation analysis. Statistical analyses were conducted in the open-source software package JASP version 0.19.2 (JASP, 2024).
Survey tools for research conducted in the Amazon
data.lib.vt.edu
docx
Updated May 18, 2021
Cite
Willandia Chaves; David Wilcove; Aline Tavares; Denis Valle; Thais Morcatty (2021). Survey tools for research conducted in the Amazon [Dataset]. http://doi.org/10.7294/jk6j-2q18
Available download formats: docx
Unique identifier
https://doi.org/10.7294/jk6j-2q18
Dataset updated
May 18, 2021
Dataset provided by
University Libraries, Virginia Tech
Authors
Willandia Chaves; David Wilcove; Aline Tavares; Denis Valle; Thais Morcatty
License
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered
Amazon Rainforest
Description
Three survey tools were used to conduct surveys in the Amazon during 2018:
(1) Household survey with direct questions only. This questionnaire includes questions using direct questioning methods (e.g. did you consume wild meat in your house in 2018?). It contains questions about household food consumption, meat preference, and socioeconomic data. Data collected with this questionnaire were also used as part of a randomized response technique (Unrelated Question Design; Greenberg et al. 1971; Blair et al. 2015). Surveys were conducted with heads of households (men or women).
(2) Household survey with indirect and direct questions. This questionnaire includes questions using both indirect questioning methods (i.e. randomized response technique: Unrelated Question Design) and direct questioning methods. It contains questions about meat consumption, meat preference, and socioeconomic data. Surveys were conducted with heads of households (men or women).
(3) Anonymous survey of school children. This questionnaire includes indirect questioning methods (non-randomized response technique: Triangular Model; Yu et al. 2008). It contains questions about meat consumption, meat preference, and socioeconomic data. Surveys were conducted with children between 12 and 18 years old.
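For readers unfamiliar with the Unrelated Question Design, the standard prevalence estimator for a yes/no sensitive question can be sketched as below. This is a minimal illustration under assumptions: the survey tools do not state the randomization probability or the unrelated-question prevalence used, so all numeric values and the function name are hypothetical.

```python
def uqd_prevalence(yes_fraction, p_sensitive, unrelated_prevalence):
    """Estimate the prevalence of a sensitive trait under an Unrelated Question Design.

    Each respondent answers the sensitive question with probability
    p_sensitive and an unrelated question otherwise, so the observed
    fraction of "yes" answers is
        yes_fraction = p_sensitive * prevalence
                       + (1 - p_sensitive) * unrelated_prevalence.
    Solving for prevalence gives the estimator below.
    """
    if not 0.0 < p_sensitive <= 1.0:
        raise ValueError("p_sensitive must be in (0, 1]")
    estimate = (yes_fraction - (1.0 - p_sensitive) * unrelated_prevalence) / p_sensitive
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs: 41% "yes" answers, sensitive question asked with
# probability 0.7, unrelated question answered "yes" half of the time.
estimate = uqd_prevalence(0.41, 0.7, 0.5)
```

The design trades some statistical precision for privacy, since no individual answer reveals which question was answered.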

References:
Blair G, Imai K, Zhou Y-Y. 2015. Design and Analysis of the Randomized Response Technique. Journal of the American Statistical Association 110:1304-1319.
Greenberg BG, Kuebler RR, Abernathy JR, Horvitz DG. 1971. Application of the Randomized Response Technique in Obtaining Quantitative Data. Journal of the American Statistical Association 66:243-250.
Yu J-W, Tian G-L, Tang M-L. 2008. Two new models for survey sampling with sensitive characteristic: design and analysis. Metrika 67:251-263.

Any questions, please contact Willandia Chaves at wchaves@vt.edu.
Data from: Spatio-temporal dynamics of attacks around deaths of wolves: A...
data.niaid.nih.gov
zenodo.org
Updated Feb 19, 2025
Cite
Chamaillé-Jammes, Simon (2025). Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12772867
Dataset updated
Feb 19, 2025
Dataset provided by
Gimenez, Olivier
Grente, Oksana
Chamaillé-Jammes, Simon
Duchamp, Christophe
Opitz, Thomas
Drouet-Hoguet, Nolwenn
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
France
Description
This repository contains the supplementary materials (Supplementary_figures.docx, Supplementary_tables.docx) of the manuscript: "Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France". This repository also provides the R code and datasets necessary to run the analyses described in the manuscript.

The R datasets with suffix "_a" have anonymized spatial coordinates to respect confidentiality. Therefore, the preliminary preparation of the data is not provided in the public code. These datasets, all geolocated and necessary for the analyses, are:

Attack_sf_a.RData: 19,302 analyzed wolf attacks on sheep

ID: unique ID of the attack

DATE: date of the attack

PASTURE: the related pasture ID from "Pasture_sf_a" where the attack is located

STATUS: column resulting from the preparation and the attribution of attacks to pastures (part 2.2.4 of the manuscript); not shown here to respect confidentiality

Pasture_sf_a.RData: 4987 analyzed pastures grazed by sheep

ID: unique ID of the pasture

CODE: Official code in the pastoral census

FLOCK_SIZE: maximum annual number of sheep grazing in the pasture

USED_MONTHS: months for which the pasture is grazed by sheep

Removal_sf_a.RData: 232 analyzed single wolf removal or groups of wolf removals

ID: unique ID of the removal

OVERLAP: whether the removals are single ("non-interacting" in the manuscript, "NO" here) or not ("interacting" in the manuscript; here "SIMULTANEOUS" for removals occurring during the same operation, or "NON-SIMULTANEOUS" if not).

DATE_MIN: date of the single removal or date of the first removal of a group

DATE_MAX: date of the single removal or date of the last removal of a group

CLASS: administrative type of the removal according to definitions from 2.1 part of the manuscript

SEX: sex or sexes of the removed wolves if known

AGE: age class of the removed wolves if known

BREEDER: breeding status of the removed female wolves, "Yes" for female breeder, "No" for female non-breeder. Males are "No" by default, when necropsied; dead individuals with NA were not found.

SEASON: season of the removal, as defined in part 2.3.4 of the manuscript

MASSIF: mountain range attributed to the removal, as defined in part 2.3.4 of the manuscript

Area_to_exclude_sf_a.RData: one row for each mountain range, corresponding to the area where removal controls of the mountain range could not be sampled, as defined in part 2.3.6 of the manuscript

These datasets were used to run the following analysis code:

Code 1: The file Kernel_wolf_culling_attacks_p.R contains the before-after analyses.

We start by delimiting the spatio-temporal buffer for each row of the "Removal_sf_a.RData" dataset.

We identify the attacks from "Attack_sf_a.RData" within each buffer, giving the data frame "Buffer_df" (one row per attack)

We select the pastures from "Pasture_sf_a.RData" within each buffer, giving the data frame "Buffer_sf" (one row per removal)

We calculate the spatial correction

We spatially slice each buffer into 200 rings, giving the data frame "Ring_sf" (one row per ring)

We add the total pastoral area of the ring of the attack ("SPATIAL_WEIGHT"), for each attack of each buffer, within Buffer_df ("Buffer_df.RData")

We calculate the pastoral correction

We create the pastoral matrix for each removal, giving a matrix of 200 rows (one for each ring) and 180 columns (one for each day, 90 days before the removal date and 90 days after the removal date), with the total pastoral area in use by sheep for each corresponding cell of the matrix (one element per removal, "Pastoral_matrix_lt.RData")

We simulate, for each removal, the random distribution of the attacks from "Buffer_df.RData" according to "Pastoral_matrix_lt.RData". The process is done 100 times (one element per simulation, "Buffer_simulation_lt.RData").

We estimate the attack intensities

We classify the removals into 20 subsets, according to part 2.3.4 of the manuscript ("Variables_lt.RData") (one element per subset)

We perform, for each subset, the kernel estimations with the observed attacks ("Kernel_lt.RData"), with the simulated attacks ("Kernel_simulation_lt.RData") and we correct the first kernel computations with the second ("Kernel_controlled_lt.RData") (one element per subset).

We calculate the trend of attack intensities, for each subset, that compares the total attack intensity before and after the removals (part 2.3.5 of the manuscript), giving "Trends_intensities_df.RData". (one row per subset)

We calculate the trend of attack intensities, for each subset, along the spatial axis, three times, one for each time analysis scale. This gives "Shift_df" (one row per ring and per time analysis scale).
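The pastoral-correction simulation described in the steps above (redistributing the attacks of a buffer at random according to the 200 x 180 pastoral matrix, 100 times) can be illustrated with a minimal Python sketch. This is not the authors' R code: it assumes that "according to the pastoral matrix" means sampling (ring, day) cells with probability proportional to the pastoral area in use, and the function name, matrix values and attack count are all hypothetical.

```python
import random

# Dimensions quoted in the description: 200 rings, 180 days, 100 simulations.
N_RINGS, N_DAYS, N_SIMULATIONS = 200, 180, 100


def simulate_attacks(pastoral_matrix, n_attacks, rng):
    """Draw (ring, day) cells with probability proportional to pastoral area.

    Assumption: proportional sampling; the original script may differ.
    """
    cells = [(ring, day)
             for ring in range(len(pastoral_matrix))
             for day in range(len(pastoral_matrix[0]))]
    weights = [pastoral_matrix[ring][day] for ring, day in cells]
    return rng.choices(cells, weights=weights, k=n_attacks)


rng = random.Random(42)

# Hypothetical pastoral matrix with made-up areas (not real data).
pastoral_matrix = [[rng.uniform(0.0, 10.0) for _ in range(N_DAYS)]
                   for _ in range(N_RINGS)]
# A ring with no pasture in use should never receive a simulated attack.
pastoral_matrix[0] = [0.0] * N_DAYS

# One list of simulated attack positions per simulation.
simulations = [simulate_attacks(pastoral_matrix, n_attacks=25, rng=rng)
               for _ in range(N_SIMULATIONS)]
```

Each element of `simulations` plays the role of one simulated attack set, comparable in spirit to one element of "Buffer_simulation_lt.RData".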

Code 2 : The file Control_removals_p.R contains the control-impact analyses.

It starts with the simulation of 100 removal control sets ("Control_sf_lt_a.RData") from the real set of removals ("Removal_sf_a.RData"), that is done with the function "Control_fn" (l. 92).

The rest of the analyses follows the same process as in the first code "Kernel_wolf_culling_attacks_p.R", in order to apply the before-after analyses to each control set. All objects have the same structure as before, except that they are now a list, with one resulting element per control set. These objects have "control" in their names (not to be confused with "controlled" which refers to the pastoral correction already applied in the first code).

The code is also applied again, from l. 92 to l. 433, this time for the real set of removals (l. 121) - with "Simulated = FALSE" (l. 119). We could not simply use the results from the first code because the set of removals is restricted to removals attributed to mountain ranges only. There are 2 resulting objects: "Kernel_real_lt.RData" (observed real trends) and "Kernel_controlled_real_lt.RData" (real trends corrected for pastoral use).

The part of the code from line 439 to 524 relates to the calculations of the trends (for the real set and the control sets), as in the first code, giving "Trends_intensities_real_df.RData" and "Trends_intensities_control_lt.RData".

The part of the code from line 530 to 588 relates to the calculation of the 95% confidence intervals and the means of the intensity trends for each subset based on the results of the 100 control sets (Trends_intensities_mean_control_df.RData, Trends_intensities_CImin_control_df.RData and Trends_intensities_CImax_control_df.RData). This will be used to test the significance of the real trends. This comparison is done right after, l. 595-627, and gives the data frame "Trends_comparison_df.RData".
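As a rough illustration of the comparison described above, the Python sketch below derives a mean and a 95% interval from 100 control trends and checks whether a real trend falls outside that interval. It is an assumption that the interval is an empirical percentile interval (2.5th to 97.5th percentile); the description does not say how the original R script computes it, and the function name and all values are hypothetical.

```python
import random
import statistics


def compare_to_controls(real_trend, control_trends):
    """Summarise control trends and flag a real trend outside their 95% interval.

    Assumption: empirical percentile interval over the control sets.
    """
    # 39 cut points at 2.5% steps; the first and last bound the central 95%.
    cuts = statistics.quantiles(control_trends, n=40, method="inclusive")
    ci_min, ci_max = cuts[0], cuts[-1]
    return {
        "mean_control": statistics.fmean(control_trends),
        "ci_min": ci_min,
        "ci_max": ci_max,
        "outside_interval": real_trend < ci_min or real_trend > ci_max,
    }


rng = random.Random(1)
# Made-up control trends for one subset (one value per control set).
control_trends = [rng.gauss(0.0, 5.0) for _ in range(100)]

result = compare_to_controls(real_trend=-30.0, control_trends=control_trends)
```

A real trend flagged as lying outside the control interval would be the kind of result reported as significant in a table like "Trends_comparison_df.RData".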

Code 3 : The file Figures.R produces part of the figures from the manuscript:

"Dataset map": figure 1

"Buffer": figure 2 (then pasted in powerpoint)

"Kernel construction": figure 5 (then pasted in powerpoint)

"Trend distributions": figure 7

"Kernels": part of figures 10 and S2

"Attack shifts": figure 9 and S1

"Significant": figure 8
English Business Survey, 2011-2012: Secure Access
beta.ukdataservice.ac.uk
Updated 2012
Cite
Innovation Department For Business (2012). English Business Survey, 2011-2012: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-7113-1
Unique identifier
https://doi.org/10.5255/ukda-sn-7113-1
Dataset updated
2012
Dataset provided by
DataCite (https://www.datacite.org/)
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Innovation Department For Business
Description
The English Business Survey (EBS) is commissioned by the Department for Business, Innovation and Skills (BIS) to provide a monthly assessment of business perceptions of current, past and expected economic and business conditions in each English region. A detailed understanding of businesses' perceptions and plans across England will inform the Government's economic growth and rebalancing agenda.

The sample for the EBS is drawn from the Inter-departmental Business Register (IDBR). The primary objective is to achieve a sample that is as close as possible to being proportionate to the employment distribution within England. The EBS is conducted at the level of the workplace (i.e. individual sites within an enterprise, such as a factory, shop or office) rather than at the level of the business or enterprise. The sample is therefore selected at this level as well. The sample of workplaces is selected from across all industry sectors, including public sector and not-for-profit organisations.

Further information on the EBS and monthly statistical releases derived from the survey can be found on the BIS English Business Survey website.

Linking to other business studies
These data contain IDBR reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.

The majority of observations contain only IDBR plant identifiers (LUref), so it may not be possible to ascertain which plants belong to the same enterprise. We therefore recommend that users also apply for access to the Business Structure Database (SN 6697), which allows observations to be linked to their parent enterprises.
Statistics of the National soil test database of France
search.dataone.org
doi.pangaea.de
Updated Jan 8, 2018
Cite
Saby, Nicolas P A; Lemercier, Blandine; Arrouays, Dominique; Leménager, S; Louis, Benjamin P; Millet, Florent; Schellenberger, E; Squividant, H; Swiderski, Chloé; Toutain, Benoît F P; Walter, Christian; Bardy, Marion (2018). Statistics of the National soil test database of France [Dataset]. http://doi.org/10.1594/PANGAEA.831688
Unique identifier
https://doi.org/10.1594/PANGAEA.831688
Dataset updated
Jan 8, 2018
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Saby, Nicolas P A; Lemercier, Blandine; Arrouays, Dominique; Leménager, S; Louis, Benjamin P; Millet, Florent; Schellenberger, E; Squividant, H; Swiderski, Chloé; Toutain, Benoît F P; Walter, Christian; Bardy, Marion
Description
In France, farmers commission about 250,000 soil-testing analyses per year to assist them in managing soil fertility. The number and diversity of origin of the samples make these analyses an interesting and original information source regarding cultivated topsoil variability. Moreover, these analyses relate to several parameters strongly influenced by human activity (macronutrient contents, pH...), for which existing cartographic information is not very relevant. Compiling the results of these analyses into a database makes it possible to re-use these data within both a national and temporal framework. A database compilation relating to data collected over the period 1990-2009 has been recently achieved. So far, commercial soil-testing laboratories approved by the Ministry of Agriculture have provided analytical results from more than 2,000,000 samples. After the initial quality control stage, analytical results from more than 1,900,000 samples were available in the database. The anonymity of the landholders seeking soil analyses is perfectly preserved, as the only identifying information stored is the location of the nearest administrative city to the sample site. We present in this dataset a set of statistical parameters of the spatial distributions for several agronomic soil properties. These statistical parameters are calculated for 4 different nested spatial entities (administrative areas: e.g. regions, departments, counties and agricultural areas) and for 4 time periods (1990-1994, 1995-1999, 2000-2004, 2005-2009). Two kinds of agronomic soil properties are available: the first corresponds to quantitative variables such as the organic carbon content, and the second corresponds to qualitative variables such as the texture class.
For each spatial unit and time period, we calculated the following sets of statistics: the first set is calculated for the quantitative variables and comprises the number of samples, the mean, the standard deviation, and the 2-, 4- and 10-quantiles; the second set is calculated for the qualitative variables and comprises the number of samples, the dominant class, the number of samples in the dominant class, the second dominant class, and the number of samples in the second dominant class.
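The two sets of statistics listed above can be sketched in a few lines of Python for a single spatial unit and time period. This is only an illustration under stated assumptions: the function names and sample values are hypothetical, and the quantile interpolation rule is Python's default, which may differ from the one used to build the dataset.

```python
import statistics
from collections import Counter


def quantitative_summary(values):
    """Count, mean, standard deviation and 2-, 4-, 10-quantiles."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),              # 2-quantile
        "quartiles": statistics.quantiles(values, n=4),   # 4-quantiles
        "deciles": statistics.quantiles(values, n=10),    # 10-quantiles
    }


def qualitative_summary(labels):
    """Count, dominant class and second dominant class with their sizes."""
    ranked = Counter(labels).most_common(2)
    first, n_first = ranked[0]
    second, n_second = ranked[1] if len(ranked) > 1 else (None, 0)
    return {
        "n": len(labels),
        "dominant": first,
        "n_dominant": n_first,
        "second": second,
        "n_second": n_second,
    }


# Made-up organic carbon contents and texture classes (not real data).
carbon = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 30.0]
texture = ["loam", "loam", "clay", "loam", "sand", "clay"]

carbon_stats = quantitative_summary(carbon)
texture_stats = qualitative_summary(texture)
```

In the dataset itself, one such summary would exist per soil property, spatial entity and time period.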
Mental Health Services Monthly Statistics
digital.nhs.uk
Updated Jan 16, 2025
Cite
(2025). Mental Health Services Monthly Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics
Dataset updated
Jan 16, 2025
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Dec 1, 2023 - Nov 30, 2024
Description
This publication provides the timeliest picture available of people using NHS funded secondary mental health, learning disabilities and autism services in England, excluding those who are solely in contact with Talking Therapies. This information will be of use to people needing access to information quickly for operational decision making and other purposes. More detailed information on the quality and completeness of these statistics is available in the Data Quality section, as well as within the Data Coverage and Data Quality VODIM and Integrity files available under 'Resources'. Some amendments to methodologies have been made in this publication. Previously, in some metrics, data for Kooth Digital Health Limited was handled in a different way due to them providing online anonymous services. This methodology has been extended to a second provider, MeeToo Education. Additionally, there have been some amendments to the referral spells methodology (metrics starting MRS). These include extending the list of teams in the exclusion and inclusion lists to include deprecated codes and introducing the use of the referral rejection and closure dates where the service discharge date is not available.
UK parents who monitor children's online activity with selected methods 2023...
statista.com
Updated Oct 21, 2024
Cite
Statista (2024). UK parents who monitor children's online activity with selected methods 2023 [Dataset]. https://www.statista.com/statistics/1424979/parents-tools-to-monitor-children-internet-usage/
Dataset updated
Oct 21, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 3, 2023 - Nov 30, 2023
Area covered
United Kingdom
Description
According to a 2023 survey of parents in the United Kingdom, around 30 percent of parents reported restricting access to inappropriate online content using platforms' safety modes. Approximately the same number of respondents reported using parental control built into the device by the manufacturer.

UK parents vs. apps

As an estimated 96 percent of children's apps available to UK users transmit GPS location or IP address to advertisers and/or data brokers, parents have reason to concern themselves with monitoring their kids' mobile usage. According to research conducted at the beginning of 2023, 70 percent of apps in the Apple App Store and the Google Play Store collected persistent identifiers. Additionally, 15 percent of TikTok users aged between 13 and 17 years had experienced anonymous trolling in the past month, as well as being exposed to sexualized images on the popular short-video platform, according to a survey conducted in the United Kingdom in 2022.

Not just monitoring: apps for parents cover several needs

In the first half of 2022, there were around 900 apps available to UK children, but only approximately 300 apps designed for parents. Approximately 78 of these were apps to help babies sleep, 72 were apps designed to track babies' rhythms, and only 20 apps available to UK parents were designed to keep and edit babies' photos. Despite their limited availability, baby photo apps for parents generated 674 thousand downloads in the first half of 2022, while fertility tracking apps generated five million downloads among UK users in the same period.
Annual Survey of Hours and Earnings, 1997-2024: Secure Access
beta.ukdataservice.ac.uk
Updated 2025
Cite
Office for National Statistics (2025). Annual Survey of Hours and Earnings, 1997-2024: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-6689-25
Unique identifier
https://doi.org/10.5255/ukda-sn-6689-25
Dataset updated
2025
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
DataCite
Authors
Office for National Statistics
Description
The Annual Survey of Hours and Earnings (ASHE) is one of the largest surveys of the earnings of individuals in the UK. Data on the wages, paid hours of work, and pensions arrangements of nearly one per cent of the working population are collected. Other variables relating to age, occupation and industrial classification are also available. The ASHE sample is drawn from National Insurance records for working individuals, and the survey forms are sent to their respective employers to complete.

While limited in terms of personal characteristics compared to surveys such as the Labour Force Survey, the ASHE is useful not only because of its larger sample size, but also the responses regarding wages and hours are considered to be more accurate, since the responses are provided by employers rather than from employees themselves. A further advantage of the ASHE is that data for the same individuals are collected year after year. It is therefore possible to construct a panel dataset of responses for each individual running back as far as 1997, and to track how occupations, earnings and working hours change for individuals over time. Furthermore, using the unique business identifiers, it is possible to combine ASHE data with data from other business surveys, such as the Annual Business Survey (UK Data Archive SN 7451).

The ASHE replaced the New Earnings Survey (NES, SN 6704) in 2004. NES was developed in the 1970s in response to the policy needs of the time. The survey had changed very little in its thirty-year history. ASHE datasets for the years 1997-2003 were derived using ASHE methodologies applied to NES data.

The ASHE improves on the NES in the following ways:
the NES questionnaire allowed too much variation in employer responses, leading to wide variations in the data
weightings have been introduced to take account of the population size (significant biases were a known problem in NES data)
the significant numbers of employees who change jobs between the sample selection and survey reference dates are retained in the ASHE sample, whereas these were dropped from the NES
Linking to other business studies
These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.

Observations from Northern Ireland
The ASHE data held by the UK Data Archive include very few observations from Northern Ireland. Users requiring access to Northern Ireland data are advised to contact the Northern Ireland Statistics and Research Agency, who administer this aspect of the survey.

Local unit reference variable, luref
The local unit reference variable 'luref', is generated to indicate multiple occurrences of the same local unit for disclosure checking purposes. It is inconsistent across years and is not an IDBR reference number. It should not be used to link ASHE with other business datasets.
For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.

Latest Edition Information
For the twenty-sixth edition (February 2025), the data file 'ashegb_2023r_2024p_pc' has been added, along with the accompanying data dictionary.
4chan posts per 100,000 internet users in selected countries 2020
statista.com
Updated Nov 18, 2022
Cite
Statista (2022). 4chan posts per 100,000 internet users in selected countries 2020 [Dataset]. https://www.statista.com/statistics/1345890/4chan-posts-by-internet-users-in-selected-countries/
Dataset updated
Nov 18, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2020
Area covered
United Kingdom, United States
Description
According to a study conducted in New Zealand in December 2020, the United States had the highest number of 4chan posts per 100 thousand internet users. Overall, North Americans made 2,810 posts per 100 thousand internet users on the anonymous English-language website, and Canadians made 2,728 per 100 thousand users. Additionally, New Zealand and the United Kingdom saw just over 1,500 4chan posts per 100 thousand internet users.
Business Activity Survey 2009 - Samoa
microdata.pacificdata.org
Updated Jul 2, 2019
Cite
Samoa Bureau of Statistics (2019). Business Activity Survey 2009 - Samoa [Dataset]. https://microdata.pacificdata.org/index.php/catalog/253
Dataset updated
Jul 2, 2019
Dataset authored and provided by
Samoa Bureau of Statistics
Time period covered
2009
Area covered
Samoa
Description
Abstract

The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate NAs, such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).

The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.

The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.

Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website in excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be made to provide, for selected analytical users, confidentialised unit record files (CURFs).

A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey will be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.

Geographic coverage

National Coverage

Analysis unit

The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.

SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.

It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.

Universe

The BAS covered all employing units and excluded small non-employing units such as market sellers. The survey also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). It covers only businesses that pay the VAGST (threshold: SAT$75,000 and upwards).

Kind of data

Sample survey data [ssd]

Sampling procedure

- Total sample size was 1240.
- Out of the 1240, 902 successfully completed the questionnaire.
- The remaining 338 either never responded or were omitted (some businesses were omitted from the sample because they did not meet the requirement to be surveyed).
- Selection covered all employing units paying VAGST (threshold SAT$75,000 upwards).

WILL CONFIRM LATER!!

OSO LE MEA E LE FAASA...AEA :-)

Mode of data collection

Mail Questionnaire [mail]

Research instrument

General instructions, authority for the survey, etc;

Business demography information on ownership, contact details, structure, etc.;

Employment;

Income;

Expenses;

Inventories;

Profit or loss and reconciliation to business accounts' profit and loss;

Fixed assets - purchases, disposals, book values

Thank you and signature of respondent.

Supplementary Pages

Additional pages have been prepared to collect data for a limited range of industries.

1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.

2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism would provide valuable indicators of the size of the direct impact of tourism.

Cleaning operations

Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires have been followed. Imputation followed a process, i.e., apply ratios from responding units in the imputation cell to the partial data that was supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff writes on the form, for example, this should only be done in red pen, to distinguish the alterations from the original information.

Additional edit checks were developed, including checking against external data at enterprise/establishment level. External data to be checked against include VAGST and SNPF for turnover and purchases, and salaries and wages and employment data respectively. Editing and imputation processes were undertaken by FSD using Excel.
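The ratio-based partial imputation described above can be illustrated with a short Python sketch. It is not the procedure actually run by FSD in Excel: it assumes the ratio is the sum of the missing item over the sum of a known item among fully responding units in the same imputation cell, and the field names ("income", "expenses") and values are hypothetical. Imputed rows are flagged and the input records are left unchanged, in the spirit of recording all changes made during editing.

```python
def impute_by_ratio(units, known="income", missing="expenses"):
    """Fill a missing item using the ratio observed among complete responders.

    Assumption: ratio of sums within one imputation cell; the original
    procedure may define the ratio differently.
    """
    complete = [u for u in units if u[missing] is not None]
    ratio = sum(u[missing] for u in complete) / sum(u[known] for u in complete)

    imputed = []
    for unit in units:
        row = dict(unit)  # copy, so the supplied record is preserved
        row["imputed"] = unit[missing] is None
        if row["imputed"]:
            row[missing] = ratio * unit[known]
        imputed.append(row)
    return imputed


# One hypothetical imputation cell: unit C supplied income but not expenses.
cell = [
    {"id": "A", "income": 100.0, "expenses": 60.0},
    {"id": "B", "income": 300.0, "expenses": 180.0},
    {"id": "C", "income": 200.0, "expenses": None},
]

result = impute_by_ratio(cell)
```

Keeping the `imputed` flag alongside each record makes it possible to separate reported from imputed values in later tabulations.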

Sampling error estimates

NOT APPLICABLE!!
Annual Respondents Database, 1973-2008: Secure Access
beta.ukdataservice.ac.uk
Updated 2022
Cite
Office For National Statistics (2022). Annual Respondents Database, 1973-2008: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-6644-5
Unique identifier
https://doi.org/10.5255/ukda-sn-6644-5
Dataset updated
2022
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
DataCite
Authors
Office For National Statistics
Description
The Annual Respondents Database (ARD) is constructed from a compulsory business survey. Until 1997 it was created out of the Annual Censuses of Production and Construction (ACOP and ACOC); these were combined into the Annual Business Inquiry (ABI) in 1998. The ARD is a census of large businesses, and a sample of smaller ones. Smaller firms may receive a "short form". These do not require detailed breakdowns of totals. Hence for certain variables the values may be imputed from third party sources or estimated rather than returned by respondents.

This dataset is created for the Economic Analysis and Satellite Accounts Division for research purposes. To create the ARD, the other surveys are converted into a single consistent format linked by the Inter-Departmental Business Register references over time. Northern Ireland data is held up to 2001. From 2002, the ABI is collected and stored separately in Northern Ireland. Special permission is required to use new NI ABI data.

ABI background
The ABI is the financial information survey conducted by the Office for National Statistics (ONS). This is a statutory survey conducted under the Statistics of Trade Act 1947. Organisations are obliged under this legislation to provide a response. Businesses are sampled from the ONS business register current at the time of drawing the sample: first the CSO Business Register, which ran until 1993; then the Inter-Departmental Business Register, which has run from 1994 onwards. The ONS holds firms' responses to the ABI in the Annual Respondents Database (ARD).

The ABI replaced the following annual survey systems in 1998:
Annual Employment Survey (AES)
Annual Censuses of Production and Construction (ACOP/ACOC), which include the Purchases Inquiry (PI)
The six annual Distribution and Services (DSI) inquiries (Annual Wholesale Inquiry; Annual Retail Inquiry; Annual Motor Trades Inquiry; Annual Catering Inquiry; Annual Property Inquiry; and Annual Service Trades Inquiry)
Until 1997 the data were limited to the production and construction industries surveyed by the ACOP and ACOC (construction from 1993 only). The incorporation of the DSI inquiries for six additional sectors is reflected in the number of individual business contributors rising from approximately 15,000 for 1980 to 1996 to approximately 50,000 for 1997/98 and to over 70,000 for 1999.

The ABI is one of the most comprehensive surveys undertaken of business organisations in the UK, covering over 100 key economic variables, and approximately two-thirds of the UK economy. Detailed variables for turnover, employment, costs, capital and the derivation of sales and profits are included. A firm-level measure of Gross Value Added (GVA) is also generated so that the productivity of organisations can be evaluated.

The ABI samples UK businesses and other such establishments according to their employment size and industry sector. It is a census of large businesses, and a stratified sample of small and medium sized enterprises. The stratified sampling framework means that smaller firms move in and out of the survey. The forms are customised for industry sectors and sub-sectors. The statistics produced from the sample data are used primarily to assist in the generation of the National Accounts and the measurement of Gross Domestic Product (GDP).

A number of different form-types are used in the survey. Long form-types are sent to all businesses with an employment of 250 or more and also to a proportion of selected businesses with lower employment. Short form-types are sent to the remaining selected businesses. The forms differ in that long form-types ask for a detailed breakdown of purchases; employment costs; taxes, duties and levies etc, whereas short form-types just ask for the totals of these variables.

The data are collected in two parts: Part 1 is an employment record, collected as soon as possible after 12th December. Part 2 is for financial information, which may be submitted up to twelve months after the financial year end.

Geographical references: postcodes
The postcodes available in these data are pseudo-anonymised postcodes. The real postcodes are not available due to the potential risk of identification of the observations. However, these replacement postcodes retain the inherent nested characteristics of real postcodes, and will allow researchers to aggregate observations to other geographic units, e.g. wards, super output areas, etc. In the dataset, the variable of the replacement postcode is 'new_PC'.

Linking to other business studies
These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to link different business survey sources. Researchers may consider applying for other business data to assist their research.
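A minimal sketch of such a link follows, assuming both extracts carry the reference number in a column called 'idbr_ref'. That column name, the other field names and all values are hypothetical, and the sketch further assumes at most one row per reference number in the second extract.

```python
# Hypothetical extracts from two business surveys sharing a reference number.
abi_rows = [
    {"idbr_ref": 1001, "turnover": 500},
    {"idbr_ref": 1002, "turnover": 120},
]
other_rows = [
    {"idbr_ref": 1001, "rd_spend": 40},
    {"idbr_ref": 1003, "rd_spend": 5},
]


def inner_join(left, right, key):
    """Keep left rows whose key also appears in right, merging their fields."""
    # Assumes each key value occurs at most once in `right`.
    right_index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


linked = inner_join(abi_rows, other_rows, "idbr_ref")
print(linked)
```

Only businesses present in both extracts survive an inner join. Because smaller firms move in and out of the sample, the linked set may be much smaller than either source.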

ARD, the Annual Business Survey (ABS) and the Business Register and Employment Survey (BRES)
The ABI, Part 2 (ABI/2) was replaced by the ABS in 2009. The ABI, Part 1 (ABI/1) was replaced by the BRES in 2009. The BRES data for 2009 onwards are held separately under UK Data Archive SN 7463. ABS data for 2008 onwards are held under UK Data Archive SN 7451. Researchers who are applying for access to the ARD and who require data for 2009 onwards are recommended to also apply for the ABS data under SN 7451.
