100+ datasets found
  1. The global Data Classification market size will be USD 1842.2 million in...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jan 1, 2023
    Cite
    Cognitive Market Research (2023). The global Data Classification market size will be USD 1842.2 million in 2024. [Dataset]. https://www.cognitivemarketresearch.com/data-classification-market-report
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Data Classification market size will be USD 1842.2 million in 2024. It will expand at a compound annual growth rate (CAGR) of 25.20% from 2024 to 2031.

    North America held the largest share, more than 40% of global revenue, with a market size of USD 736.88 million in 2024, and will grow at a compound annual growth rate (CAGR) of 23.4% from 2024 to 2031.
    Europe accounted for over 30% of global revenue, with a market size of USD 552.66 million.
    Asia Pacific held around 23% of global revenue, with a market size of USD 423.71 million in 2024, and will grow at a CAGR of 27.2% from 2024 to 2031.
    Latin America held more than 5% of global revenue, with a market size of USD 92.11 million in 2024, and will grow at a CAGR of 24.6% from 2024 to 2031.
    Middle East and Africa held around 2% of global revenue, with an estimated market size of USD 36.84 million in 2024, and will grow at a CAGR of 24.9% from 2024 to 2031.
    Solutions is the fastest-growing segment of the Data Classification industry.
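The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the quoted CAGR compounds once per year at a constant rate, and the function names `project` and `regional_shares` are hypothetical helpers, not part of the report.

```python
# Figures quoted in the description above (USD million, 2024).
GLOBAL_2024 = 1842.2
GLOBAL_CAGR = 0.2520  # 25.20% per year, 2024 to 2031

REGIONS_2024 = {
    "North America": 736.88,
    "Europe": 552.66,
    "Asia Pacific": 423.71,
    "Latin America": 92.11,
    "Middle East and Africa": 36.84,
}


def project(value, cagr, years):
    """Compound a starting value forward by `years` at a constant annual rate."""
    return value * (1.0 + cagr) ** years


def regional_shares(regions, total):
    """Return each region's fraction of the global total."""
    return {name: size / total for name, size in regions.items()}


if __name__ == "__main__":
    for name, share in regional_shares(REGIONS_2024, GLOBAL_2024).items():
        print(f"{name}: {share:.1%}")
    # Assumption: annual compounding over the 7 years from 2024 to 2031.
    print(f"Implied 2031 size: USD {project(GLOBAL_2024, GLOBAL_CAGR, 7):.1f} million")
```

The regional sizes sum to the global figure, so the computed shares should add up to 100% within rounding.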
    

    Market Dynamics of Data Classification Market

    Key Drivers for Data Classification Market

    Increasing Data Privacy and Security Regulations to Boost Market Growth

    The growing emphasis on data privacy and security regulations is driving growth in the data classification market. As businesses face stricter compliance requirements and heightened scrutiny of their data handling practices, the need for robust data classification solutions is increasing. These solutions help categorize and manage data according to its sensitivity and compliance needs, thereby mitigating the risks of data breaches and non-compliance penalties. Stricter regulations, including GDPR and CCPA, are prompting organizations to invest in advanced data classification technology to protect sensitive data and ensure adherence to legal standards, thus fueling market expansion. For instance, Rotiviti India partnered with Riskconnect to help Indian businesses consolidate all facets of risk under one roof through integrated risk management technology.

    Rise in Data Breaches and Cyberattacks to Drive Market Growth

    The surge in data breaches and cyberattacks is significantly boosting the data classification market. As cyber threats become more sophisticated and common, businesses are increasingly adopting data classification solutions to protect sensitive data. These technologies help identify, categorize, and secure data according to its sensitivity, thereby minimizing the impact of potential breaches. With cyberattacks targeting valuable data and regulatory pressure mounting, organizations are investing in data classification systems to strengthen their security posture and ensure compliance. This growing demand for robust data protection measures is driving the growth of the data classification market.

    Restraint Factor for the Data Classification Market

    Complexity and Cost Will Limit Market Growth

    The complexity and cost of data classification are hindering market growth. Implementing comprehensive data classification solutions often requires substantial investment in advanced technology and skilled personnel. The difficulty of integrating these systems with existing IT infrastructure and of ensuring accurate classification across varied data sources adds to the challenge. In addition, ongoing maintenance and updates to keep pace with evolving threats and regulatory changes contribute to high costs. These factors can be especially burdensome for small and medium-sized organizations, limiting their ability to adopt effective data classification solutions and thereby restraining overall market expansion.

    Impact of Covid-19 on the Data Classification Market

    The COVID-19 pandemic has had a mixed effect on the data classification market. On the one hand, the accelerated shift to remote work and increased reliance on digital platforms heightened the need for robust data classification solutions to secure sensitive data and ensure compliance with data protection regulations. On the other hand, economic uncertainties and budget constraints during t...

  2. Data set and classification method for low quality web traffic...

    • zenodo.org
    bin
    Updated May 25, 2023
    Cite
    Fernando Perales; Marta Lopez (2023). Data set and classification method for low quality web traffic identification in video marketing campaigns [Dataset]. http://doi.org/10.5281/zenodo.7965793
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fernando Perales; Marta Lopez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Final outcomes of the InPreVi (AI4Media) project developed in 2022.

    1. Data set describing the statistics of the video ad marketing campaigns

    2. Script for web traffic classification

  3. Parameter estimates of mixed generalized Gaussian distribution for modelling...

    • research-data.cardiff.ac.uk
    zip
    Updated Sep 18, 2024
    Cite
    Zoe Salinger; Alla Sikorskii; Michael J. Boivin; Nenad Šuvak; Maria Veretennikova; Nikolai N. Leonenko (2024). Parameter estimates of mixed generalized Gaussian distribution for modelling the increments of electroencephalogram data [Dataset]. http://doi.org/10.17035/d.2023.0277307170
    Dataset provided by
    Cardiff University
    Authors
    Zoe Salinger; Alla Sikorskii; Michael J. Boivin; Nenad Šuvak; Maria Veretennikova; Nikolai N. Leonenko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electroencephalogram (EEG) is used to monitor a child's brain during coma by recording data on the electrical neural activity of the brain. Signals are captured by multiple electrodes, called channels, located over the scalp. Statistical analyses of EEG data include classification and prediction using arrays of EEG features, but few models for the underlying stochastic processes have been proposed. For this purpose, a new strictly stationary strong mixing diffusion model with a marginal multimodal (three-peak) distribution (MixGGDiff) and an exponentially decaying autocorrelation function was proposed for modelling the increments of EEG data. The increments were treated as discrete-time observations, and a diffusion process was constructed whose stationary distribution is viewed as a mixture of three non-central generalized Gaussian distributions (MixGGD).

    The probability density function of a mixed generalized Gaussian distribution (MixGGD) consists of three components and is described using a total of 12 parameters: \mu_k, the location parameter of each component; s_k, the shape parameter of each component; \sigma^2_k, a parameter related to the scale of each component; and w_k, the weight of each component, where k = 1, 2, 3 is the index of the component of the MixGGD. The parameters of this distribution were estimated using the expectation-maximization algorithm, where the added shape parameter is estimated using the higher order statistics approach based on an analytical relationship between the shape parameter and kurtosis.

    To illustrate an application of the MixGGDiff to real data, an analysis was performed of EEG data collected in Uganda between 2008 and 2015 from 78 children aged 18 months to 12 years who were in coma due to cerebral malaria. EEG were recorded using the International 10–20 system with a sampling rate of 500 Hz and an average record duration of 30 min. The EEG signal for every child was the result of a recording from 19 channels. MixGGD was fitted to each channel of every child's recording separately, hence for each channel a total of 12 parameter estimates were obtained.

    The data is presented in matrix form (dimension 79*228) in .csv format. It consists of 79 rows: the first row is a header row containing the names of the variables, and each of the subsequent 78 rows holds the parameter estimates for one instance (i.e. one child, without identifiers that could be related back to a specific child). There are a total of 228 columns (19 channels times 12 parameter estimates), where each column represents one parameter estimate of one component of the MixGGD, in the order of the channels. Thus columns 1 to 12 refer to parameter estimates on the first channel, columns 13 to 24 refer to parameter estimates on the second channel, and so on. Each variable name starts with "chi", where "ch" is an abbreviation of "channel" and i refers to the order of the channel in the EEG recording. The rest of the characters in a variable name refer to the parameter estimate names of the components of the MixGGD; for example, "ch3sigmasq1" refers to the parameter estimate of \sigma^2 of the first component of the MixGGD obtained from EEG increments on the third channel. The parameter estimates contained in the .csv file are all real numbers in the range -671.11 to 259326.96.

    Research results based upon these data are published at https://doi.org/10.1007/s00477-023-02524-y
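The column layout described above can be sketched in code. This is a minimal illustration, not the authors' tooling: the mapping from column index to channel follows directly from the description (12 consecutive columns per channel), while the regular expression for variable names is an assumption generalised from the single example "ch3sigmasq1". Parameter names other than "sigmasq" are not given in the description, so the pattern may need adjusting against the real header row.

```python
import re

N_CHANNELS = 19
PARAMS_PER_CHANNEL = 12  # 4 parameters x 3 mixture components


def channel_of_column(col):
    """Return the 1-based channel number for a 1-based column index (1..228)."""
    if not 1 <= col <= N_CHANNELS * PARAMS_PER_CHANNEL:
        raise ValueError(f"column index out of range: {col}")
    return (col - 1) // PARAMS_PER_CHANNEL + 1


# Assumed pattern: "ch" + channel number + parameter name + component digit.
# Only "ch3sigmasq1" is documented; other names are hypothetical.
NAME_RE = re.compile(r"^ch(\d+)([A-Za-z]+)(\d)$")


def parse_variable_name(name):
    """Split a variable name into (channel, parameter, component)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised variable name: {name!r}")
    return int(match.group(1)), match.group(2), int(match.group(3))


if __name__ == "__main__":
    print(channel_of_column(13))               # second channel starts at column 13
    print(parse_variable_name("ch3sigmasq1"))  # channel 3, sigmasq, component 1
```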

  4. Data from: Finding Stats: Terms, Tools and Techniques

    • search.dataone.org
    Updated Dec 28, 2023
    Cite
    Elizabeth Hamilton (2023). Finding Stats: Terms, Tools and Techniques [Dataset]. http://doi.org/10.5683/SP3/VBEQME
    Dataset provided by
    Borealis
    Authors
    Elizabeth Hamilton
    Description

    Last year, there was a request for "Deconstructing Terms" found in Statistics Canada products. What do the myriad of terms mean and how can we help our users interpret classification guides, terminology, and the mysteries of Statistics Canada language?

  5. S1 Raw data -

    • figshare.com
    xlsx
    Updated Nov 14, 2024
    Cite
    Hanguang Yuan; Yaodong Wang; Kairan Yang; Yulu Bin (2024). S1 Raw data - [Dataset]. http://doi.org/10.1371/journal.pone.0312801.s001
    Dataset provided by
    PLOS ONE
    Authors
    Hanguang Yuan; Yaodong Wang; Kairan Yang; Yulu Bin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the Paris Olympic cycle, South Korean women’s athlete An Se-young rose to the top of the 2023 BWF Olympic points with a win rate of 89.5%. With An Se-young as the subject, this paper aims to carry out technical and tactical analysis of women’s badminton singles and formulate a prediction model based on machine learning. Firstly, An’s technical and tactical statistics are analyzed and presented in a proposed "three-stage" data classification method. Secondly, we improve our “three-stage” machine learning dataset using video analysis of 10 matches (21 point games) where An Se-young faced off against four other players ranked in the top five of the World Badminton Federation (BWF) in week 44 of 2023. Finally, we establish a prediction model for the scoring and losing of points in the women’s badminton singles based on the ‘Decision tree’, ‘Random forest’, ‘XGBoost’, ‘Support vector’ and ‘K-proximity’ algorithms, and analyze the effectiveness of this model. The results show that the improved data classification is reasonable and can be used to predict the final score of a match. When the support vector machine uses the RBF function kernel, the accuracy reaches its highest at 87.5%, and the consistency of this prediction model is strong. An’s playstyle is sustained and unified; she does not seek continuous pressure, but rather exploits and maximizes her aggression following any mistake made by her opponents, immediately utilizing assault methods such as kills or dives, often resulting in the conversion of points during the subsequent 2–3 strikes.

  6. Data_Sheet_3_sigFeature: Novel Significant Feature Selection Method for...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Pijush Das; Anirban Roychowdhury; Subhadeep Das; Susanta Roychoudhury; Sucheta Tripathy (2023). Data_Sheet_3_sigFeature: Novel Significant Feature Selection Method for Classification of Gene Expression Data Using Support Vector Machine and t Statistic.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00247.s003
    Dataset provided by
    Frontiers
    Authors
    Pijush Das; Anirban Roychowdhury; Subhadeep Das; Susanta Roychoudhury; Sucheta Tripathy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biological data are accumulating at a faster rate, but interpreting them still remains a problem. Classifying biological data into distinct groups is the first step in understanding them. Data classification in response to a certain treatment is an extremely important aspect for differentially expressed genes in making present/absent calls. Many feature selection algorithms have been developed including the support vector machine recursive feature elimination procedure (SVM-RFE) and its variants. Support vector machine RFEs are greedy methods that attempt to find superlative possible combinations leading to binary classification, which may not be biologically significant. To overcome this limitation of SVM-RFE, we propose a novel feature selection algorithm, termed as “sigFeature” (https://bioconductor.org/packages/sigFeature/), based on SVM and t statistic to discover the differentially significant features along with good performance in classification. The “sigFeature” R package is centered around a function called “sigFeature,” which provides automatic selection of features for the binary classification. Using six publicly available microarray data sets (downloaded from Gene Expression Omnibus) with different biological attributes, we further compared the performance of “sigFeature” to three other feature selection algorithms. A small number of selected features (by “sigFeature”) also show higher classification accuracy. For further downstream evaluation of its biological signature, we conducted gene set enrichment analysis with the selected features (genes) from “sigFeature” and compared it with the outputs of other algorithms. We observed that “sigFeature” is able to predict the signature of four out of six microarray data sets accurately, whereas the other algorithms predict less data set signatures. Thus, “sigFeature” is considerably better than related algorithms in discovering differentially significant features from microarray data sets.

  7. Nonemployer Statistics

    • icpsr.umich.edu
    Updated Jun 26, 2015
    Cite
    United States. Bureau of the Census (2015). Nonemployer Statistics [Dataset]. https://www.icpsr.umich.edu/web/NADAC/studies/36218
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    United States. Bureau of the Census
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/36218/terms

    Area covered
    United States
    Description

    Nonemployer Statistics is an annual series that provides statistics on U.S. businesses that have no paid employees or payroll, are subject to federal income taxes, and have receipts of $1,000 or more ($1 or more for the Construction sector). This program is authorized by the United States Code, Titles 13 and 26. The collection also provides data for approximately 450 North American Industry Classification System (NAICS) industries at the national, state, county, metropolitan statistical area, and combined statistical area geography levels. The majority of NAICS industries are included, with the following exceptions: crop and animal production; investment funds, trusts, and other financial vehicles; management of companies and enterprises; and public administration. Data are also presented by Legal Form of Organization (LFO) (U.S. and state only) as filed with the Internal Revenue Service (IRS). Most nonemployers are self-employed individuals operating unincorporated businesses (known as sole proprietorships), which may or may not be the owner's principal source of income.

    Nonemployer Statistics features nonemployers in several arts-related industries and occupations, including the following:

    • Arts, entertainment, and recreation (NAICS Code 71): performing arts companies; spectator sports; promoters of performing arts, sports, and similar events; independent artists, writers, and performers; museums, historical sites, and similar institutions; amusement parks and arcades;
    • Professional, scientific, and technical services (NAICS Code 54): architectural services; landscape architectural services; photographic services;
    • Retail trade (NAICS Code 44-45): sporting goods, hobby, and musical instrument stores; sewing, needlework, and piece goods stores; book stores; art dealers.

    Nonemployer Statistics data originate from statistical information obtained through business income tax records that the Internal Revenue Service (IRS) provides to the Census Bureau. The data are processed through various automated and analytical reviews to eliminate employers from the tabulation, correct and complete data items, remove anomalies, and validate geography coding and industry classification. Prior to publication, the noise infusion method is applied to protect individual businesses from disclosure. Noise infusion was first applied to Nonemployer Statistics in 2005. Prior to 2005, data were suppressed using the complementary cell suppression method. For more information on the coverage and methods used in Nonemployer Statistics, refer to NES Methodology.

    The majority of all business establishments in the United States are nonemployers, yet these firms average less than 4 percent of all sales and receipts nationally. Due to their small economic impact, these firms are excluded from most other Census Bureau business statistics (the primary exception being the Survey of Business Owners). The Nonemployer Statistics series is the primary resource available to study the scope and activities of nonemployers at a detailed geographic level. For complementary statistics on the firms that do have paid employees, refer to the County Business Patterns. Additional sources of data on small businesses include the Economic Census and the Statistics of U.S. Businesses.

    The annual Nonemployer Statistics data are available approximately 18 months after each reference year. Data for years since 2002 are published in comma-delimited format (csv) for spreadsheet or database use, and in the American FactFinder (AFF). For help accessing the data, please refer to the Data User Guide.

  8. State Budget 2021-22 Consolidated government finance statistics...

    • researchdata.edu.au
    Updated Aug 6, 2021
    Cite
    data.vic.gov.au (2021). State Budget 2021-22 Consolidated government finance statistics classification data [Dataset]. https://researchdata.edu.au/state-budget-2021-statistics-classification/1733574
    Dataset provided by
    data.vic.gov.au
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data contains general government sector operating expenses, sourced from the Australian Bureau of Statistics historical data and the Department of Treasury and Finance, categorised by ‘government purpose classification’ (GPC) and ‘classification of the functions of government’ (COFOG).

    The Australian system of Government Finance Statistics (GFS) was revised by the Australian Bureau of Statistics, with the release of the Australian System of Government Finance Statistics: Concepts, Sources and Methods 2015 Cat. No. 5514.0.

    Implementation of the updated GFS manual has resulted in the COFOG framework replacing the former GPC framework, with effect from the 2018-19 financial year for financial reporting under AASB 1049.

    The underlying data from 1961-62 to 1997-98 represents a conversion from the original cash series to an accruals basis by estimating depreciation and superannuation expenses based on statistical modelling.

    Although the conversion provides a basis for comparison with total expenses in the current series of accrual GFS information from 1998 (in the attached table), the estimated accrued expense items have not been apportioned to individual purpose classifications.

    The absence of these splits between functional classifications in the attached table data therefore represents a break in the series and it is not possible to compare individual purpose categories with those in other tables.

    Similarly, the transition from GPC to COFOG represents an additional break in the series and comparability between the two frameworks will not be possible.

    The key reporting changes from GPC to COFOG are as follows:

    - the number of categories has reduced from 12 under GPC to 10 under COFOG;
    - the fuel and energy, agriculture, forestry, fishing and hunting categories have been abolished and are now part of the new economic affairs category. The majority of the outputs in other economic affairs are also included in this new category;
    - public debt transactions have moved from the other purposes category (i.e. primarily interest expense on borrowings) to the general public services category;
    - a new environmental protection category was created to include functions such as waste management, water waste management, pollution and protection of biodiversity and landscape, which were previously classified under the housing and community amenities category, as well as national and state parks functions from the recreation and culture category; and
    - housing functions such as housing assistance and housing concessions are now part of the social protection category.

  9. Recognising innovative companies by using a diversified stacked...

    • zenodo.org
    • data.niaid.nih.gov
    csv, pdf
    Updated Jul 25, 2024
    Cite
    Marcin Mirończuk; Jarosław Protasiewicz (2024). Recognising innovative companies by using a diversified stacked generalisation method for website classification – the raw results [Dataset]. http://doi.org/10.5281/zenodo.2537998
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marcin Mirończuk; Jarosław Protasiewicz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The classification models were trained using the Classification and Regression Training package (caret) [1]. The models' parameters were fine-tuned by the 10-fold cross-validation procedure [2].
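To make the 10-fold procedure concrete, the sketch below shows how sample indices can be split into folds so that every sample is held out exactly once. It is a plain-Python illustration of the general technique and is not caret's implementation (caret is an R package); the function name `k_fold_indices` and the fixed seed are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, held_out_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


if __name__ == "__main__":
    for fold_no, (train, held_out) in enumerate(k_fold_indices(25, k=10), start=1):
        print(fold_no, len(train), len(held_out))
```

In a tuning loop, each candidate parameter setting would be fitted on `train` and scored on `held_out` for every fold, and the scores averaged.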

    Cluster parameters

    Most computations were carried out on a cluster having the following parameters:

    • GPU: NVIDIA Tesla P100;
    • CPU: 2.0 GHz Intel® Xeon® Platinum 8167M;
    • The number of GPUs: 2;
    • The number of CPU cores: 28;
    • The number of CPU threads: 56;
    • RAM: 192 GB;
    • Storage: 3 TB.

    Only one model (k-nn) was calculated on a cluster having the following parameters:

    • Processor: Intel(R) Core(TM) i7-4770 CPU @ 3.40 GHz;
    • RAM: 16 GB;
    • Windows 64 bit.

    Performance statistics

    All performance statistics are stored in csv files. Each file corresponds to a particular machine learning method; for example, the file "methodName-stat.csv" contains all data for the method "methodName". All files contain the following columns:

    • dataSetName – a name of a data set on which evaluation was carried out; there are three possible values: (i) firstPages refers to the first data set (LD) that contains textual description of a company; (ii) firstPageLabels refers to the second data set (LL) that involves link labels that were extracted from an index page; (iii) aggregateDocument refers to the third data set (LB) that consists of a so-called big document;
    • featureNo - the number of features that were taken into account during evaluation;
    • method - the name of function in the caret package;
    • parameters - the values of parameters received from a tuning phase of a given classification method;
    • precision – the value of method’s precision;
    • recall – the value of method’s recall;
    • fmeasure - the value of method’s F-measure;
    • error - the value of method’s error;
    • acc – the value of method’s accuracy.
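The precision, recall and fmeasure columns are related by a standard formula. The sketch below assumes the stored F-measure is the balanced F1 score (the harmonic mean of precision and recall); the dataset description does not state which variant was used, so the `beta` parameter is included only to make that assumption explicit.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the balanced F1 (harmonic mean)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical values, not taken from the dataset.
    print(f_measure(0.8, 0.6))
```

A check of this kind can be run against each row of a "methodName-stat.csv" file to confirm which F-measure variant the fmeasure column holds.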

    Time processing statistics

    All time processing statistics, like the performance statistics, are stored in csv files. Each file corresponds to a particular machine learning method and is named after it, for example "methodName-time.csv". All files contain the following columns:

    • dataSetName – a name of a data set on which evaluation was carried out; there are three possible values: (i) firstPages refers to the first data set (LD) that contains textual description of a company; (ii) firstPageLabels refers to the second data set (LL) that involves link labels that were extracted from an index page; (iii) aggregateDocument refers to the third data set (LB) that consists of a so-called big document;
    • featureNo - the number of features that were taken into account during evaluation;
    • method - the name of function in the caret package;
    • user - user time elapsed for executing a method as an R process;
    • system - system time elapsed for executing a method as an R process;
    • elapsed - total time elapsed for executing a method as an R process.

    For more information about user, system and total elapsed time, please see documentation [3].

    References

    [1] https://cran.r-project.org/web/packages/caret/

    [2] https://topepo.github.io/caret/model-training-and-tuning.html

    [3] https://stat.ethz.ch/R-manual/R-devel/library/base/html/proc.time.htm

  10. Rates of classification errors of the methods on the AR database (%).

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Qi Zhu; Zhengming Li; Jinxing Liu; Zizhu Fan; Lei Yu; Yan Chen (2023). Rates of classification errors of the methods on the AR database (%). [Dataset]. http://doi.org/10.1371/journal.pone.0070370.t002
    Dataset provided by
    PLOS ONE
    Authors
    Qi Zhu; Zhengming Li; Jinxing Liu; Zizhu Fan; Lei Yu; Yan Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rates of classification errors of the methods on the AR database (%).

  11. Inferring health conditions from fMRI-graph data

    • osf.io
    Updated Nov 22, 2021
    Cite
    PierGianLuca Porta Mana; Claudia Bachmann; Abigail Morrison (2021). Inferring health conditions from fMRI-graph data [Dataset]. http://doi.org/10.17605/OSF.IO/84K9A
    Dataset provided by
    Center For Open Science
    Authors
    PierGianLuca Porta Mana; Claudia Bachmann; Abigail Morrison
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Automated classification methods for disease diagnosis are currently in the limelight, especially for imaging data. Classification does not fully meet a clinician's needs, however: in order to combine the results of multiple tests and decide on a course of treatment, a clinician needs the likelihood of a given health condition rather than binary classification yielded by such methods. We illustrate how likelihoods can be derived step by step from first principles and approximations, and how they can be assessed and selected, using fMRI data from a publicly available data set containing schizophrenic and healthy control subjects, as a working example. We start from the basic assumption of partial exchangeability, and then the notion of sufficient statistics and the "method of translation" (Edgeworth, 1898) combined with conjugate priors. This method can be used to construct a likelihood that can be used to compare different data-reduction algorithms. Despite the simplifications and possibly unrealistic assumptions used to illustrate the method, we obtain classification results comparable to previous, more realistic studies about schizophrenia, whilst yielding likelihoods that can naturally be combined with the results of other diagnostic tests.

  12. Night population by SNI 2007, 2008-2018 — Statistics for Malmö areas

    • data.europa.eu
    csv, json
    Updated Feb 2, 2025
    Cite
    Malmö stad (2025). Night population by SNI 2007, 2008-2018 — Statistics for Malmö areas [Dataset]. https://data.europa.eu/data/datasets/https-ckan-malmo-dataplatform-se-dataset-e89514ce-529d-4240-b000-7b508aacd1b4?locale=en
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 2, 2025
    Dataset authored and provided by
    Malmö stad
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Malmö
    Description

    Shows the industries in which employed persons living in the area work. The breakdown follows the Swedish Standard Industrial Classification (SNI) 2007. Categories 0 (unknown), A (agriculture, hunting and forestry), B (extraction of materials), D (supply of electricity, gas, heating and cooling), E (water supply, etc.), T (household production, etc.) and U (activities of international organisations) have been merged into the category “Others” because few people work in these industries. The variable also shows the total number of workers living in the area. Only people who work in Sweden are included. The variable is broken down by gender and covers the population aged 16-74. Based on Statistics Sweden's RAMS (“Register-based labour market statistics”). The statistics include all individuals between 16 and 74 who are registered in Sweden on 31 December. As of 2011, Statistics Sweden changed how it counts people aged 65 or over in order to get a more consistent assessment of self-employed persons. It is therefore not appropriate to compare these statistics with years before 2011. As of the reference year 2019, Statistics Sweden uses a new data source and method for classifying workers in RAMS. This change means that comparisons between 2019 and earlier reference years must be made with great care. Read more at SCB.se

    In this file there are statistics for a number of variables broken down by Malmö’s different areas over time.

    Source

    Unless otherwise stated, the statistics in this database are retrieved from Statistics Sweden’s (SCB) regional database, Skånedatabasen, or from Statistics Sweden’s area statistics database (OSDB). Skånedatabasen and OSDB show data from several different sources that Statistics Sweden has compiled at a geographical level. The statistics only cover persons who are registered in the population register. Therefore, persons without a residence permit, such as asylum seekers, and persons who simply have not registered in the municipality are not included. Statistics Sweden does not provide statistics on which language residents speak, which religion they belong to, or what ethnicity or political views they have. Therefore, such data is not available here either. However, the Electoral Authority reports election results per constituency on its website val.se. Statistics from the most recent election and from several previous elections are available there. Please note, however, that the constituencies do not necessarily follow the division of the city used here.

    Update

    The data is updated every spring, when Statistics Sweden releases the figures to the municipality. Most variables are available for the preceding year. However, income and employment data are released with a further year’s lag. Unless otherwise stated, the date of measurement is 31 December of each year.

    Geographical breakdown

    Unless otherwise stated, the data is available for Malmö as a whole and broken down into urban areas (5), districts (10) and subareas (136). In addition, there is a residual category containing people who are not registered at a specific place in the municipality, have a protected identity, and similar cases. These people are also included in the total. Several of the subareas have no or only a few registered residents, so no data are reported for them. Examples of such subareas are parks such as Pildammsparken and Kroksbäcksparken and industrial areas such as Fosieby Industriområde and Spillepengen.

    Privacy clearance

    To protect the identity of individuals, the data is checked for confidentiality. This means that small values are suppressed, i.e. replaced by empty cells. However, the values are included in totals. In general, the following rules apply:

    • No statistics are reported for geographical areas with very few dwellings.
    • No cells with fewer than 5 individuals are reported. For data classified as sensitive (e.g. income and country of birth), larger values can also be suppressed.
    • In cases where a subcategory (e.g. a training category) is too small to be reported, all categories are often suppressed.

    API

    The API call https://ckan-malmo.dataplatform.se/api/3/action/resource_search?query=description:malm%C3%B6%20statistik returns, in JSON format, all datasets that contain statistical data for Malmö’s areas. Each resource listed under result/results in the JSON response has an “id” property. The value of “id” can be used to retrieve the data for the corresponding statistical variable (identified by the value of the “name” property). API call to retrieve the statistical data: https://ckan-malmo.dataplatform.se/api/action/datastore_search?resource_id=[id value]
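
    The two calls described above could be scripted roughly as follows. This is a sketch, not an official client: the helper names are hypothetical, and the assumption that the data rows sit under result/records in the second response follows the usual CKAN DataStore layout rather than anything stated on this page.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE = "https://ckan-malmo.dataplatform.se"


def search_url(query="description:malmö statistik"):
    """Build the resource_search URL; the colon is kept, the rest is percent-encoded."""
    return f"{BASE}/api/3/action/resource_search?query={quote(query, safe=':')}"


def datastore_url(resource_id):
    """Build the datastore_search URL for one resource id."""
    return f"{BASE}/api/action/datastore_search?{urlencode({'resource_id': resource_id})}"


def resource_ids(search_response):
    """Map each resource's "name" to its "id" from a parsed resource_search response."""
    return {r["name"]: r["id"] for r in search_response["result"]["results"]}


def fetch_json(url):
    """Download a URL and parse the body as JSON (performs a network request)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Usage (requires network access, so it is left commented out):
# ids = resource_ids(fetch_json(search_url()))
# for name, rid in ids.items():
#     rows = fetch_json(datastore_url(rid))["result"]["records"]  # assumed CKAN layout
#     print(name, len(rows))
```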

    You are welcome to use the figures, but please cite “City Office, Malmö City” as the source.

  13. Data from: Statistics in Proteomics: A Meta-analysis of 100 Proteomics...

    • acs.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    David C. L. Handler; Paul A. Haynes (2023). Statistics in Proteomics: A Meta-analysis of 100 Proteomics Papers Published in 2019 [Dataset]. http://doi.org/10.1021/jasms.9b00142.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    David C. L. Handler; Paul A. Haynes
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We randomly selected 100 journal articles published in five proteomics journals in 2019 and manually examined each of them against a set of 13 criteria concerning the statistical analyses used, all of which were based on items mentioned in the journals’ instructions to authors. This included questions such as whether a pilot study was conducted and whether false discovery rate calculation was employed at either the quantitation or identification stage. These data were then transformed to binary inputs, analyzed via machine learning algorithms, and classified accordingly, with the aim of determining if clusters of data existed for specific journals or if certain statistical measures correlated with each other. We applied a variety of classification methods including principal component analysis decomposition, agglomerative clustering, and multinomial and Bernoulli naïve Bayes classification and found that none of these could readily determine journal identity given extracted statistical features. Logistic regression was useful in determining high correlative potential between statistical features such as false discovery rate criteria and multiple testing corrections methods, but was similarly ineffective at determining correlations between statistical features and specific journals. This meta-analysis highlights that there is a very wide variety of approaches being used in statistical analysis of proteomics data, many of which do not conform to published journal guidelines, and that contrary to implicit assumptions in the field there are no clear correlations between statistical methods and specific journals.

  14. Distribution and Epidemiological Characteristics of Published Individual...

    • plos.figshare.com
    doc
    Updated Jun 2, 2023
    Cite
    Yafang Huang; Chen Mao; Jinqiu Yuan; Zuyao Yang; Mengyang Di; Wilson Wai-san Tam; Jinling Tang (2023). Distribution and Epidemiological Characteristics of Published Individual Patient Data Meta-Analyses [Dataset]. http://doi.org/10.1371/journal.pone.0100151
    Explore at:
    Available download formats: doc
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yafang Huang; Chen Mao; Jinqiu Yuan; Zuyao Yang; Mengyang Di; Wilson Wai-san Tam; Jinling Tang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Individual patient data meta-analyses (IPDMAs) prevail as the gold standard in clinical evaluations. We investigated the distribution and epidemiological characteristics of published IPDMA articles.

    Methodology/Principal Findings

    IPDMA articles were identified through comprehensive literature searches from PubMed, Embase, and Cochrane library. Two investigators independently conducted article identification, data classification and extraction. Data related to the article characteristics were collected and analyzed descriptively. A total of 829 IPDMA articles indexed until 9 August 2012 were identified. An average of 3.7 IPDMA articles was published per year. Malignant neoplasms (267 [32.2%]) and circulatory diseases (179 [21.6%]) were the most frequently occurring topics. On average, each IPDMA article included a median of 8 studies (Interquartile range, IQR 5 to 15) involving 2,563 patients (IQR 927 to 8,349). Among 829 IPDMA articles, 229 (27.6%) did not perform a systematic search to identify related studies. In total, 207 (25.0%) sought and included individual patient data (IPD) from the “grey literature”. Only 496 (59.8%) successfully obtained IPD from all identified studies.

    Conclusions/Significance

    The number of IPDMA articles exhibited an increasing trend over the past few years and mainly focused on cancer and circulatory diseases. Our data indicated that literature searches, including grey literature and data availability were inconsistent among different IPDMA articles. Possible biases may arise. Thus, decision makers should not uncritically accept all IPDMAs.

  15. 2023 Census main means of travel to education by statistical area 3

    • datafinder.stats.govt.nz
    csv, dbf (dbase iii) +4
    Updated Jun 11, 2025
    + more versions
    Cite
    Stats NZ (2025). 2023 Census main means of travel to education by statistical area 3 [Dataset]. https://datafinder.stats.govt.nz/table/122495-2023-census-main-means-of-travel-to-education-by-statistical-area-3/
    Explore at:
    Available download formats: csv, geopackage / sqlite, dbf (dbase iii), mapinfo tab, mapinfo mif, geodatabase
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Statistics New Zealand (http://www.stats.govt.nz/)
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Description

    Dataset shows an individual’s statistical area 3 (SA3) of usual residence and the SA3 of their place of study, for the census usually resident population count who are studying (part time or full time), by main means of travel to education from the 2018 and 2023 Censuses.

    The main means of travel to education categories are:

    • Study at home
    • Drive a car, truck, or van
    • Passenger in a car, truck, or van
    • Bicycle
    • Walk or jog
    • School bus
    • Public bus
    • Train
    • Ferry
    • Other.

    Main means of travel to education is the usual method a person used to travel the longest distance to their place of study.

    Educational institution address is the physical location of the individual’s place of study. Educational institutions include early childhood education, primary school, secondary school, and tertiary education institutions. For individuals who study at home, their educational institution address is the same as their usual residence address.

    Educational institution address is coded to the most detailed geography possible from the available information. This dataset only includes travel to education information for individuals whose educational institution address is available at SA3 level. The sum of the counts for each region in this dataset may not equal the census usually resident population count who are studying (part time or full time) for that region. Educational institution address – 2023 Census: Information by concept has more information.

    This dataset can be used in conjunction with the following spatial files by joining on the SA3 code values:

    Download data table using the instructions in the Koordinates help guide.

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city. 

    Population counts

    Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts. 

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data).

    Educational institution address time series

    Educational institution address time series data should be interpreted with care at lower geographic levels, such as statistical area 2 (SA2). Methodological improvements in 2023 Census resulted in greater data accuracy, including a greater proportion of people being counted at lower geographic areas compared to the 2018 Census. Educational institution address – 2023 Census: Information by concept has more information.

    Rows excluded from the dataset

    Rows show SA3 of usual residence by SA3 of educational institution address. Rows with a total population count of less than six have been removed to reduce the size of the dataset, given only a small proportion of SA3-SA3 combinations have commuter flows.

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Quality rating of a variable

    The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.

    Main means of travel to education quality rating

    Main means of travel to education is rated as moderate quality.

    Main means of travel to education – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Educational institution address quality rating

    Educational institution address is rated as moderate quality.

    Educational institution address – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
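
    As a rough illustration of what fixed random rounding to base 3 could look like, consider the sketch below. The details are assumptions and not Stats NZ's implementation: counts that are already multiples of 3 are left unchanged, other counts go to the nearer multiple of 3 with probability 2/3 and to the farther one with probability 1/3, and "fixed" is modelled by seeding the random choice from a hypothetical cell key so that the same cell always rounds the same way.

```python
import hashlib
import random


def frr3(count, cell_key):
    """Round a non-negative count to a multiple of 3, reproducibly per cell.

    cell_key is a hypothetical identifier for the table cell; it only
    serves to make the random choice repeatable.
    """
    remainder = count % 3
    if remainder == 0:
        return count  # already a multiple of 3
    lower = count - remainder
    upper = lower + 3
    # Remainder 1 is closer to the lower multiple, remainder 2 to the upper.
    nearer, farther = (lower, upper) if remainder == 1 else (upper, lower)
    digest = hashlib.sha256(f"{cell_key}:{count}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return nearer if rng.random() < 2 / 3 else farther


# The same cell and count always give the same rounded value.
print(frr3(7, "residence=A;study=B"), frr3(7, "residence=A;study=B"))
```

    Because each cell is rounded on its own, rounded figures need not add up to the rounded total, which matches the note above that individual figures may not always sum to stated totals.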

    Percentages

    To calculate percentages, divide the figure for the category of interest by the figure for ‘Total stated’ where this applies.

    Symbol

    -999 Confidential

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

  16. Rates of classification errors of the methods on the ORL database (%).

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Qi Zhu; Zhengming Li; Jinxing Liu; Zizhu Fan; Lei Yu; Yan Chen (2023). Rates of classification errors of the methods on the ORL database (%). [Dataset]. http://doi.org/10.1371/journal.pone.0070370.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Qi Zhu; Zhengming Li; Jinxing Liu; Zizhu Fan; Lei Yu; Yan Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rates of classification errors of the methods on the ORL database (%).

  17. Data from: NUTS 3

    • inspire-geoportal.ec.europa.eu
    Updated Apr 3, 2022
    + more versions
    Cite
    Central Statistics Office (2022). NUTS 3 [Dataset]. https://inspire-geoportal.ec.europa.eu/srv/api/records/%7B38C87F49-78F9-48D1-A599-369E42F19FBE%7D
    Explore at:
    Available download formats: atom syndication format
    Dataset updated
    Apr 3, 2022
    Dataset provided by
    Central Statistics Office Ireland (https://www.cso.ie/en/)
    Authors
    Central Statistics Office
    License

    http://inspire.ec.europa.eu/metadata-codelist/ConditionsApplyingToAccessAndUse/noConditionsApply

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations

    Description

    The Nomenclature of Territorial Units for Statistics (NUTS) was drawn up by Eurostat in order to define territorial units for the production of comparable regional statistics across the European Union. The NUTS classification has been used in EU legislation since 1988, but it was only in 2003 that the EU Member States, the European Parliament and the Commission established the NUTS regions within a legal framework (Regulation (EC) No 1059/2003). For the purpose of comparability, NUTS classifications remain stable for at least three years, as specified by the regulation. However, if national interests require a change in the regional breakdown of a country, the country concerned may notify the European Commission, which will then carry out the required amendments at the end of a period of stability, according to the rules of the NUTS regulation. Changes made under the 2014 Local Government Act prompted a revision to the Irish NUTS 2 and NUTS 3 Regions. The NUTS 3 boundaries were amended on 21st of November 2016 and have been given legal status under the Commission Regulation (Regulation (EC) No 2066/2016). The main changes at NUTS 3 level are the transfer of South Tipperary from the South-East into the Mid-West NUTS 3 region and the movement of Louth from the Border to the Mid-East NUTS 3 Region. The NUTS 3 boundaries are available as generalised (20/50/100m) and ungeneralised. The NUTS coding scheme and naming structure were decided by Eurostat. The new NUTS 2 and NUTS 3 structure and classification are displayed in a table on the CSO website https://www.cso.ie/en/methods/revnuts23/

  18. 2023 Census main means of travel to work by statistical area 3

    • datafinder.stats.govt.nz
    csv, dbf (dbase iii) +4
    Updated Jun 11, 2025
    + more versions
    Cite
    Stats NZ (2025). 2023 Census main means of travel to work by statistical area 3 [Dataset]. https://datafinder.stats.govt.nz/table/122496-2023-census-main-means-of-travel-to-work-by-statistical-area-3/
    Explore at:
    Available download formats: mapinfo mif, csv, dbf (dbase iii), geodatabase, mapinfo tab, geopackage / sqlite
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Statistics New Zealand (http://www.stats.govt.nz/)
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Description

    Dataset shows an individual’s statistical area 3 (SA3) of usual residence and the SA3 of their workplace address, for the employed census usually resident population count aged 15 years and over, by main means of travel to work from the 2018 and 2023 Censuses.

    The main means of travel to work categories are:

    • Work at home
    • Drive a private car, truck, or van
    • Drive a company car, truck, or van
    • Passenger in a car, truck, van, or company bus
    • Public bus
    • Train
    • Bicycle
    • Walk or jog
    • Ferry
    • Other.

    Main means of travel to work is the usual method which an employed person aged 15 years and over used to travel the longest distance to their place of work.

    Workplace address refers to where someone usually works in their main job, that is the job in which they worked the most hours. For people who work at home, this is the same address as their usual residence address. For people who do not work at home, this could be the address of the business they work for or another address, such as a building site.

    Workplace address is coded to the most detailed geography possible from the available information. This dataset only includes travel to work information for individuals whose workplace address is available at SA3 level. The sum of the counts for each region in this dataset may not equal the total employed census usually resident population count aged 15 years and over for that region. Workplace address – 2023 Census: Information by concept has more information.

    This dataset can be used in conjunction with the following spatial files by joining on the SA3 code values:

    Download data table using the instructions in the Koordinates help guide.

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city. 

    Population counts

    Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts. 

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data).

    Workplace address time series

    Workplace address time series data should be interpreted with care at lower geographic levels, such as statistical area 2 (SA2). Methodological improvements in 2023 Census resulted in greater data accuracy, including a greater proportion of people being counted at lower geographic areas compared to the 2018 Census. Workplace address – 2023 Census: Information by concept has more information.

    Working at home

    In the census, working at home captures both remote work, and people whose business is at their home address (e.g. farmers or small business owners operating from their home). The census asks respondents whether they ‘mostly’ work at home or away from home. It does not capture whether someone does both, or how frequently they do one or the other.

    Rows excluded from the dataset

    Rows show SA3 of usual residence by SA3 of workplace address. Rows with a total population count of less than six have been removed to reduce the size of the dataset, given only a small proportion of SA3-SA3 combinations have commuter flows.

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Quality rating of a variable

    The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.

    Main means of travel to work quality rating

    Main means of travel to work is rated as moderate quality.

    Main means of travel to work – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Workplace address quality rating

    Workplace address is rated as moderate quality.

    Workplace address – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.

    Percentages

    To calculate percentages, divide the figure for the category of interest by the figure for ‘Total stated’ where this applies.

    Symbol

    -999 Confidential
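
    The percentage rule and the -999 symbol above can be combined in a small helper. This is only a sketch: the function name is hypothetical, and returning None for confidential or unusable cells is a design choice made here, not something the dataset prescribes.

```python
CONFIDENTIAL = -999  # symbol used in the data for confidential cells


def percentage(category_count, total_stated):
    """Share of 'Total stated' for one category, in percent.

    Returns None when either figure is confidential or the total is not
    positive, so the -999 marker is never treated as a real count.
    """
    if CONFIDENTIAL in (category_count, total_stated) or total_stated <= 0:
        return None
    return 100.0 * category_count / total_stated


print(percentage(30, 120))    # 30 of 120 stated responses
print(percentage(-999, 120))  # confidential cell
```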

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

  19. Rural Urban Classification (2021) of Local Authority Districts (2024) in EW

    • geoportal.statistics.gov.uk
    • data.europa.eu
    Updated Mar 5, 2025
    Cite
    Office for National Statistics (2025). Rural Urban Classification (2021) of Local Authority Districts (2024) in EW [Dataset]. https://geoportal.statistics.gov.uk/datasets/rural-urban-classification-2021-of-local-authority-districts-2024-in-ew
    Explore at:
    Dataset updated
    Mar 5, 2025
    Dataset authored and provided by
    Office for National Statistics
    Description

    Rural Urban Classification

    The 2021 RUC is a statistical classification to provide a consistent and standardised method for classifying geographies as rural or urban. This is based on address density, physical settlement form, population size, and Relative Access to Major towns and cities (populations of over 75,000 people). The classification is produced by the Office for National Statistics (ONS) with advice from the Department for Environment, Food and Rural Affairs (Defra), the Welsh Government and colleagues from the Government Geography Profession (GGP).

    This is 2021 rural-urban classification (RUC) of 2024 Local Authority Districts in England and Wales. This means that the 2021 RUC methodology has been applied to the 2024 LAD boundaries. LAD classifications are divided into four categories based on their populations:

    1. Majority Rural: had at least 50% of their population residing in Rural OAs
    2. Intermediate Rural: 35-50% rural population
    3. Intermediate Urban: 20-35% rural population
    4. Urban: 20% or less of the population lived in rural OAs.

    Each 2024 LAD category is split into one of two Relative Access categories, using the same data as the 2021 Output Area RUC. If more than 50% of a LAD population lives in ‘Nearer a major town or city’ OAs, it is deemed ‘nearer a major town or city’; otherwise, it is classified as ‘further from a major town or city’.
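
    The thresholds described above can be written out as a short function. This is a sketch under stated assumptions: the function name and inputs are hypothetical, and because the description does not say which category a share of exactly 35% falls into, it is assigned to Intermediate Rural here.

```python
def classify_lad(rural_share, nearer_share):
    """Classify a Local Authority District from two percentages (0-100).

    rural_share: percent of the LAD population living in rural OAs.
    nearer_share: percent living in 'Nearer a major town or city' OAs.
    """
    if rural_share >= 50:
        settlement = "Majority Rural"
    elif rural_share >= 35:  # assumption: exactly 35% counts as Intermediate Rural
        settlement = "Intermediate Rural"
    elif rural_share > 20:
        settlement = "Intermediate Urban"
    else:  # 20% or less
        settlement = "Urban"

    if nearer_share > 50:
        access = "nearer a major town or city"
    else:
        access = "further from a major town or city"
    return settlement, access


print(classify_lad(42.0, 61.5))
```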

    Where data is unavailable for Super Output Area geographies, it may be appropriate for users to undertake analysis at the LAD level. At this level, the categorisation works slightly differently in that most areas will include a mix of both rural and urban areas - so the LA RUC categorisation is a reflection of this. A statistical geography may contain substantial portions of open countryside but still be given an ‘Urban’ classification if the majority of the population within the area live in settlements that are urban in nature. Users should take this into consideration to ensure correct interpretations of any analysis of RUC LAD categories.

  20. Rural statistics local level data on tourism Gross Value Added (GVA), 2001...

    • gov.uk
    Updated Aug 8, 2013
    Cite
    Department for Environment, Food & Rural Affairs (2013). Rural statistics local level data on tourism Gross Value Added (GVA), 2001 to 2010 [Dataset]. https://www.gov.uk/government/statistical-data-sets/rural-statistics-local-level-data-on-tourism-gva
    Explore at:
    Dataset updated
    Aug 8, 2013
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Environment, Food & Rural Affairs
    Description

    Gross Value Added (GVA) of tourism related industries by Local Authority Rural-Urban Classification, region and local authority for the period from 2001 to 2010.

    GVA measures the contribution to the economy of each individual producer, industry or sector. These statistics are produced for tourism activity in England.

    The definition of tourism used is the UN WTO definition (http://www.ons.gov.uk/ons/guide-method/method-quality/specific/economy/economic-value-of-tourism/measuring-tourism-locally/2012/note-1/index.html).

    Source: Office for National Statistics, Annual Business Survey (ABS) and Local Authority Rural-Urban Classification.

    Tourism GVA: https://assets.publishing.service.gov.uk/media/5a7c7193e5274a5255bceae8/Tourism_NUTS4_Final_v3.xlsx

    MS Excel Spreadsheet, 273 KB

    This file may not be suitable for users of assistive technology.

    Request an accessible format.
    If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email defra.helpline@defra.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

    Defra statistics: rural

    Email: rural.statistics@defra.gov.uk

    You can also contact us via Twitter: https://twitter.com/DefraStats
    

Cite
Cognitive Market Research (2023). The global Data Classification market size will be USD 1842.2 million in 2024. [Dataset]. https://www.cognitivemarketresearch.com/data-classification-market-report

The global Data Classification market size will be USD 1842.2 million in 2024.

Explore at:
Available download formats: pdf, excel, csv, ppt
Dataset updated
Jan 1, 2023
Dataset authored and provided by
Cognitive Market Research
License

https://www.cognitivemarketresearch.com/privacy-policy

Time period covered
2021 - 2033
Area covered
Global
Description

According to Cognitive Market Research, the global Data Classification market size will be USD 1842.2 million in 2024. It will expand at a compound annual growth rate (CAGR) of 25.20% from 2024 to 2031.

North America held the major market share for more than 40% of the global revenue with a market size of USD 736.88 million in 2024 and will grow at a compound annual growth rate (CAGR) of 23.4% from 2024 to 2031.
Europe accounted for a market share of over 30% of the global revenue with a market size of USD 552.66 million.
Asia Pacific held a market share of around 23% of the global revenue with a market size of USD 423.71 million in 2024 and will grow at a compound annual growth rate (CAGR) of 27.2% from 2024 to 2031.
Latin America had a market share of more than 5% of the global revenue with a market size of USD 92.11 million in 2024 and will grow at a compound annual growth rate (CAGR) of 24.6% from 2024 to 2031.
Middle East and Africa had a market share of around 2% of the global revenue and was estimated at a market size of USD 36.84 million in 2024 and will grow at a compound annual growth rate (CAGR) of 24.9% from 2024 to 2031.
Solutions is the fastest-growing segment of the Data Classification industry.
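
The growth rates quoted above are compound annual rates, so a later-year figure follows from the base value by repeated multiplication. The sketch below shows that arithmetic only; the helper name `project_market_size` is hypothetical, and any value it produces is an implication of the quoted base and CAGR, not a figure reported by Cognitive Market Research.

```python
def project_market_size(base_musd: float, cagr: float, years: int) -> float:
    """Compound a base market size (USD million) at a fixed annual rate.

    cagr is a fraction (0.252 for 25.20%); years is the number of
    compounding periods, e.g. 7 for 2024 to 2031.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return base_musd * (1.0 + cagr) ** years


# Global figure from the description: USD 1842.2 million in 2024 at 25.20%.
implied_2031 = project_market_size(1842.2, 0.252, 2031 - 2024)
print(f"Implied 2031 global market size: USD {implied_2031:,.1f} million")
```

The same helper can be applied to each regional base size and rate listed above, for instance `project_market_size(736.88, 0.234, 7)` for North America.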

Market Dynamics of Data Classification Market

Key Drivers for Data Classification Market

Increasing Data Privacy and Security Regulations to Boost Market Growth

The growing emphasis on data privacy and security regulations is driving growth in the data classification market. As organizations face stricter compliance requirements and heightened scrutiny over data handling practices, there is an increasing need for robust data classification solutions. These solutions help categorize and manage data based on its sensitivity and compliance needs, thereby mitigating risks related to data breaches and non-compliance penalties. Stricter regulations, including GDPR and CCPA, are prompting organizations to invest in advanced data classification technology to protect sensitive data and ensure adherence to legal standards, thus fueling market expansion. For instance, to help Indian businesses consolidate all facets of risk under one roof via integrated risk management technology, Rotiviti India partnered with Riskconnect.

Rise in Data Breaches and Cyberattacks to Drive Market Growth

The surge in data breaches and cyberattacks is significantly boosting the data classification market. As cyber threats become more sophisticated and frequent, organizations are increasingly adopting data classification solutions to protect sensitive data. These technologies help identify, categorize, and secure data according to its sensitivity, thereby minimizing the impact of potential breaches. With cyberattacks targeting valuable data and regulatory pressures mounting, organizations are investing in data classification systems to strengthen their security posture and ensure compliance. This growing demand for robust data protection measures is driving the growth of the data classification market.

Restraint Factor for the Data Classification Market

Complexity and Cost Will Limit Market Growth

The complexity and cost associated with data classification are hindering market growth. Implementing comprehensive data classification solutions often requires significant investment in advanced technology and skilled personnel. The complexity of integrating these systems with existing IT infrastructure and ensuring accurate classification across diverse data sources adds to the challenge. Additionally, ongoing maintenance and updates to keep pace with evolving threats and regulatory changes contribute to high costs. These factors can be especially burdensome for small and medium-sized enterprises, limiting their ability to adopt effective data classification solutions and thereby restraining overall market expansion.

Impact of Covid-19 on the Data Classification Market

The COVID-19 pandemic has had a mixed effect on the data classification market. On the one hand, the accelerated shift to remote work and increased reliance on digital platforms heightened the need for robust data classification solutions to secure sensitive data and ensure compliance with data protection regulations. On the other hand, economic uncertainties and budget constraints during t...
