The Health Statistics and Health Research Database is Estonia's largest collection of health-related statistics and survey results, administered by the National Institute for Health Development. Use of the database is free of charge.
The database consists of eight main areas divided into sub-areas. Each data table in a sub-area is assigned a unique code. The data tables can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). A detailed database user manual is available for download (.pdf).
The database is constantly updated with new data. Dates of updating the existing data tables and adding new data are provided in the release calendar. The date of the last update to each table is provided after the title of the table in the list of data tables.
A contact person for each sub-area is listed under that sub-area's "Definitions and Methodology" link. Contact this person with any further questions or data requests about the data published in the database.
Read more about the publication of health statistics by the National Institute for Health Development in the Health Statistics Dissemination Principles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The numbers represent the pooled data for each part-of-speech (POS) category across tasks and conditions; the numbers in parentheses give the corresponding percentages (Blocks A = with a non-explicit goal to build with blocks; Blocks B = with an explicit goal to build a tower together; Blicket = with an explicit goal to teach the infants a novel noun; CV = co-view; DG = digital-view).
The product data are six statistics that were estimated for the chemical concentration of lithium in the soil C horizon of the conterminous United States. The estimates are made at 9998 locations that are uniformly distributed across the conterminous United States. The six statistics are the mean for the isometric log-ratio transform of the concentrations, the equivalent mean for the concentrations, the standard deviation for the isometric log-ratio transform of the concentrations, the probability of exceeding a concentration of 55 milligrams per kilogram, the 0.95 quantile for the isometric log-ratio transform of the concentrations, and the equivalent 0.95 quantile for the concentrations. Each statistic may be used to generate a statistical map that shows an attribute of the distribution of lithium concentration.
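The relationship between a statistic on the isometric log-ratio (ilr) scale and its "equivalent" on the concentration scale can be sketched as follows. This is only an illustration under a stated assumption: it treats the soil as a two-part composition (lithium versus everything else), for which the ilr reduces to a scaled logit. The description above does not say which composition or basis was actually used, and the function names are hypothetical.

```python
import math

MG_PER_KG = 1_000_000  # 1 kg = 1e6 mg, so mg/kg is parts per million


def ilr_two_part(conc_mg_per_kg: float) -> float:
    """ilr coordinate of the two-part composition (x, 1 - x).

    Assumption: lithium versus "rest of the sample" is the whole composition.
    """
    x = conc_mg_per_kg / MG_PER_KG
    return math.log(x / (1.0 - x)) / math.sqrt(2.0)


def ilr_inverse_two_part(z: float) -> float:
    """Back-transform an ilr coordinate to a concentration in mg/kg."""
    t = math.exp(math.sqrt(2.0) * z)
    return MG_PER_KG * t / (1.0 + t)


# The 55 mg/kg threshold expressed on the ilr scale, and the round trip back.
threshold_ilr = ilr_two_part(55.0)
threshold_back = ilr_inverse_two_part(threshold_ilr)
```

Because this transform is strictly increasing, a concentration exceeds 55 mg/kg exactly when its ilr coordinate exceeds `threshold_ilr`, so an exceedance probability can be evaluated on either scale.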
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is the definitive version of the annually released statistical area 1 (SA1) boundaries as at 1 January 2025, as defined by Stats NZ. This version contains 33,164 SA1s (33,148 digitised and 16 with empty or null geometries (non-digitised)).

SA1 is an output geography that allows the release of more low-level data than is available at the meshblock level. Built by joining meshblocks, SA1s have an ideal size range of 100–200 residents, and a maximum population of about 500. This is to minimise suppression of population data in multivariate statistics tables.

The SA1 should:
- form a contiguous cluster of one or more meshblocks,
- be either urban, rural, or water in character,
- be small enough to:
  - allow flexibility for aggregation to other statistical geographies,
  - allow users to aggregate areas into their own defined communities of interest,
- form a nested hierarchy with statistical output geographies and administrative boundaries. It must:
  - be built from meshblocks,
  - either define or aggregate to define SA2s, urban rural areas, territorial authorities, and regional councils.

SA1s generally have a population of 100–200 residents, with some exceptions:
- SA1s with nil or nominal resident populations are created to represent remote mainland areas, unpopulated islands, inland water, inlets, or oceanic areas.
- Some SA1s in remote rural areas and urban industrial or business areas have fewer than 100 residents.
- Some SA1s that contain apartment blocks, retirement villages, and large non-residential facilities (prisons, boarding schools, etc.) have more than 500 residents.

SA1 numbering
SA1s are not named. SA1 codes have seven digits starting with a 7 and are numbered approximately north to south. Non-digitised codes start with 79. As new SA1s are created, they are given the next available numeric code. If the composition of an SA1 changes through splitting or amalgamating different meshblocks, the SA1 is given a new code. The previous code no longer exists within that version and future versions of the SA1 classification.

Digitised and non-digitised SA1s
The digital geographic boundaries are defined and maintained by Stats NZ. Aggregated from meshblocks, SA1s cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, off-shore oil rigs, and Ross Dependency. The following 16 SA1s are held in non-digitised form: 7999901 New Zealand Economic Zone; 7999902 Oceanic Kermadec Islands; 7999903 Kermadec Islands; 7999904 Oceanic Oil Rig Taranaki; 7999905 Oceanic Campbell Island; 7999906 Campbell Island; 7999907 Oceanic Oil Rig Southland; 7999908 Oceanic Auckland Islands; 7999909 Auckland Islands; 7999910 Oceanic Bounty Islands; 7999911 Bounty Islands; 7999912 Oceanic Snares Islands; 7999913 Snares Islands; 7999914 Oceanic Antipodes Islands; 7999915 Antipodes Islands; 7999916 Ross Dependency.

High-definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

Macrons
Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.

Digital data
Digital boundary data became freely available on 1 July 2007.

Further information
To download geographic classifications in table formats such as CSV, please use Ariā. For more information, please refer to the Statistical standard for geographic areas 2023.

Contact: geography@stats.govt.nz
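The SA1 numbering rules described above (seven digits, leading 7, non-digitised codes starting with 79) can be checked with a small helper. This is a minimal sketch: the function name is hypothetical, and treating every code that starts with 79 as non-digitised is an assumption, since the text only states the rule in the other direction.

```python
import re

# Seven digits in total, the first of which is 7.
_SA1_CODE = re.compile(r"7\d{6}")


def classify_sa1_code(code: str) -> str:
    """Return 'invalid', 'non-digitised' or 'digitised' for an SA1 code string."""
    if not _SA1_CODE.fullmatch(code):
        return "invalid"
    # Assumption: a leading "79" is used only for non-digitised SA1s.
    return "non-digitised" if code.startswith("79") else "digitised"


for example in ("7999916", "7000001", "123"):
    print(example, classify_sa1_code(example))
```

Keeping codes as strings rather than integers avoids accidental arithmetic on identifiers and makes the prefix test trivial.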
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing
Only a small percentage of this newly created data is kept, though: just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
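As a worked illustration of what the quoted growth rate implies, the sketch below compounds the 2020 installed base of 6.7 zettabytes at 19.2 percent per year for five years. The resulting 2025 figure is an arithmetic consequence of those two numbers, not a value stated in the source, and the function name is hypothetical.

```python
def project_with_cagr(base: float, rate: float, years: int) -> float:
    """Compound `base` at a constant annual growth `rate` for `years` years."""
    return base * (1.0 + rate) ** years


# 6.7 ZB in 2020, 19.2 % CAGR, forecast period 2020 to 2025 (five years).
implied_2025 = project_with_cagr(6.7, 0.192, 2025 - 2020)
print(f"{implied_2025:.1f} ZB")  # prints "16.1 ZB"
```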
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The de-identified data from the 2013-14 individual 2% sample file (https://data.gov.au/dataset/taxation-statistics-individual-sample-files) has been aggregated to the following levels: sex; age (5-year ranges); occupation (1-digit level); partner status; location (SA4 region name); lodgment channel (agent or self-preparer); PHI indicator. Data from the ABS Census (2011) and ABS SEIFA was then added to the SA4 regions as summary variables or ranked variables. The Geelong Region was extracted from the full dataset. Although all due care has been taken to ensure that these data are correct, no warranty is expressed or implied by the City of Greater Geelong in their use.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/RCHDXX
This dataset contains replication files for "A Practical Method to Reduce Privacy Loss when Disclosing Statistics Based on Small Samples" by Raj Chetty and John Friedman. For more information, see https://opportunityinsights.org/paper/differential-privacy/. A summary of the related publication follows. Releasing statistics based on small samples – such as estimates of social mobility by Census tract, as in the Opportunity Atlas – is very valuable for policy but can potentially create privacy risks by unintentionally disclosing information about specific individuals. To mitigate such risks, we worked with researchers at the Harvard Privacy Tools Project and Census Bureau staff to develop practical methods of reducing the risks of privacy loss when releasing such data. This paper describes the methods that we developed, which can be applied to disclose any statistic of interest that is estimated using a sample with a small number of observations. We focus on the case where the dataset can be broken into many groups (“cells”) and one is interested in releasing statistics for one or more of these cells. Building on ideas from the differential privacy literature, we add noise to the statistic of interest in proportion to the statistic’s maximum observed sensitivity, defined as the maximum change in the statistic from adding or removing a single observation across all the cells in the data. Intuitively, our approach permits the release of statistics in arbitrarily small samples by adding sufficient noise to the estimates to protect privacy. Although our method does not offer a formal privacy guarantee, it generally outperforms widely used methods of disclosure limitation such as count-based cell suppression both in terms of privacy loss and statistical bias. We illustrate how the method can be implemented by discussing how it was used to release estimates of social mobility by Census tract in the Opportunity Atlas. 
We also provide a step-by-step guide and illustrative Stata code to implement our approach.
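The core idea in the summary above, adding noise scaled to the largest sensitivity observed across cells, can be sketched in a few lines. This is a simplified illustration, not the authors' algorithm: it uses the cell mean as the statistic, considers only the removal (not the addition) of an observation, and draws Laplace noise with scale equal to the sensitivity divided by a privacy parameter `epsilon`. All names and the example data are hypothetical; the paper and its Stata code give the actual procedure.

```python
import random
from statistics import mean


def local_sensitivity_of_mean(values):
    """Largest change in a cell's mean from removing any single observation."""
    full, total, n = mean(values), sum(values), len(values)
    return max(abs(full - (total - v) / (n - 1)) for v in values)


def max_observed_sensitivity(cells):
    """Maximum of the per-cell sensitivities across all cells with 2+ observations."""
    return max(local_sensitivity_of_mean(v) for v in cells.values() if len(v) > 1)


def laplace_noise(scale, rng):
    """Laplace(0, scale) draw: the difference of two iid exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cell_means(cells, epsilon, rng):
    """Release each cell mean plus noise proportional to the max observed sensitivity."""
    scale = max_observed_sensitivity(cells) / epsilon
    return {name: mean(v) + laplace_noise(scale, rng) for name, v in cells.items()}


# Hypothetical cells (for example, outcomes grouped by tract).
cells = {"tract_a": [0.2, 0.4, 0.9], "tract_b": [0.5, 0.5, 0.6, 0.7]}
released = noisy_cell_means(cells, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives a larger noise scale and therefore more protection at the cost of accuracy, which is the trade-off the summary describes.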
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Refer to the current geographies boundaries table for a list of all current geographies and recent updates.

This dataset is the definitive version of the annually released statistical area 3 (SA3) boundaries as at 1 January 2025, defined by Stats NZ and concorded to higher geographies. This version contains 929 SA3s (925 digitised and 4 with empty or null geometries (non-digitised)).

SA3 is a new output geography, introduced in 2023, that allows aggregations of population data between the SA3 geography and territorial authority geography.

This statistical area 3 higher geographies file is a correspondence, or concordance, which relates SA3s to larger geographic areas or 'higher geographies'. The higher geography contained in this concordance is: territorial authority (TA).

High-definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

Macrons
Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.

Digital data
Digital boundary data became freely available on 1 July 2007.

Further information
To download geographic classifications in table formats such as CSV, please use Ariā. For more information, please refer to the Statistical standard for geographic areas 2023.

Contact: geography@stats.govt.nz
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:
- The daily aggregation statistic (daily mean, daily max, daily min, daily sum*)
- The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
- The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)
*The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
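The three options listed above (aggregation statistic, sub-daily sampling frequency, and time-zone shift) can be illustrated with a small standard-library sketch. It is not the catalogue's implementation, which is described in the linked documentation: the function and option names are hypothetical, the input is a plain list of (UTC timestamp, value) pairs, and applying the sampling frequency to the shifted hour is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical option names mapped to reducers over one day's values.
STATISTICS = {
    "daily_mean": lambda xs: sum(xs) / len(xs),
    "daily_max": max,
    "daily_min": min,
    "daily_sum": sum,
}


def daily_statistic(hourly, statistic="daily_mean", frequency_hours=1, utc_offset_hours=0):
    """Aggregate (UTC datetime, value) pairs to one value per (shifted) calendar day."""
    reduce_day = STATISTICS[statistic]
    shift = timedelta(hours=utc_offset_hours)
    by_day = defaultdict(list)
    for stamp, value in hourly:
        local = stamp + shift  # shift the clock before assigning a calendar day
        # Assumption: sub-daily sampling keeps hours 0, f, 2f, ... of the shifted day.
        if local.hour % frequency_hours == 0:
            by_day[local.date()].append(value)
    return {day: reduce_day(values) for day, values in sorted(by_day.items())}


# Two days of synthetic hourly data: the value is simply the hour index 0..47.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
hourly = [(start + timedelta(hours=i), float(i)) for i in range(48)]

means = daily_statistic(hourly)                                  # 11.5 and 35.5
six_hourly_max = daily_statistic(hourly, "daily_max", 6)         # 18.0 and 42.0
shifted_sum = daily_statistic(hourly, "daily_sum", 1, 1)         # spans three local days
```

With a one-hour shift the 48 samples fall across three local calendar days, which shows why the chosen time zone changes the daily values.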
The Diagnosis-Related Groups Statistic (DRG) is an annual complete survey of all full inpatient hospital cases in Germany that were billed using case rates. The microdata can be requested via the RDC starting from the survey year 2005. The case-rate-based DRG accounting system was introduced when hospital financing was amended in 2000. Since 1 January 2004, this accounting system has been obligatory for general hospitals. The DRG Statistic is a secondary statistic. The data are not collected by the statistical offices of the Federation and the federal states but by the Institut für das Entgeltsystem im Krankenhaus (InEK). The data for the DRG Statistic are taken from the datasets that hospitals send to the InEK for accounting purposes. The InEK transmits a legally defined selection of variables from these extensive structure and services data to the Federal Statistical Office. The DRG Statistic covers the continuous full inpatient treatment in the hospital (treatment chain), regardless of the number of specialist departments involved. Of primary interest is information on operations and treatments, the type and amount of the invoiced charges (DRG case rates, effective valuation ratio, casemix), and main and secondary diagnoses. In addition, sociodemographic characteristics of the hospital cases, such as age, sex and region of residence, are recorded.
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo --- a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
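To make the monitored quantity concrete, the sketch below computes the global coefficient of determination from small per-node summaries and checks it against a threshold. It is not DReMo: it simply sums the summaries in one place, which is exactly the kind of communication the paper aims to reduce. All names and the example data are hypothetical.

```python
def local_summary(xs, ys, weights, intercept):
    """Per-node statistics sufficient to compute R^2 under a shared linear model."""
    ss_res = 0.0
    for features, y in zip(xs, ys):
        prediction = intercept + sum(w * f for w, f in zip(weights, features))
        ss_res += (y - prediction) ** 2
    return {"n": len(ys), "sum_y": sum(ys), "sum_y2": sum(y * y for y in ys), "ss_res": ss_res}


def global_r2(summaries):
    """R^2 = 1 - SS_res / SS_tot over the union of all nodes' data."""
    n = sum(s["n"] for s in summaries)
    sum_y = sum(s["sum_y"] for s in summaries)
    sum_y2 = sum(s["sum_y2"] for s in summaries)
    ss_res = sum(s["ss_res"] for s in summaries)
    ss_tot = sum_y2 - sum_y ** 2 / n  # total sum of squares about the global mean
    return 1.0 - ss_res / ss_tot


def needs_recompute(summaries, threshold):
    """True when the global R^2 has dropped below the fixed threshold."""
    return global_r2(summaries) < threshold


# Shared model y = 2x + 1; node_b holds one observation that deviates from it.
weights, intercept = [2.0], 1.0
node_a = local_summary([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0], weights, intercept)
node_b = local_summary([[3.0], [4.0]], [7.0, 10.0], weights, intercept)
r2 = global_r2([node_a, node_b])
```

Each node only needs to report four numbers for this check, which hints at why a communication-efficient monitor is feasible, although the paper's algorithm goes further by avoiding even this periodic exchange.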
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Montvale by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Montvale. The dataset can be used to understand the population distribution of Montvale by gender and age. For example, we can identify the largest age group for both men and women in Montvale. It can also be used to see how the gender ratio changes from birth to the oldest age group, and to compare the male-to-female ratio across age groups.
Key observations
Largest age group (population): Male # 35-39 years (610) | Female # 45-49 years (437). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Age groups:
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Montvale Population by Gender.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LScDC Word-Category RIG Matrix
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

Getting Started
This file describes the Word-Category RIG Matrix for the Leicester Scientific Corpus (LSC) [1] and the procedure to build the matrix, and introduces the Leicester Scientific Thesaurus (LScT) with its construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix, where rows correspond to words of the Leicester Scientific Dictionary-Core (LScDC) [2] and columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a pair (category, word). Its value shows the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in this text. The CSV file of the Word-Category RIG Matrix in the published archive is presented with two additional columns: the sum of RIGs in categories and the maximum of RIGs over categories (the last two columns of the matrix). So, the file ‘Word-Category RIG Matrix.csv’ contains a total of 254 columns.

This matrix is created to be used in future research on quantifying meaning in scientific texts, under the assumption that words have scientifically specific meanings in subject categories and that the meaning can be estimated by information gains from word to categories.

LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English. The thesaurus includes a list of 5,000 words from the LScDC. We order the words of the LScDC by the sum of their RIGs in categories; that is, words are arranged by their informativeness in the scientific corpus LSC. The meaningfulness of words is therefore evaluated by the words’ average informativeness in the categories. We have decided to include the most informative 5,000 words in the scientific thesaurus.

Words as a Vector of Frequencies in WoS Categories
Each word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of the LSC texts, each entry of the vector consists of the number of texts containing the word in the corresponding category.

It is noteworthy that texts in a corpus do not necessarily belong to a single category, as they are likely to correspond to multidisciplinary studies, specifically in a corpus of scientific texts. In other words, categories may not be exclusive. There are 252 WoS categories, and a text can be assigned to at least 1 and at most 6 categories in the LSC. Using the binary calculation of frequencies, we introduce the presence of a word in a category. We create a vector of frequencies for each word, where dimensions are categories in the corpus.

The collection of vectors, with all words and categories in the entire corpus, can be shown in a table, where each entry corresponds to a pair (word, category). This table is built for the LScDC with 252 WoS categories and presented in the published archive with this file. The value of each entry in the table shows how many times a word of the LScDC appears in a WoS category. The occurrence of a word in a category is determined by counting the number of the LSC texts containing the word in a category.

Words as a Vector of Relative Information Gains Extracted for Categories
In this section, we introduce our approach to representing a word as a vector of relative information gains for categories, under the assumption that the meaning of a word can be quantified by the information gained for categories.

For each category, a function is defined on texts that takes the value 1 if the text belongs to the category, and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word belongs to the text, and 0 otherwise. Consider LSC as a probabilistic sample space (the space of equally probable elementary outcomes).
For the Boolean random variables, the joint probability distribution, the entropy and information gains are defined.

The information gain about the category from the word is the amount of information on the belonging of a text from the LSC to the category from observing the word in the text [6]. We used the Relative Information Gain (RIG), which provides a normalised measure of the information gain and so makes it possible to compare information gains for different categories. The calculations of entropy, information gains and relative information gains can be found in the README file in the published archive.

Given a word, we created a vector where each component of the vector corresponds to a category. Therefore, each word is represented as a vector of relative information gains. The dimension of the vector for each word is the number of categories. The set of vectors is used to form the Word-Category RIG Matrix, in which each column corresponds to a category, each row corresponds to a word and each component is the relative information gain from the word to the category. In the Word-Category RIG Matrix, a row vector represents the corresponding word as a vector of RIGs in categories. We note that in the matrix, a column vector represents RIGs of all words in an individual category. If we choose an arbitrary category, words can be ordered by their RIGs from the most informative to the least informative for the category. As well as ordering words in each category, words can be ordered by two criteria: the sum and the maximum of RIGs in categories. The top n words in this list can be considered the most informative words in the scientific texts. For a given word, the sum and maximum of RIGs are calculated from the Word-Category RIG Matrix.

RIGs for each word of the LScDC in 252 categories are calculated and vectors of words are formed. We then form the Word-Category RIG Matrix for the LSC.
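The entropy, information gain and RIG of a (word, category) pair can be sketched from four text counts. This is a minimal illustration that assumes the usual definition, RIG = (H(C) - H(C|W)) / H(C), for the two Boolean variables described above; the README in the published archive remains the reference for the authors' exact calculation, and the function and argument names here are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def relative_information_gain(n_texts, n_in_category, n_with_word, n_both):
    """RIG of a category indicator C from a word indicator W, given text counts."""
    p_c = n_in_category / n_texts
    h_c = entropy([p_c, 1.0 - p_c])
    if h_c == 0:
        return 0.0  # the category is constant, so there is nothing to gain

    h_c_given_w = 0.0
    n_without_word = n_texts - n_with_word
    if n_with_word > 0:  # texts that contain the word
        q = n_both / n_with_word
        h_c_given_w += (n_with_word / n_texts) * entropy([q, 1.0 - q])
    if n_without_word > 0:  # texts that do not contain the word
        q = (n_in_category - n_both) / n_without_word
        h_c_given_w += (n_without_word / n_texts) * entropy([q, 1.0 - q])

    return (h_c - h_c_given_w) / h_c


# A word that appears in exactly the texts of the category is fully informative.
rig_perfect = relative_information_gain(100, 20, 20, 20)
# A word spread evenly inside and outside the category carries no information.
rig_independent = relative_information_gain(100, 20, 50, 10)
```

Dividing by H(C) is what makes values comparable across categories of very different sizes, which is the stated reason for preferring RIG over the raw information gain.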
For each word, the sum (S) and maximum (M) of RIGs in categories are calculated and added at the end of the matrix (the last two columns of the matrix). The Word-Category RIG Matrix for the LScDC with 252 categories, the sum of RIGs in categories and the maximum of RIGs over categories can be found in the database.

Leicester Scientific Thesaurus (LScT)
The Leicester Scientific Thesaurus (LScT) is a list of 5,000 words from the LScDC [2]. Words of the LScDC are sorted in descending order by the sum (S) of RIGs in categories, and the top 5,000 words are selected for inclusion in the LScT. We consider these 5,000 words the most meaningful words in the scientific corpus. In other words, the meaningfulness of words is evaluated by the words’ average informativeness in the categories, and the list of these words is considered a ‘thesaurus’ for science. The LScT with the values of the sum can be found as a CSV file in the published archive.

The published archive contains the following files:
1) Word_Category_RIG_Matrix.csv: A 103,998 by 254 matrix where columns are 252 WoS categories, the sum (S) and the maximum (M) of RIGs in categories (the last two columns of the matrix), and rows are words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
2) Word_Category_Frequency_Matrix.csv: A 103,998 by 252 matrix where columns are 252 WoS categories and rows are words of the LScDC. Each entry of the matrix is the number of texts containing the word in the corresponding category. Words are ordered as in the LScDC.
3) LScT.csv: List of words of the LScT with sum (S) values.
4) Text_No_in_Cat.csv: The number of texts in categories.
5) Categories_in_Documents.csv: List of WoS categories for each document of the LSC.
6) README.txt: Description of the Word-Category RIG Matrix, Word-Category Frequency Matrix and LScT, and the forming procedures.
7) README.pdf (same as 6 in PDF format)

References
[1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
[2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
[6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Refer to the current geographies boundaries table for a list of all current geographies and recent updates.

This dataset is the definitive version of the annually released statistical area 1 (SA1) boundaries as at 1 January 2025, as defined by Stats NZ, clipped to the coastline. This clipped version has been created for cartographic purposes and so does not fully represent the official full extent boundaries. This clipped version contains 32,817 SA1s.

SA1 is an output geography that allows the release of more low-level data than is available at the meshblock level. Built by joining meshblocks, SA1s have an ideal size range of 100–200 residents, and a maximum population of about 500. This is to minimise suppression of population data in multivariate statistics tables.

The SA1 should:
- form a contiguous cluster of one or more meshblocks,
- be either urban, rural, or water in character,
- be small enough to:
  - allow flexibility for aggregation to other statistical geographies,
  - allow users to aggregate areas into their own defined communities of interest,
- form a nested hierarchy with statistical output geographies and administrative boundaries. It must:
  - be built from meshblocks,
  - either define or aggregate to define SA2s, urban rural areas, territorial authorities, and regional councils.

SA1s generally have a population of 100–200 residents, with some exceptions:
- SA1s with nil or nominal resident populations are created to represent remote mainland areas, unpopulated islands, inland water, inlets, or oceanic areas.
- Some SA1s in remote rural areas and urban industrial or business areas have fewer than 100 residents.
- Some SA1s that contain apartment blocks, retirement villages, and large non-residential facilities (prisons, boarding schools, etc.) have more than 500 residents.

SA1 numbering
SA1s are not named. SA1 codes have seven digits starting with a 7 and are numbered approximately north to south. Non-digitised codes start with 79. As new SA1s are created, they are given the next available numeric code. If the composition of an SA1 changes through splitting or amalgamating different meshblocks, the SA1 is given a new code. The previous code no longer exists within that version and future versions of the SA1 classification.

Digitised and non-digitised SA1s
The digital geographic boundaries are defined and maintained by Stats NZ. Aggregated from meshblocks, SA1s cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, off-shore oil rigs, and Ross Dependency. The following 16 SA1s are held in non-digitised form: 7999901 New Zealand Economic Zone; 7999902 Oceanic Kermadec Islands; 7999903 Kermadec Islands; 7999904 Oceanic Oil Rig Taranaki; 7999905 Oceanic Campbell Island; 7999906 Campbell Island; 7999907 Oceanic Oil Rig Southland; 7999908 Oceanic Auckland Islands; 7999909 Auckland Islands; 7999910 Oceanic Bounty Islands; 7999911 Bounty Islands; 7999912 Oceanic Snares Islands; 7999913 Snares Islands; 7999914 Oceanic Antipodes Islands; 7999915 Antipodes Islands; 7999916 Ross Dependency.

Clipped version
This clipped version has been created for cartographic purposes and so does not fully represent the official full extent boundaries.

High-definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

Macrons
Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.

Digital data
Digital boundary data became freely available on 1 July 2007.

Further information
To download geographic classifications in table formats such as CSV, please use Ariā. For more information, please refer to the Statistical standard for geographic areas 2023.

Contact: geography@stats.govt.nz
As of March 2024, there were a reported 5,381 data centers in the United States, the most of any country worldwide. A further 521 were located in Germany, while 514 were located in the United Kingdom.
What is a data center?
A data center is a network of computing and storage resources that enables the delivery of shared software applications and data. These centers can house large amounts of critical and important data, and therefore are vital to the daily functions of companies and consumers alike. As a result, whether it is a cloud, colocation, or managed service, data center real estate will have increasing importance worldwide.
Hyperscale data centers
In the past, data centers were highly controlled physical infrastructures, but the cloud has since changed that model. A cloud data service is a remote version of a data center – located somewhere away from a company's physical premises. Cloud IT infrastructure spending has grown and is forecast to rise further in the coming years. The evolution of technology, along with the rapid growth in demand for data across the globe, is largely driven by the leading hyperscale data center providers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Refer to the current geographies boundaries table for a list of all current geographies and recent updates.
This dataset is the definitive set of statistical area 1 (SA1) boundaries concorded to higher geographies as at 1 January 2025. This version contains 33,164 SA1s, including 16 with empty or null geometries (non-digitised SA1s).
SA1 is an output geography that allows the release of more detailed information about population characteristics than is available at the meshblock level. Built by joining meshblocks, SA1s have an ideal size range of 100–200 residents, and a maximum population of about 500. This is to minimise suppression of population data in multivariate statistics tables.
This SA1 higher geographies 2025 file is a correspondence, or concordance, which relates SA1s to larger geographic areas or 'higher geographies'. The higher geographies contained in this concordance are: statistical area 2 (SA22025), statistical area 3 (SA32025), urban rural (UR2025), and urban rural indicator (IUR2025), urban accessibility indicator (IUA), functional urban area (FUA), indicator functional urban area (IFUA) and functional urban area type (TFUA), territorial authority (TA2025), and regional council (REGC2025). The geography urban accessibility indicator (IUA) was first published in 2020 and added to this concordance in 2022.
High-definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.
Macrons
Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.
Digital data
Digital boundary data became freely available on 1 July 2007.
Further information
To download geographic classifications in table formats such as CSV please use Ariā. For more information please refer to the Statistical standard for geographic areas 2023.
Contact: geography@stats.govt.nz
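A concordance of this kind is typically used as a lookup table for rolling SA1-level figures up to a higher geography. The sketch below shows that idea for SA1 to SA2 using only the standard library. The column names (`sa1_code`, `sa2_code`), the inline CSV, and every code and count in it are made up for illustration and are not taken from the published file.

```python
import csv
import io
from collections import defaultdict

# Illustrative stand-in for the concordance file; real column names differ.
CONCORDANCE_CSV = """sa1_code,sa2_code
7000001,100100
7000002,100100
7000003,100200
"""


def load_concordance(text: str) -> dict:
    """Map each SA1 code to the SA2 code it belongs to."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sa1_code"]: row["sa2_code"] for row in reader}


def aggregate_to_sa2(sa1_counts: dict, sa1_to_sa2: dict) -> dict:
    """Sum SA1-level counts into SA2-level totals via the concordance."""
    totals = defaultdict(int)
    for sa1_code, count in sa1_counts.items():
        # A missing SA1 raises KeyError rather than being silently dropped.
        totals[sa1_to_sa2[sa1_code]] += count
    return dict(totals)


if __name__ == "__main__":
    lookup = load_concordance(CONCORDANCE_CSV)
    made_up_counts = {"7000001": 120, "7000002": 150, "7000003": 180}
    print(aggregate_to_sa2(made_up_counts, lookup))
```

Because each SA1 nests inside exactly one higher geography, a plain dictionary is enough; no many-to-many weighting is needed.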
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. ERA5-Land has been produced by replaying the land component of the ECMWF ERA5 climate reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. Reanalysis produces data that goes several decades back in time, providing an accurate description of the climate of the past. ERA5-Land uses ERA5 atmospheric variables, such as air temperature and air humidity, as input to control the simulated land fields. This is called the atmospheric forcing. Without the constraint of the atmospheric forcing, the model-based estimates can rapidly deviate from reality. Therefore, while observations are not directly used in the production of ERA5-Land, they have an indirect influence through the atmospheric forcing used to run the simulation. In addition, the input air temperature, air humidity and pressure used to run ERA5-Land are corrected to account for the altitude difference between the grid of the forcing and the higher resolution grid of ERA5-Land. This correction is called 'lapse rate correction'. This catalogue entry provides post-processed ERA5-land hourly data aggregated to daily time steps. Note that the accumulated variables are omitted (e.g. total precipitation, runoff, etc - please refer to table 3 in the ERA5-Land online documentation for a full list of accumulated variables). In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:
The daily aggregation statistic (daily mean, daily max, daily min)
The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)
Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code and advice on how to return daily statistics for the accumulated variables, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5-land hourly data catalogue entry and the documentation found therein.
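The three options listed above (statistic, sub-daily sampling frequency, and UTC shift) can be illustrated with a small standard-library sketch. This is not the code used by the catalogue service. The function name and parameters are hypothetical, the sampling is assumed to be applied on UTC hours before the shift, and partial days at the edges of the input are simply aggregated from whatever samples fall in them.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean

REDUCERS = {"mean": fmean, "max": max, "min": min}


def daily_statistic(hourly, statistic="mean", frequency_hours=1, utc_shift_hours=0):
    """Aggregate (UTC timestamp, value) pairs into one value per day.

    hourly: iterable of (datetime, float) pairs with timestamps in UTC.
    frequency_hours: keep only samples whose UTC hour is a multiple of this.
    utc_shift_hours: offset added to each timestamp before the day is chosen.
    """
    if frequency_hours not in (1, 3, 6):
        raise ValueError("frequency_hours must be 1, 3 or 6")
    reduce_values = REDUCERS[statistic]
    shift = timedelta(hours=utc_shift_hours)
    values_by_day = defaultdict(list)
    for timestamp, value in hourly:
        if timestamp.hour % frequency_hours != 0:
            continue  # assumed: sub-sampling is done on the UTC hour
        values_by_day[(timestamp + shift).date()].append(value)
    return {day: reduce_values(vals) for day, vals in sorted(values_by_day.items())}


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Synthetic series: the value equals the number of hours since the start.
    series = [(start + timedelta(hours=i), float(i)) for i in range(48)]
    print(daily_statistic(series, statistic="max", utc_shift_hours=12))
```

Shifting the timestamps before grouping means a day always runs from local midnight to local midnight, which is why a non-zero shift changes which hourly samples contribute to each daily value.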
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This service shows the percentage of population, excluding institutional residents, with knowledge of English and French for Canada by 2016 census division. The data is from the Census Profile, Statistics Canada Catalogue no. 98-316-X2016001. Knowledge of official languages refers to whether the person can conduct a conversation in English only, French only, in both languages or in neither language. For a child who has not yet learned to speak, this includes languages that the child is learning to speak at home. For additional information refer to 'Knowledge of official languages' in the 2016 Census Dictionary. For a cartographic representation of the ecumene with this socio-economic indicator, it is recommended to add the “NRCan - 2016 population ecumene by census division” web service as the first layer; it is accessible in the data resources section below.
This statistic shows big data technology adoption plans in organizations worldwide from 2015 to 2019. Around 53 percent of respondents stated that their organization currently used big data technologies as of 2019.
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Dataset contains counts and measures for individuals from the 2013, 2018, and 2023 Censuses. Data is available by statistical area 1.
The variables included in this dataset are for the census usually resident population count (unless otherwise stated). All data is for level 1 of the classification (unless otherwise stated).
The variables for part 1 of the dataset are:
Download lookup file for part 1 from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.
Footnotes
Te Whata
Under the Mana Ōrite Relationship Agreement, Te Kāhui Raraunga (TKR) will be publishing Māori descent and iwi affiliation data from the 2023 Census in partnership with Stats NZ. This will be available on Te Whata, a TKR platform.
Geographical boundaries
Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.
Subnational census usually resident population
The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.
Population counts
Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts.
Caution using time series
Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).
Study participation time series
In the 2013 Census study participation was only collected for the census usually resident population count aged 15 years and over.
About the 2023 Census dataset
For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. Where we were confident that people who hadn’t completed a census form should be counted, we added real data about those people to the dataset (this is known as admin enumeration). We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.
Data quality
The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.
Concept descriptions and quality ratings
Data quality ratings for 2023 Census variables has additional details about variables found within totals by topic, for example, definitions and data quality.
Disability indicator
This data should not be used as an official measure of disability prevalence. Disability prevalence estimates are only available from the 2023 Household Disability Survey. Household Disability Survey 2023: Final content has more information about the survey.
Activity limitations are measured using the Washington Group Short Set (WGSS). The WGSS asks about six basic activities that a person might have difficulty with: seeing, hearing, walking or climbing stairs, remembering or concentrating, washing all over or dressing, and communicating. A person was classified as disabled in the 2023 Census if there was at least one of these activities that they had a lot of difficulty with or could not do at all.
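The classification rule in the paragraph above (disabled if at least one of the six activities is reported with a lot of difficulty or cannot be done at all) can be written as a short predicate. This is an illustrative sketch, not Stats NZ's processing code; the activity keys and response labels are assumed names chosen for the example.

```python
# Assumed keys for the six WGSS activities named in the text.
WGSS_ACTIVITIES = (
    "seeing",
    "hearing",
    "walking_or_climbing_stairs",
    "remembering_or_concentrating",
    "washing_or_dressing",
    "communicating",
)

# Assumed labels for the two response levels that trigger the classification.
SEVERE_RESPONSES = frozenset({"a lot of difficulty", "cannot do at all"})


def is_classified_disabled(responses: dict) -> bool:
    """Return True if any WGSS activity has a severe response.

    Activities missing from `responses` are treated as not severe here,
    which is a simplification made for this sketch.
    """
    return any(responses.get(activity) in SEVERE_RESPONSES for activity in WGSS_ACTIVITIES)


if __name__ == "__main__":
    example = {"seeing": "some difficulty", "hearing": "a lot of difficulty"}
    print(is_classified_disabled(example))
```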
Using data for good
Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that “data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.
Confidentiality
The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
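To make the idea of fixed random rounding to base 3 concrete, the sketch below follows the commonly described scheme for random rounding: multiples of 3 are left unchanged, and any other count moves to the nearer multiple of 3 with probability 2/3 or to the farther one with probability 1/3. The "fixed" part is imitated here by deriving the draw from a hash of a cell key, so the same cell always rounds the same way. That hashing approach, the function name, and the `cell_key` parameter are assumptions made for illustration and do not describe Stats NZ's actual implementation. Suppression of small counts is not covered.

```python
import hashlib


def fixed_random_round_base3(count: int, cell_key: str) -> int:
    """Round a non-negative count to a multiple of 3, repeatably per cell."""
    if count < 0:
        raise ValueError("count must be non-negative")
    remainder = count % 3
    if remainder == 0:
        return count  # already a multiple of 3
    lower = count - remainder
    upper = lower + 3
    # Remainder 1 is closer to the lower multiple, remainder 2 to the upper.
    nearer, farther = (lower, upper) if remainder == 1 else (upper, lower)
    # Assumed mechanism: a hash of the cell key and count gives a repeatable
    # pseudo-random draw in [0, 1), so re-running yields the same result.
    digest = hashlib.sha256(f"{cell_key}:{count}".encode("utf-8")).digest()
    draw = int.from_bytes(digest[:8], "big") / 2**64
    return nearer if draw < 2 / 3 else farther


if __name__ == "__main__":
    for raw in (4, 5, 6, 7):
        print(raw, fixed_random_round_base3(raw, "example-cell"))
```

Rounding each cell independently is the reason individual figures may not sum to the stated totals: a total is rounded in its own right rather than being recomputed from its rounded parts.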
Measures
Measures like averages, medians, and other quantiles are calculated from unrounded counts, with input noise added to or subtracted from each contributing value during measures calculations. Averages and medians based on fewer than six units (e.g. individuals, dwellings, households, families, or extended families) are suppressed. This suppression threshold changes for other quantiles. Where the cells have been suppressed, a placeholder value has been used.
Percentages
To calculate percentages, divide the figure for the category of interest by the figure for 'Total stated' where this applies.
Symbol
-997 Not available
-999 Confidential
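When computing percentages as described above, the placeholder symbols need to be screened out first, otherwise -997 or -999 would be treated as real counts. A minimal sketch follows; the function name is hypothetical, and returning `None` for unusable inputs is a design choice made for this example rather than a documented convention.

```python
from typing import Optional

NOT_AVAILABLE = -997  # symbol listed above as "Not available"
CONFIDENTIAL = -999   # symbol listed above as "Confidential"
PLACEHOLDERS = frozenset({NOT_AVAILABLE, CONFIDENTIAL})


def percentage_of_total_stated(category_count: int, total_stated: int) -> Optional[float]:
    """Return category_count as a percentage of total_stated.

    Returns None when either figure is a placeholder or the total is zero,
    since no meaningful percentage exists in those cases.
    """
    if category_count in PLACEHOLDERS or total_stated in PLACEHOLDERS:
        return None
    if total_stated == 0:
        return None
    return 100.0 * category_count / total_stated


if __name__ == "__main__":
    print(percentage_of_total_stated(30, 120))
    print(percentage_of_total_stated(CONFIDENTIAL, 120))
```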
Inconsistencies in definitions
Please note that there may be differences in definitions between census classifications and those used for other data collections.