31 datasets found
  1. Health Statistic and Research Database

    • healthinformationportal.eu
    • www-acc.healthinformationportal.eu
    html
    Updated Feb 23, 2023
    Cite
    Estonian National Institute for Health Development (2023). Health Statistic and Research Database [Dataset]. https://www.healthinformationportal.eu/health-information-sources/health-statistic-and-research-database
    Available download formats: html
    Dataset updated
    Feb 23, 2023
    Dataset authored and provided by
    Estonian National Institute for Health Development
    Variables measured
    sex, title, topics, country, language, data_owners, description, contact_name, geo_coverage, contact_email, and 10 more
    Measurement technique
    Multiple sources
    Description

    The Health Statistics and Health Research Database is Estonia's largest collection of health-related statistics and survey results, administered by the National Institute for Health Development. Use of the database is free of charge.

    The database consists of eight main areas divided into sub-areas. The data tables in each sub-area are assigned unique codes. The tables can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). A detailed database user manual is also available for download (.pdf).

    The database is constantly updated with new data. The dates on which existing data tables are updated and new data are added are given in the release calendar. The date of the last update to each table is shown after the table's title in the list of data tables.

    A contact person for each sub-area is listed under that sub-area's "Definitions and Methodology" link. Contact this person for additional information about the published data, further questions, and data requests.

    Read more about the publication of health statistics by the National Institute for Health Development in Health Statistics Dissemination Principles.

  2. Descriptive Statistic Information of Part of Speech (POS) Categories across...

    • catalogue.data.govt.nz
    Updated Jul 7, 2023
    Cite
    (2023). Descriptive Statistic Information of Part of Speech (POS) Categories across Interactional Tasks - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-23640786
    Dataset updated
    Jul 7, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The numbers represent the pooled data for each part-of-speech (POS) category across tasks and conditions, with the numbers in parentheses indicating the corresponding percentages (Blocks A = with a non-explicit goal to build with blocks; Blocks B = with an explicit goal to build a tower together; Blicket = with an explicit goal to teach the infants a novel noun; CV = co-view; DG = digital-view).

  3. Data for generating statistical maps of soil lithium concentrations in the...

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Data for generating statistical maps of soil lithium concentrations in the conterminous United States [Dataset]. https://catalog.data.gov/dataset/data-for-generating-statistical-maps-of-soil-lithium-concentrations-in-the-conterminous-un
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Contiguous United States, United States
    Description

    The product data are six statistics that were estimated for the chemical concentration of lithium in the soil C horizon of the conterminous United States. The estimates are made at 9998 locations that are uniformly distributed across the conterminous United States. The six statistics are the mean for the isometric log-ratio transform of the concentrations, the equivalent mean for the concentrations, the standard deviation for the isometric log-ratio transform of the concentrations, the probability of exceeding a concentration of 55 milligrams per kilogram, the 0.95 quantile for the isometric log-ratio transform of the concentrations, and the equivalent 0.95 quantile for the concentrations. Each statistic may be used to generate a statistical map that shows an attribute of the distribution of lithium concentration.
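
As a rough illustration of the six statistics listed above, the sketch below computes them from a handful of hypothetical concentration draws for a single location. It assumes a two-part composition (lithium versus everything else, in mg/kg), for which the isometric log-ratio transform reduces to ln(c / (10^6 - c)) / sqrt(2), and it takes the "equivalent" statistics to be back-transformed ilr values. The description does not say how the published estimates were produced, so the function names, the closure constant, and the sample values are all assumptions.

```python
import math
import statistics

TOTAL = 1_000_000.0  # mg per kg; assumed closure constant for a two-part composition


def ilr(conc_mg_kg):
    """Isometric log-ratio transform of a two-part composition (assumption)."""
    return math.log(conc_mg_kg / (TOTAL - conc_mg_kg)) / math.sqrt(2)


def ilr_inverse(z):
    """Back-transform an ilr value to a concentration in mg/kg."""
    ratio = math.exp(math.sqrt(2) * z)
    return TOTAL * ratio / (1.0 + ratio)


def six_statistics(concentrations, threshold=55.0, q=0.95):
    """Summarise hypothetical concentration draws for one location."""
    z = sorted(ilr(c) for c in concentrations)
    mean_ilr = statistics.fmean(z)
    # Empirical quantile by linear interpolation between order statistics.
    pos = q * (len(z) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    q_ilr = z[lo] + (z[hi] - z[lo]) * (pos - lo)
    exceed = sum(c > threshold for c in concentrations) / len(concentrations)
    return {
        "mean_ilr": mean_ilr,
        "equivalent_mean": ilr_inverse(mean_ilr),
        "sd_ilr": statistics.stdev(z),
        "prob_exceed": exceed,
        "q95_ilr": q_ilr,
        "equivalent_q95": ilr_inverse(q_ilr),
    }


# Hypothetical draws in mg/kg, for illustration only (not from the dataset).
draws = [12.0, 18.5, 22.0, 30.0, 41.0, 58.0, 63.5, 20.0, 27.5, 35.0]
stats = six_statistics(draws)
```

Mapping any one of these values across all locations would give the kind of statistical map the description refers to.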

  4. Statistical Area 1 2025 - Dataset - data.govt.nz - discover and use data

    • catalogue.data.govt.nz
    Updated Dec 9, 2024
    + more versions
    Cite
    (2024). Statistical Area 1 2025 - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/statistical-area-1-2025
    Dataset updated
    Dec 9, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    This dataset is the definitive version of the annually released statistical area 1 (SA1) boundaries as at 1 January 2025, as defined by Stats NZ. This version contains 33,164 SA1s (33,148 digitised and 16 with empty or null geometries (non-digitised)).

    SA1 is an output geography that allows the release of more low-level data than is available at the meshblock level. Built by joining meshblocks, SA1s have an ideal size range of 100–200 residents, and a maximum population of about 500. This is to minimise suppression of population data in multivariate statistics tables.

    The SA1 should:

    • form a contiguous cluster of one or more meshblocks,
    • be either urban, rural, or water in character,
    • be small enough to allow flexibility for aggregation to other statistical geographies and to allow users to aggregate areas into their own defined communities of interest,
    • form a nested hierarchy with statistical output geographies and administrative boundaries. It must be built from meshblocks, and either define or aggregate to define SA2s, urban rural areas, territorial authorities, and regional councils.

    SA1s generally have a population of 100–200 residents, with some exceptions:

    • SA1s with nil or nominal resident populations are created to represent remote mainland areas, unpopulated islands, inland water, inlets, or oceanic areas.
    • Some SA1s in remote rural areas and urban industrial or business areas have fewer than 100 residents.
    • Some SA1s that contain apartment blocks, retirement villages, and large non-residential facilities (prisons, boarding schools, etc.) have more than 500 residents.

    SA1 numbering

    SA1s are not named. SA1 codes have seven digits starting with a 7 and are numbered approximately north to south. Non-digitised codes start with 79. As new SA1s are created, they are given the next available numeric code. If the composition of an SA1 changes through splitting or amalgamating different meshblocks, the SA1 is given a new code. The previous code no longer exists within that version and future versions of the SA1 classification.

    Digitised and non-digitised SA1s

    The digital geographic boundaries are defined and maintained by Stats NZ. Aggregated from meshblocks, SA1s cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, off-shore oil rigs, and Ross Dependency. The following 16 SA1s are held in non-digitised form:

    • 7999901: New Zealand Economic Zone
    • 7999902: Oceanic Kermadec Islands
    • 7999903: Kermadec Islands
    • 7999904: Oceanic Oil Rig Taranaki
    • 7999905: Oceanic Campbell Island
    • 7999906: Campbell Island
    • 7999907: Oceanic Oil Rig Southland
    • 7999908: Oceanic Auckland Islands
    • 7999909: Auckland Islands
    • 7999910: Oceanic Bounty Islands
    • 7999911: Bounty Islands
    • 7999912: Oceanic Snares Islands
    • 7999913: Snares Islands
    • 7999914: Oceanic Antipodes Islands
    • 7999915: Antipodes Islands
    • 7999916: Ross Dependency

    High-definition version

    This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

    Macrons

    Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.

    Digital data

    Digital boundary data became freely available on 1 July 2007.

    Further information

    To download geographic classifications in table formats such as CSV please use Ariā. For more information please refer to the Statistical standard for geographic areas 2023. Contact: geography@stats.govt.nz
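
The SA1 numbering rules described above (seven digits, starting with 7, with non-digitised codes starting with 79) can be sketched as a small check. The function name and return labels are hypothetical, and treating every code that starts with 79 as non-digitised is an assumption: the description only states the rule in the other direction.

```python
import re

# Seven ASCII digits, the first of which is 7.
SA1_CODE = re.compile(r"7[0-9]{6}")


def classify_sa1_code(code):
    """Return 'non-digitised', 'digitised', or 'invalid' for an SA1 code string."""
    if not SA1_CODE.fullmatch(code):
        return "invalid"
    # Assumption: any well-formed code starting with 79 is non-digitised.
    return "non-digitised" if code.startswith("79") else "digitised"


# 7999901 is listed in the description; 7000001 is a made-up example code.
for example in ("7999901", "7000001", "6999901", "799990"):
    print(example, classify_sa1_code(example))
```

A check like this only validates the shape of a code. Whether a code exists in a given year's classification still has to be looked up in that version of the data.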

  5. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
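
To make the compound annual growth rate concrete, the sketch below compounds the 2020 installed base of 6.7 zettabytes forward at 19.2 percent per year to 2025. The function name is hypothetical, and the resulting figure (about 16 zettabytes) is only the arithmetic implication of the two numbers quoted above, not a value stated by the source.

```python
def project_with_cagr(base_value, annual_rate, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1.0 + annual_rate) ** years


# Figures from the description: 6.7 ZB in 2020, 19.2 percent CAGR, 2020 to 2025.
capacity_2025_zb = project_with_cagr(6.7, 0.192, 2025 - 2020)
print(f"Implied 2025 installed storage capacity: {capacity_2025_zb:.1f} ZB")
```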

  6. ATO Statistic Data - City of Greater Geelong

    • gimi9.com
    • researchdata.edu.au
    • +3 more
    Updated Jul 28, 2016
    Cite
    (2016). ATO Statistic Data - City of Greater Geelong [Dataset]. https://gimi9.com/dataset/au_ato-statistic-data-city-of-greater-geelong
    Dataset updated
    Jul 28, 2016
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Greater Geelong City
    Description

    The de-identified data from the 2013-14 individual 2% sample file (https://data.gov.au/dataset/taxation-statistics-individual-sample-files) has been aggregated to the following levels:

    • Sex
    • Age (5 year ranges)
    • Occupation (1 digit level)
    • Partner Status
    • Location (SA4 Region name)
    • Lodgment channel (Agent or self-preparer)
    • PHI indicator

    Data was then added from the ABS Census (2011) and ABS SEIFA, as summary variables or ranked variables for SA4 regions. The Geelong Region was extracted from the full dataset.

    Although all due care has been taken to ensure that these data are correct, no warranty is expressed or implied by the City of Greater Geelong in their use.

  7. Replication Data for: A Practical Method to Reduce Privacy Loss when...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Feb 23, 2022
    Cite
    Raj Chetty; John Friedman (2022). Replication Data for: A Practical Method to Reduce Privacy Loss when Disclosing Statistics Based on Small Samples [Dataset]. http://doi.org/10.7910/DVN/RCHDXX
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Raj Chetty; John Friedman
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/RCHDXX

    Description

    This dataset contains replication files for "A Practical Method to Reduce Privacy Loss when Disclosing Statistics Based on Small Samples" by Raj Chetty and John Friedman. For more information, see https://opportunityinsights.org/paper/differential-privacy/. A summary of the related publication follows. Releasing statistics based on small samples – such as estimates of social mobility by Census tract, as in the Opportunity Atlas – is very valuable for policy but can potentially create privacy risks by unintentionally disclosing information about specific individuals. To mitigate such risks, we worked with researchers at the Harvard Privacy Tools Project and Census Bureau staff to develop practical methods of reducing the risks of privacy loss when releasing such data. This paper describes the methods that we developed, which can be applied to disclose any statistic of interest that is estimated using a sample with a small number of observations. We focus on the case where the dataset can be broken into many groups (“cells”) and one is interested in releasing statistics for one or more of these cells. Building on ideas from the differential privacy literature, we add noise to the statistic of interest in proportion to the statistic’s maximum observed sensitivity, defined as the maximum change in the statistic from adding or removing a single observation across all the cells in the data. Intuitively, our approach permits the release of statistics in arbitrarily small samples by adding sufficient noise to the estimates to protect privacy. Although our method does not offer a formal privacy guarantee, it generally outperforms widely used methods of disclosure limitation such as count-based cell suppression both in terms of privacy loss and statistical bias. We illustrate how the method can be implemented by discussing how it was used to release estimates of social mobility by Census tract in the Opportunity Atlas. 
We also provide a step-by-step guide and illustrative Stata code to implement our approach.
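
The idea of adding noise in proportion to the maximum observed sensitivity can be sketched as follows for a cell mean. This is not the authors' Stata code. The use of Laplace noise, the privacy parameter `epsilon`, the function names, and the cell values are assumptions made for illustration, and the sketch considers only the removal of an observation, whereas the description also covers adding one.

```python
import random
import statistics


def local_sensitivity(values):
    """Largest change in the cell mean from removing any single observation."""
    full_mean = statistics.fmean(values)
    changes = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        changes.append(abs(statistics.fmean(rest) - full_mean))
    return max(changes)


def max_observed_sensitivity(cells):
    """Maximum local sensitivity across all cells with at least two observations."""
    return max(local_sensitivity(v) for v in cells.values() if len(v) > 1)


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cell_means(cells, epsilon, seed=0):
    """Release each cell mean plus noise scaled by the max observed sensitivity."""
    rng = random.Random(seed)
    scale = max_observed_sensitivity(cells) / epsilon
    return {name: statistics.fmean(v) + laplace_noise(scale, rng)
            for name, v in cells.items()}


# Hypothetical cells (for example, outcomes grouped by tract); values are made up.
cells = {"tract_a": [0.31, 0.42, 0.38, 0.45], "tract_b": [0.52, 0.49, 0.61]}
released = noisy_cell_means(cells, epsilon=1.0)
```

A smaller `epsilon` gives a larger noise scale, trading accuracy for protection. As the description notes, this style of method does not come with a formal privacy guarantee.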

  8. Statistical Area 3 Higher Geographies 2025 - Dataset - data.govt.nz -...

    • catalogue.data.govt.nz
    Updated Dec 2, 2024
    + more versions
    Cite
    (2024). Statistical Area 3 Higher Geographies 2025 - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/statistical-area-3-higher-geographies-2025
    Dataset updated
    Dec 2, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Refer to the current geographies boundaries table for a list of all current geographies and recent updates.

    This dataset is the definitive version of the annually released statistical area 3 (SA3) boundaries as at 1 January 2025, defined by Stats NZ and concorded to higher geographies. This version contains 929 SA3s (925 digitised and 4 with empty or null geometries (non-digitised)).

    SA3 is a new output geography, introduced in 2023, that allows aggregations of population data between the SA3 geography and territorial authority geography.

    This statistical area 3 higher geographies file is a correspondence, or concordance, which relates SA3s to larger geographic areas or 'higher geographies'. The higher geography contained in this concordance is: territorial authority (TA).

    High-definition version

    This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

    Macrons

    Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.

    Digital data

    Digital boundary data became freely available on 1 July 2007.

    Further information

    To download geographic classifications in table formats such as CSV please use Ariā. For more information please refer to the Statistical standard for geographic areas 2023. Contact: geography@stats.govt.nz

  9. ERA5 post-processed daily statistics on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Mar 26, 2025
    + more versions
    Cite
    ECMWF (2025). ERA5 post-processed daily statistics on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.4991cf48
    Available download formats: grib
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf

    Time period covered
    Jan 1, 1940 - Mar 20, 2025
    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:

    • The daily aggregation statistic (daily mean, daily max, daily min, daily sum*)
    • The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
    • The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)

    *The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
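
The three options above (aggregation statistic, sub-daily sampling frequency, and UTC shift) can be illustrated with a small stand-alone aggregation over hourly values. This is not the Climate Data Store's implementation. The function name, its parameters, and the choice to sub-sample on the shifted local hour are assumptions, and the input series is synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_statistic(hourly, stat="mean", frequency_hours=1, utc_shift_hours=0):
    """Aggregate (UTC datetime, value) pairs to one value per shifted calendar day.

    stat: "mean", "max", "min" or "sum".
    frequency_hours: keep only samples whose shifted hour is a multiple of this.
    utc_shift_hours: offset added to the UTC time before days are formed.
    """
    reducers = {
        "mean": lambda xs: sum(xs) / len(xs),
        "max": max,
        "min": min,
        "sum": sum,
    }
    shift = timedelta(hours=utc_shift_hours)
    by_day = defaultdict(list)
    for stamp_utc, value in hourly:
        local = stamp_utc + shift
        if local.hour % frequency_hours == 0:
            by_day[local.date()].append(value)
    return {day: reducers[stat](vals) for day, vals in sorted(by_day.items())}


# Synthetic series: 48 hourly values starting at 2025-01-01 00:00 UTC.
start = datetime(2025, 1, 1)
series = [(start + timedelta(hours=h), float(h)) for h in range(48)]

daily_mean = daily_statistic(series)
six_hourly_max_shifted = daily_statistic(
    series, stat="max", frequency_hours=6, utc_shift_hours=2
)
```

With a non-zero shift, the first and last days of a request contain fewer samples than a full day. The real service may handle these edge days differently.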

  10. Diagnosis-Related Groups Statistic (DRG-Statistic) 2010, On-Site, Version 0...

    • b2find.dkrz.de
    Updated Nov 16, 2018
    + more versions
    Cite
    (2018). Diagnosis-Related Groups Statistic (DRG-Statistic) 2010, On-Site, Version 0 - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/38109451-995a-5dbf-9c8d-69d0ee62a480
    Dataset updated
    Nov 16, 2018
    Description

    The Diagnosis-Related Groups Statistic (DRG Statistic) is an annual complete survey of all fully inpatient hospital cases in Germany that were billed by case rates. The microdata can be requested via the RDC starting from the survey year 2005.

    The case-rate-based DRG accounting system was introduced when hospital financing was amended in 2000. Since 1 January 2004, this accounting system has been obligatory for general hospitals.

    The DRG Statistic is a secondary statistic. The data are not collected by the statistical offices of the Federation and the federal states but by the Institut für das Entgeltsystem im Krankenhaus (InEK). The data for the DRG Statistic are taken from the datasets that hospitals send to the InEK for accounting purposes. The InEK transmits a legally defined selection of variables from these extensive structure and services data to the Federal Statistical Office.

    The DRG Statistic covers the continuous fully inpatient treatment in the hospital (treatment chain), regardless of the number of specialist departments involved. The central items are information on operations and treatments, the type and amount of the invoiced charges (DRG case rates, effective valuation ratio, casemix), and main and secondary diagnoses. In addition, sociodemographic characteristics of the hospital cases such as age, sex and region of residence are recorded.

  11. Data from: Distributed Monitoring of the R2 Statistic for Linear Regression

    • catalog.data.gov
    • data.nasa.gov
    • +1 more
    Updated Dec 6, 2023
    Cite
    Distributed Monitoring of the R2 Statistic for Linear Regression [Dataset]. https://catalog.data.gov/dataset/distributed-monitoring-of-the-r2-statistic-for-linear-regression
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Dashlink
    Description

    The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo --- a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
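
For orientation, the quantity being monitored can be written down in a few lines. The sketch below computes the coefficient of determination on a single machine and checks it against a threshold. It is not the distributed DReMo algorithm, which avoids centralising the data, and the function names, data, and threshold values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def needs_recompute(y_true, y_pred, threshold):
    """True when R2 has dropped below the threshold and the model should be refit."""
    return r_squared(y_true, y_pred) < threshold


# Made-up targets and model predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted), needs_recompute(observed, predicted, 0.9))
```

In the setting described above, the same threshold test would be evaluated over the union of all nodes' data without shipping that data to one place.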

  12. Montvale, NJ Population Breakdown by Gender and Age Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 19, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Montvale, NJ Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/8e269bd6-c989-11ee-9145-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 19, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montvale, New Jersey
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Montvale by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Montvale. The dataset can be used to understand the population distribution of Montvale by gender and age. For example, we can identify the largest age group for both men and women in Montvale. It can also be used to see how the male-to-female ratio changes across age groups, from the youngest to the oldest.

    Key observations

    Largest age group (population): Male # 35-39 years (610) | Female # 45-49 years (437). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the Montvale population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): This column shows the male population of Montvale in each age group.
    • Population (Female): This column shows the female population of Montvale in each age group.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Montvale for each age group.
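
As a small illustration of the columns above, the sketch below derives the gender ratio (males per 100 females) and picks the largest age group for one sex. The column names follow the list above, but the counts are made up and the helper names are hypothetical. They are not values or code from the dataset.

```python
def gender_ratio(males, females):
    """Males per 100 females; None when the group has no females."""
    if females == 0:
        return None
    return 100.0 * males / females


def largest_age_group(rows, column):
    """Age group label with the highest count in the given population column."""
    return max(rows, key=lambda row: row[column])["Age Group"]


# Made-up rows using the dataset's column names; not real Montvale figures.
rows = [
    {"Age Group": "Under 5 years", "Population (Male)": 120, "Population (Female)": 100},
    {"Age Group": "5 to 9 years", "Population (Male)": 90, "Population (Female)": 150},
    {"Age Group": "85 years and over", "Population (Male)": 40, "Population (Female)": 80},
]
ratios = {
    row["Age Group"]: gender_ratio(row["Population (Male)"], row["Population (Female)"])
    for row in rows
}
```

Because the underlying figures are survey estimates, ratios for small age groups can swing widely within the margin of error.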

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Montvale Population by Gender, which you can refer to for more detail.

  13. LScDC Word-Category RIG Matrix

    • figshare.le.ac.uk
    pdf
    Updated Apr 28, 2020
    Cite
    LScDC Word-Category RIG Matrix [Dataset]. https://figshare.le.ac.uk/articles/dataset/LScDC_Word-Category_RIG_Matrix/12133431
    Available download formats: pdf
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LScDC Word-Category RIG MatrixApril 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)Supervised by Prof Alexander Gorban and Dr Evgeny MirkesGetting StartedThis file describes the Word-Category RIG Matrix for theLeicester Scientific Corpus (LSC) [1], the procedure to build the matrix and introduces the Leicester Scientific Thesaurus (LScT) with the construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix, where rows correspond to words of Leicester Scientific Dictionary-Core (LScDC) [2] and columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a pair (category,word). Its value for the pair shows the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in this text. The CSV file of Word-Category RIG Matrix in the published archive is presented with two additional columns of the sum of RIGs in categories and the maximum of RIGs over categories (last two columns of the matrix). So, the file ‘Word-Category RIG Matrix.csv’ contains a total of 254 columns.This matrix is created to be used in future research on quantifying of meaning in scientific texts under the assumption that words have scientifically specific meanings in subject categories and the meaning can be estimated by information gains from word to categories. LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English. The thesaurus includes a list of 5,000 words from the LScDC. We consider ordering the words of LScDC by the sum of their RIGs in categories. That is, words are arranged in their informativeness in the scientific corpus LSC. Therefore, meaningfulness of words evaluated by words’ average informativeness in the categories. We have decided to include the most informative 5,000 words in the scientific thesaurus. 
    Words as a Vector of Frequencies in WoS Categories

    Each word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of the LSC texts, each entry of the vector is the number of texts containing the word in the corresponding category.

    Texts in a corpus do not necessarily belong to a single category, as they are likely to correspond to multidisciplinary studies, especially in a corpus of scientific texts. In other words, categories may not be exclusive. There are 252 WoS categories, and a text can be assigned to at least 1 and at most 6 categories in the LSC. Using a binary calculation of frequencies, we record the presence of a word in a category. We create a vector of frequencies for each word, where dimensions are categories in the corpus.

    The collection of vectors, with all words and categories in the entire corpus, can be shown in a table, where each entry corresponds to a pair (word, category). This table is built for the LScDC with 252 WoS categories and is presented in the published archive with this file. The value of each entry in the table shows how many times a word of the LScDC appears in a WoS category. The occurrence of a word in a category is determined by counting the number of LSC texts containing the word in that category.

    Words as a Vector of Relative Information Gains Extracted for Categories

    In this section, we introduce our approach to representing a word as a vector of relative information gains for categories, under the assumption that the meaning of a word can be quantified by the information it provides about categories.

    For each category, a function is defined on texts that takes the value 1 if the text belongs to the category, and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word belongs to the text, and 0 otherwise. Consider the LSC as a probabilistic sample space (the space of equally probable elementary outcomes).
    For the Boolean random variables, the joint probability distribution, the entropy and information gains are defined.

    The information gain about the category from the word is the amount of information about whether a text from the LSC belongs to the category, obtained from observing the word in the text [6]. We used the Relative Information Gain (RIG), which provides a normalised measure of the information gain and so allows information gains to be compared across categories. The calculations of entropy, information gains and relative information gains can be found in the README file in the published archive.

    Given a word, we created a vector where each component corresponds to a category. Each word is therefore represented as a vector of relative information gains, whose dimension is the number of categories. The set of vectors forms the Word-Category RIG Matrix, in which each column corresponds to a category, each row corresponds to a word, and each component is the relative information gain from the word to the category. A row vector represents the corresponding word as a vector of RIGs in categories, and a column vector represents the RIGs of all words in an individual category.

    For an arbitrary category, words can be ordered by their RIGs from the most informative to the least informative for that category. As well as ordering words within each category, words can be ordered by two criteria: the sum and the maximum of RIGs in categories. The top n words in this list can be considered the most informative words in scientific texts. For a given word, the sum and maximum of RIGs are calculated from the Word-Category RIG Matrix.

    RIGs for each word of the LScDC in 252 categories are calculated and vectors of words are formed. We then form the Word-Category RIG Matrix for the LSC.
    For each word, the sum (S) and maximum (M) of RIGs in categories are calculated and added at the end of the matrix (last two columns of the matrix). The Word-Category RIG Matrix for the LScDC with 252 categories, the sum of RIGs in categories and the maximum of RIGs over categories can be found in the database.

    Leicester Scientific Thesaurus (LScT)

    The Leicester Scientific Thesaurus (LScT) is a list of 5,000 words from the LScDC [2]. Words of the LScDC are sorted in descending order by the sum (S) of RIGs in categories, and the top 5,000 words are selected for inclusion in the LScT. We consider these 5,000 words to be the most meaningful words in the scientific corpus. In other words, the meaningfulness of a word is evaluated by its average informativeness in the categories, and the list of these words is considered a ‘thesaurus’ for science. The LScT with the values of the sum can be found as a CSV file in the published archive.

    The published archive contains the following files:

    1) Word_Category_RIG_Matrix.csv: A 103,998 by 254 matrix where columns are 252 WoS categories, the sum (S) and the maximum (M) of RIGs in categories (last two columns of the matrix), and rows are words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
    2) Word_Category_Frequency_Matrix.csv: A 103,998 by 252 matrix where columns are 252 WoS categories and rows are words of the LScDC. Each entry of the matrix is the number of texts containing the word in the corresponding category. Words are ordered as in the LScDC.
    3) LScT.csv: List of words of the LScT with sum (S) values.
    4) Text_No_in_Cat.csv: The number of texts in categories.
    5) Categories_in_Documents.csv: List of WoS categories for each document of the LSC.
    6) README.txt: Description of the Word-Category RIG Matrix, the Word-Category Frequency Matrix and the LScT, and the procedures used to form them.
    7) README.pdf (same as 6 in PDF format)

    References

    [1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    [2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
    [5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
    [6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
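
    The RIG calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the usual definitions (information gain as the entropy of the category indicator minus its conditional entropy given the word indicator, and RIG as that gain divided by the category entropy), whereas the exact formulas are in the archive's README. The function names and the count arguments are hypothetical.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a Boolean variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def relative_information_gain(n_texts, n_category, n_word, n_both):
    """RIG about category membership from observing a word.

    n_texts:    number of texts in the corpus
    n_category: number of texts in the category
    n_word:     number of texts containing the word
    n_both:     number of texts in the category that contain the word
    """
    h_category = binary_entropy(n_category / n_texts)
    if h_category == 0.0:
        return 0.0  # nothing to learn about a category with no uncertainty

    n_no_word = n_texts - n_word
    h_given_word = binary_entropy(n_both / n_word) if n_word else 0.0
    h_given_no_word = (
        binary_entropy((n_category - n_both) / n_no_word) if n_no_word else 0.0
    )
    p_word = n_word / n_texts
    h_conditional = p_word * h_given_word + (1.0 - p_word) * h_given_no_word

    return (h_category - h_conditional) / h_category


def top_words_by_sum(rig_rows, n):
    """Order words by the sum of their RIGs over categories, keep the top n."""
    ranked = sorted(rig_rows, key=lambda word: sum(rig_rows[word]), reverse=True)
    return ranked[:n]
```

    With counts taken from a frequency matrix and the per-category text counts, `relative_information_gain` would give one matrix entry, and `top_words_by_sum` mirrors the ordering by sum (S) used to select thesaurus words.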

  14. Statistical Area 1 2025 Clipped - Dataset - data.govt.nz - discover and use...

    • catalogue.data.govt.nz
    Updated Dec 9, 2024
    Cite
    (2024). Statistical Area 1 2025 Clipped - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/statistical-area-1-2025-clipped
    Explore at:
    Dataset updated
    Dec 9, 2024
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Refer to the current geographies boundaries table for a list of all current geographies and recent updates.

    This dataset is the definitive version of the annually released statistical area 1 (SA1) boundaries as at 1 January 2025, as defined by Stats NZ, clipped to the coastline. This clipped version has been created for cartographic purposes and so does not fully represent the official full extent boundaries. This clipped version contains 32,817 SA1s.

    SA1 is an output geography that allows the release of more low-level data than is available at the meshblock level. Built by joining meshblocks, SA1s have an ideal size range of 100–200 residents, and a maximum population of about 500. This is to minimise suppression of population data in multivariate statistics tables.

    The SA1 should:

    • form a contiguous cluster of one or more meshblocks,
    • be either urban, rural, or water in character,
    • be small enough to:
        • allow flexibility for aggregation to other statistical geographies,
        • allow users to aggregate areas into their own defined communities of interest,
    • form a nested hierarchy with statistical output geographies and administrative boundaries.

    It must:

    • be built from meshblocks,
    • either define or aggregate to define SA2s, urban rural areas, territorial authorities, and regional councils.

    SA1s generally have a population of 100–200 residents, with some exceptions:

    • SA1s with nil or nominal resident populations are created to represent remote mainland areas, unpopulated islands, inland water, inlets, or oceanic areas.
    • Some SA1s in remote rural areas and urban industrial or business areas have fewer than 100 residents.
    • Some SA1s that contain apartment blocks, retirement villages, and large non-residential facilities (prisons, boarding schools, etc.) have more than 500 residents.

    SA1 numbering

    SA1s are not named. SA1 codes have seven digits starting with a 7 and are numbered approximately north to south. Non-digitised codes start with 79. As new SA1s are created, they are given the next available numeric code. If the composition of an SA1 changes through splitting or amalgamating different meshblocks, the SA1 is given a new code. The previous code no longer exists within that version and future versions of the SA1 classification.

    Digitised and non-digitised SA1s

    The digital geographic boundaries are defined and maintained by Stats NZ. Aggregated from meshblocks, SA1s cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, off-shore oil rigs, and Ross Dependency. The following 16 SA1s are held in non-digitised form:

    • 7999901: New Zealand Economic Zone
    • 7999902: Oceanic Kermadec Islands
    • 7999903: Kermadec Islands
    • 7999904: Oceanic Oil Rig Taranaki
    • 7999905: Oceanic Campbell Island
    • 7999906: Campbell Island
    • 7999907: Oceanic Oil Rig Southland
    • 7999908: Oceanic Auckland Islands
    • 7999909: Auckland Islands
    • 7999910: Oceanic Bounty Islands
    • 7999911: Bounty Islands
    • 7999912: Oceanic Snares Islands
    • 7999913: Snares Islands
    • 7999914: Oceanic Antipodes Islands
    • 7999915: Antipodes Islands
    • 7999916: Ross Dependency

    Clipped version

    This clipped version has been created for cartographic purposes and so does not fully represent the official full extent boundaries.

    High-definition version

    This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

    Macrons

    Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.

    Digital data

    Digital boundary data became freely available on 1 July 2007.

    Further information

    To download geographic classifications in table formats such as CSV, please use Ariā. For more information, please refer to the Statistical standard for geographic areas 2023.

    Contact: geography@stats.govt.nz
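
    The numbering rules above (seven digits, starting with 7, with non-digitised codes starting with 79) can be checked with a short sketch like the one below. The function names are hypothetical, and the second check only tests the 79 prefix stated in the description; it assumes, without confirmation from the source, that no digitised SA1 uses that prefix.

```python
import re

# Seven ASCII digits, the first of which is 7 (rule taken from the description).
SA1_CODE_PATTERN = re.compile(r"7[0-9]{6}")


def is_sa1_code(code: str) -> bool:
    """Return True if the string has the shape of an SA1 code."""
    return SA1_CODE_PATTERN.fullmatch(code) is not None


def looks_non_digitised(code: str) -> bool:
    """Return True if the code is SA1-shaped and has the 79 prefix."""
    return is_sa1_code(code) and code.startswith("79")
```

    For example, `looks_non_digitised("7999916")` is true for the Ross Dependency code listed above. Codes should be kept as strings rather than integers so the check stays a simple pattern match.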

  15. Leading countries by number of data centers 2024

    • statista.com
    Updated Mar 19, 2024
    Cite
    Petroc Taylor (2024). Leading countries by number of data centers 2024 [Dataset]. https://www.statista.com/topics/1464/big-data/
    Explore at:
    Dataset updated
    Mar 19, 2024
    Dataset provided by
    Statista http://statista.com/
    Authors
    Petroc Taylor
    Description

    As of March 2024, there were a reported 5,381 data centers in the United States, the most of any country worldwide. A further 521 were located in Germany, while 514 were located in the United Kingdom.

    What is a data center?

    A data center is a network of computing and storage resources that enables the delivery of shared software applications and data. These centers can house large amounts of critical and important data, and therefore are vital to the daily functions of companies and consumers alike. As a result, whether it is a cloud, colocation, or managed service, data center real estate will have increasing importance worldwide.

    Hyperscale data centers

    In the past, data centers were highly controlled physical infrastructures, but the cloud has since changed that model. A cloud data service is a remote version of a data center – located somewhere away from a company's physical premises. Cloud IT infrastructure spending has grown and is forecast to rise further in the coming years. The evolution of technology, along with the rapid growth in demand for data across the globe, is largely driven by the leading hyperscale data center providers.

  16. Statistical Area 1 Higher Geographies 2025 - Dataset - data.govt.nz -...

    • catalogue.data.govt.nz
    Updated Dec 9, 2024
    Cite
    (2024). Statistical Area 1 Higher Geographies 2025 - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/statistical-area-1-higher-geographies-2025
    Explore at:
    Dataset updated
    Dec 9, 2024
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Refer to the current geographies boundaries table for a list of all current geographies and recent updates.

    This dataset is the definitive set of statistical area 1 (SA1) boundaries concorded to higher geographies as at 1 January 2025. This version contains 33,164 SA1s, including 16 with empty or null geometries (non-digitised SA1s).

    SA1 is an output geography that allows the release of more detailed information about population characteristics than is available at the meshblock level. Built by joining meshblocks, SA1s have an ideal size range of 100–200 residents, and a maximum population of about 500. This is to minimise suppression of population data in multivariate statistics tables.

    This SA1 higher geographies 2025 file is a correspondence, or concordance, which relates SA1s to larger geographic areas or 'higher geographies'. The higher geographies contained in this concordance are:

    • statistical area 2 (SA22025),
    • statistical area 3 (SA32025),
    • urban rural (UR2025), and urban rural indicator (IUR2025),
    • urban accessibility indicator (IUA),
    • functional urban area (FUA), indicator functional urban area (IFUA) and functional urban area type (TFUA),
    • territorial authority (TA2025), and
    • regional council (REGC2025).

    The geography urban accessibility indicator (IUA) was first published in 2020 and added to this concordance in 2022.

    High-definition version

    This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

    Macrons

    Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.

    Digital data

    Digital boundary data became freely available on 1 July 2007.

    Further information

    To download geographic classifications in table formats such as CSV, please use Ariā. For more information, please refer to the Statistical standard for geographic areas 2023.

    Contact: geography@stats.govt.nz

  17. ERA5-Land post-processed daily statistics from 1950 to present

    • cds.climate.copernicus.eu
    {grib,netcdf}
    Updated Mar 26, 2025
    Cite
    ECMWF (2025). ERA5-Land post-processed daily statistics from 1950 to present [Dataset]. http://doi.org/10.24381/cds.e9c9c792
    Explore at:
    Available download formats: grib, netcdf
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts http://ecmwf.int/
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf

    Time period covered
    Jan 1, 1950 - Mar 20, 2025
    Description

    ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. ERA5-Land has been produced by replaying the land component of the ECMWF ERA5 climate reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. Reanalysis produces data that goes several decades back in time, providing an accurate description of the climate of the past.

    ERA5-Land uses ERA5 atmospheric variables, such as air temperature and air humidity, as input to control the simulated land fields. This is called the atmospheric forcing. Without the constraint of the atmospheric forcing, the model-based estimates can rapidly deviate from reality. Therefore, while observations are not directly used in the production of ERA5-Land, they have an indirect influence through the atmospheric forcing used to run the simulation. In addition, the input air temperature, air humidity and pressure used to run ERA5-Land are corrected to account for the altitude difference between the grid of the forcing and the higher resolution grid of ERA5-Land. This correction is called 'lapse rate correction'.

    This catalogue entry provides post-processed ERA5-Land hourly data aggregated to daily time steps. Note that the accumulated variables are omitted (e.g. total precipitation, runoff, etc.; please refer to table 3 in the ERA5-Land online documentation for a full list of accumulated variables). In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:

    • The daily aggregation statistic (daily mean, daily max, daily min)
    • The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
    • The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)

    Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code and advice on how to return daily statistics for the accumulated variables, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5-Land hourly data catalogue entry and the documentation found therein.
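
    The three options listed above (statistic, sub-daily sampling, time-zone shift) can be illustrated with a small standard-library sketch. It is not the code used by the retrieval service, whose demonstrative code is in the linked documentation. The function name, its parameters, and the choice to sample on UTC hours that are multiples of the frequency are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

REDUCERS = {"mean": mean, "max": max, "min": min}


def daily_statistic(hourly, statistic="mean", frequency_hours=1, utc_offset_hours=0):
    """Aggregate (UTC datetime, value) pairs into one value per local day.

    frequency_hours keeps only samples whose UTC hour is a multiple of it
    (an assumed sampling rule); utc_offset_hours shifts timestamps before
    they are grouped by calendar day.
    """
    reducer = REDUCERS[statistic]
    buckets = defaultdict(list)
    for timestamp, value in hourly:
        if timestamp.hour % frequency_hours != 0:
            continue  # sample dropped by the sub-daily frequency
        local = timestamp + timedelta(hours=utc_offset_hours)
        buckets[local.date()].append(value)
    return {day: reducer(values) for day, values in sorted(buckets.items())}
```

    With no shift, each day is built from the samples stamped 00:00 to 23:00 UTC. A non-zero offset moves samples near midnight into the neighbouring day, so the first and last days of a request can be built from fewer samples.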

  18. Percentage of population with knowledge of English and French by census...

    • open.canada.ca
    • datasets.ai
    esri rest, fgdb/gdb +3
    Updated Jan 31, 2022
    Cite
    Statistics Canada (2022). Percentage of population with knowledge of English and French by census division, 2016 [Dataset]. https://open.canada.ca/data/en/dataset/7043f8c1-d5e5-492f-8bb1-7eeac9f2a74f
    Explore at:
    Available download formats: html, mxd, wms, esri rest, fgdb/gdb
    Dataset updated
    Jan 31, 2022
    Dataset provided by
    Statistics Canada https://statcan.gc.ca/en
    License

    Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Jan 1, 2016
    Area covered
    French
    Description

    This service shows the percentage of population, excluding institutional residents, with knowledge of English and French for Canada by 2016 census division. The data is from the Census Profile, Statistics Canada Catalogue no. 98-316-X2016001. Knowledge of official languages refers to whether the person can conduct a conversation in English only, French only, in both languages or in neither language. For a child who has not yet learned to speak, this includes languages that the child is learning to speak at home. For additional information refer to 'Knowledge of official languages' in the 2016 Census Dictionary. For a cartographic representation of the ecumene with this socio-economic indicator, add the “NRCan - 2016 population ecumene by census division” web service as the first layer; it is accessible in the data resources section below.

  19. Adoption status of big data technology in organizations worldwide 2015-2019

    • statista.com
    Updated Dec 10, 2024
    Cite
    Statista (2024). Adoption status of big data technology in organizations worldwide 2015-2019 [Dataset]. https://www.statista.com/statistics/919670/worldwide-big-data-adoption-expectations/
    Explore at:
    Dataset updated
    Dec 10, 2024
    Dataset authored and provided by
    Statista http://statista.com/
    Area covered
    Worldwide
    Description

    This statistic shows big data technology adoption plans in organizations worldwide from 2015 to 2019. Around 53 percent of respondents stated that their organization currently used big data technologies as of 2019.

  20. 2023 Census totals by topic for individuals by statistical area 1 – part 1

    • datafinder.stats.govt.nz
    csv, dwg, geodatabase +6
    Updated Nov 14, 2024
    Cite
    Stats NZ (2024). 2023 Census totals by topic for individuals by statistical area 1 – part 1 [Dataset]. https://datafinder.stats.govt.nz/layer/120766-2023-census-totals-by-topic-for-individuals-by-statistical-area-1-part-1/
    Explore at:
    Available download formats: geodatabase, dwg, mapinfo mif, shapefile, csv, kml, geopackage / sqlite, pdf, mapinfo tab
    Dataset updated
    Nov 14, 2024
    Dataset provided by
    Statistics New Zealand http://www.stats.govt.nz/
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Area covered
    Oceania, Te Ika-a-Māui / North Island
    Description

    Dataset contains counts and measures for individuals from the 2013, 2018, and 2023 Censuses. Data is available by statistical area 1.

    The variables included in this dataset are for the census usually resident population count (unless otherwise stated). All data is for level 1 of the classification (unless otherwise stated).

    The variables for part 1 of the dataset are:

    • Census usually resident population count
    • Census night population count
    • Age (5-year groups)
    • Age (life cycle groups)
    • Median age
    • Birthplace (NZ born/overseas born)
    • Birthplace (broad geographic areas)
    • Ethnicity (total responses) for level 1 and ‘Other Ethnicity’ grouped by ‘New Zealander’ and ‘Other Ethnicity nec’
    • Māori descent indicator
    • Languages spoken (total responses)
    • Official language indicator
    • Gender
    • Sex at birth
    • Rainbow/LGBTIQ+ indicator for the census usually resident population count aged 15 years and over
    • Sexual identity for the census usually resident population count aged 15 years and over
    • Legally registered relationship status for the census usually resident population count aged 15 years and over
    • Partnership status in current relationship for the census usually resident population count aged 15 years and over
    • Number of children born for the sex at birth female census usually resident population count aged 15 years and over
    • Average number of children born for the sex at birth female census usually resident population count aged 15 years and over
    • Religious affiliation (total responses)
    • Cigarette smoking behaviour for the census usually resident population count aged 15 years and over
    • Disability indicator for the census usually resident population count aged 5 years and over
    • Difficulty communicating for the census usually resident population count aged 5 years and over
    • Difficulty hearing for the census usually resident population count aged 5 years and over
    • Difficulty remembering or concentrating for the census usually resident population count aged 5 years and over
    • Difficulty seeing for the census usually resident population count aged 5 years and over
    • Difficulty walking for the census usually resident population count aged 5 years and over
    • Difficulty washing for the census usually resident population count aged 5 years and over.

    Download lookup file for part 1 from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.

    Footnotes

    Te Whata

    Under the Mana Ōrite Relationship Agreement, Te Kāhui Raraunga (TKR) will be publishing Māori descent and iwi affiliation data from the 2023 Census in partnership with Stats NZ. This will be available on Te Whata, a TKR platform.

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.

    Population counts

    Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts.

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).

    Study participation time series

    In the 2013 Census study participation was only collected for the census usually resident population count aged 15 years and over.

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. Where we were confident that people who hadn’t completed a census form should be counted, we added real data about those real people to the dataset (this is known as admin enumeration). We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Concept descriptions and quality ratings

    Data quality ratings for 2023 Census variables has additional details about variables found within totals by topic, for example, definitions and data quality.

    Disability indicator

    This data should not be used as an official measure of disability prevalence. Disability prevalence estimates are only available from the 2023 Household Disability Survey. Household Disability Survey 2023: Final content has more information about the survey.

    Activity limitations are measured using the Washington Group Short Set (WGSS). The WGSS asks about six basic activities that a person might have difficulty with: seeing, hearing, walking or climbing stairs, remembering or concentrating, washing all over or dressing, and communicating. A person was classified as disabled in the 2023 Census if there was at least one of these activities that they had a lot of difficulty with or could not do at all.

    Using data for good

    Stats NZ expects that anyone working with census data does so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that “data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
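
    As an illustration of the rounding step only, the sketch below shows one way random rounding to base 3 can be made repeatable. It assumes the commonly described scheme (multiples of 3 are left unchanged; other counts go to the nearer multiple with probability 2/3 and to the farther one with probability 1/3) and makes the result "fixed" by deriving the draw from a hash of a hypothetical cell key. Neither detail is stated in this description, Stats NZ's actual procedure may differ, and suppression of sensitive counts is not covered.

```python
import hashlib


def fixed_random_round_base3(count, cell_key):
    """Round a non-negative count to a multiple of 3, repeatably per cell.

    cell_key is a hypothetical identifier for the table cell; hashing it with
    the count means the same cell always receives the same rounded value.
    """
    remainder = count % 3
    if remainder == 0:
        return count  # already a multiple of 3

    # Deterministic pseudo-random draw in [0, 1) derived from the cell.
    digest = hashlib.sha256(f"{cell_key}:{count}".encode("utf-8")).digest()
    draw = int.from_bytes(digest[:8], "big") / 2**64

    lower = count - remainder
    upper = lower + 3
    nearer, farther = (lower, upper) if remainder == 1 else (upper, lower)
    return nearer if draw < 2 / 3 else farther
```

    Because each cell is rounded independently, rounded components need not add up to a rounded total, which is consistent with the note that individual figures may not always sum to stated totals.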

    Measures

    Measures like averages, medians, and other quantiles are calculated from unrounded counts, with input noise added to or subtracted from each contributing value during measures calculations. Averages and medians based on fewer than six units (e.g. individuals, dwellings, households, families, or extended families) are suppressed. This suppression threshold changes for other quantiles. Where cells have been suppressed, a placeholder value has been used.

    Percentages

    To calculate percentages, divide the figure for the category of interest by the figure for 'Total stated' where this applies.

    Symbol

    -997 Not available

    -999 Confidential
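
    When computing percentages as described under Percentages, the placeholder symbols above need to be excluded first, otherwise a value such as -999 would be treated as a count. A minimal sketch, with a hypothetical function name and the choice to return None for unusable inputs as an assumption:

```python
NOT_AVAILABLE = -997   # symbol listed above
CONFIDENTIAL = -999    # symbol listed above
PLACEHOLDERS = {NOT_AVAILABLE, CONFIDENTIAL}


def percentage(category_count, total_stated):
    """Percentage of 'Total stated', or None if it cannot be computed."""
    if category_count in PLACEHOLDERS or total_stated in PLACEHOLDERS:
        return None  # suppressed or unavailable cell
    if total_stated == 0:
        return None  # avoid dividing by zero for empty areas
    return 100.0 * category_count / total_stated
```

    Since the published counts are randomly rounded, percentages computed this way for all categories of a variable may not sum to exactly 100.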

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

