Data Dictionary for the Real Property CAMA information attached to parcel datasets. Supplemental information regarding the data values can be found here: https://myplace.cuyahogacounty.us/FieldDefinitions.html
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications. It is available in XML or SGML.
- Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material.
- Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English.
- Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc.
- Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval.
- Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference.
Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
These are peer-reviewed supplementary materials for the article 'Real-world evidence: state-of-the-art and future perspectives' published in the Journal of Comparative Effectiveness Research.
Background
Aim
Methods
Step 1: Selection of TAs
Step 2a: Cross-validation of definition of ‘use of routine data in non-experimental settings’
Figure 3: Refinement of the criteria used to define ‘use of routine data in non-experimental settings’ for the full assessment of published NICE TAs
Step 2b: Full review of 12 Cancer and 67 Non-Cancer TAs published 2022-24
Results
Figure 4: Selection of TAs for review
Figure 5: Distribution of Cancer (blue) and Non-Cancer (green) TAs submitted to NICE since 2000 (A). Non-Cancer TAs are broken down by specialty (B)
Table 1: Results of the cross-validation of the criteria applied to randomly selected Cancer TAs
Table 2: Results of the cross-validation of the criteria applied to randomly selected Non-Cancer TAs
Recent developments in digital infrastructure, advanced analytical approaches, and regulatory settings have facilitated the broadened use of real-world evidence (RWE) in population health management and evaluation of novel health technologies. RWE has uniquely contributed to improving human health by addressing unmet clinical needs, from assessing the external validity of clinical trial data to discovery of new disease phenotypes. In this perspective, we present exemplars across various health areas that have been impacted by real-world data and RWE, and we provide insights into further opportunities afforded by RWE. By deploying robust methodologies and transparently reporting caveats and limitations, real-world data accessed via secure data environments can support proactive healthcare management and accelerate access to novel interventions in England.
Excel Spreadsheet Data Dictionary for Abatements and TIFs. For more information, please visit Cuyahoga County's Fiscal Hub Incentive Information Site.
https://spdx.org/licenses/CC0-1.0.html
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.
Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.
Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.
Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.
Methods eLAB Development and Source Code (R statistical software):
eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).
eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.
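As a rough illustration of the de-identification step described above, the sketch below swaps MRNs for registry record identifiers before import. It is written in Python rather than R, and the `mrn` field name, the sample values, and the `deidentify` helper are hypothetical; eLAB's own implementation lives in the linked repository.

```python
def deidentify(rows, mrn_to_record_id):
    """Replace each row's MRN with its registry record_id and drop the MRN.

    Rows whose MRN has no assigned record_id are skipped rather than imported.
    """
    cleaned = []
    for row in rows:
        record_id = mrn_to_record_id.get(row["mrn"])
        if record_id is None:
            continue  # not an enrolled patient; never pass the MRN through
        out = {k: v for k, v in row.items() if k != "mrn"}
        out["record_id"] = record_id
        cleaned.append(out)
    return cleaned


# Hypothetical example data (not real identifiers).
mrn_to_record_id = {"0001234": 17, "0005678": 42}
labs = [
    {"mrn": "0001234", "lab": "Potassium", "value": 4.1},
    {"mrn": "9999999", "lab": "Potassium", "value": 3.9},
]
deidentified = deidentify(labs, mrn_to_record_id)
```

Dropping unmatched rows, instead of importing them with a blank identifier, is one conservative choice; a real pipeline might log them for review.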
Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.
The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
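The key-value remapping described above can be sketched as follows. This is a minimal Python illustration, not eLAB's R code: the subtype names come from the potassium example in the text, while the DD code `"potassium"`, the `remap_labs` helper, and the sample values are assumptions.

```python
# Hypothetical excerpt of a subtype -> DD code lookup table; the DD code
# "potassium" is an assumed name used for illustration only.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}


def remap_labs(results, lookup=LAB_LOOKUP):
    """Keep only labs found in the lookup table, renamed to their DD code."""
    remapped = []
    for name, value in results:
        code = lookup.get(name.strip())
        if code is not None:  # labs outside the table are filtered out
            remapped.append((code, value))
    return remapped


raw = [("Potassium(POC)", 4.3), ("Potassium,venous", 3.8), ("Unlisted lab", 1.0)]
remapped = remap_labs(raw)
```

A plain dictionary keeps the table easy to extend when a site reports a subtype that is not yet listed.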
Data Dictionary (DD)
EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
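Because every site shares one DD, combining site exports reduces to concatenating rows under a common header. The Python sketch below illustrates that idea under stated assumptions: the column names `record_id` and `lab_potassium` and the `combine_site_exports` helper are hypothetical, and the header check stands in for whatever validation a registry actually performs.

```python
import csv
import io


def combine_site_exports(csv_texts):
    """Concatenate CSV exports that share one header; reject mismatched ones."""
    header = None
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("site export does not match the shared data dictionary")
        rows.extend(reader)
    return header, rows


# Hypothetical exports from two sites using the same column layout.
site_a = "record_id,lab_potassium\n1,4.1\n"
site_b = "record_id,lab_potassium\n2,3.9\n3,4.4\n"
header, combined = combine_site_exports([site_a, site_b])
```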
Study Cohort
This study was approved by the MGB IRB. A search of the EHR was performed to identify patients diagnosed with MCC between 1975 and 2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016 and 2019 (N=176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.
Statistical Analysis
OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
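The OS definition and censoring rule above translate into a (time, event) pair per subject, which is the usual input to a Cox model. The following Python sketch shows only that bookkeeping; the `overall_survival` helper and the dates are hypothetical, and measuring time in days is an assumption.

```python
from datetime import date


def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (days, event) where event is 1 for death and 0 for censoring."""
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("censored subjects need a last follow-up date")
    return (last_follow_up - diagnosis).days, 0


# Hypothetical subjects: one death event and one censored observation.
died = overall_survival(date(2016, 3, 1), death=date(2017, 3, 1))
censored = overall_survival(date(2018, 6, 15), last_follow_up=date(2019, 6, 15))
```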
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In daily life, two common algorithms are used for collecting medical disease data: data integration of medical institutions and questionnaires. However, these statistical methods require collecting data from the entire research area, which consumes a significant amount of manpower and material resources. Additionally, data integration is difficult and poses privacy protection challenges, resulting in a large number of missing data in the dataset. The presence of incomplete data significantly reduces the quality of the published data, hindering the timely analysis of data and the generation of reliable knowledge by epidemiologists, public health authorities, and researchers. Consequently, this affects the downstream tasks that rely on this data. To address the issue of discrete missing data in cardiac disease, this paper proposes the AGAN (Attribute Generative Adversarial Nets) architecture for missing data filling, based on generative adversarial networks. This algorithm takes advantage of the strong learning ability of generative adversarial networks. Given the ambiguous meaning of filling data in other network structures, the attribute matrix is designed to directly convert it into the corresponding data type, making the actual meaning of the filling data more evident. Furthermore, the distribution deviation between the generated data and the real data is integrated into the loss function of the generative adversarial networks, improving their training stability and ensuring consistency between the generated data and the real data distribution. This approach establishes the missing data filling mechanism based on the generative adversarial networks, which ensures the rationality of the data distribution while filling the missing data samples. 
The experimental results demonstrate that compared to other filling algorithms, the data matrix filled by the proposed algorithm in this paper has more evident practical significance, fewer errors, and higher accuracy in downstream classification prediction.
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further guards against identification of the spatial locations used in the analysis.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:
File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code
Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Optional Information (complete as necessary)
Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further guards against identification of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
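The weekly standardization described in this record (subtract the week's median, divide by its IQR) can be sketched as below. This is a Python illustration rather than the authors' R code; the quartile convention (`method="inclusive"`), the helper names, and the toy matrix are assumptions, and the published `z` matrix is already standardized, so this step is not needed to use the dataset.

```python
from statistics import median, quantiles


def standardize_week(exposures):
    """Subtract the week's median and divide by its interquartile range."""
    q1, _, q3 = quantiles(exposures, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; exposures cannot be scaled")
    med = median(exposures)
    return [(x - med) / iqr for x in exposures]


def standardize(z):
    """Standardize an individuals-by-weeks matrix one week (column) at a time."""
    columns = [standardize_week(list(col)) for col in zip(*z)]
    return [list(row) for row in zip(*columns)]


# Toy matrix: 5 individuals (rows) by 2 weeks (columns).
z_raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
z_std = standardize(z_raw)
```

Because the medians and IQRs are withheld, the transformation cannot be inverted from the released data, which is the protective effect the description mentions.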
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Update Frequency: Monthly
The data is sorted by tax account number by levy year. This allows multiple, delinquent levy year tax accounts for a single parcel to be listed contiguously. The full payment amount due on a delinquent real estate tax account will always include accrued tax interest and penalty charges, but may also include accrued judgment interest where a judgment has been taken. You may access the Current Tax Balance – Tax Search on our Web Site, or call the Customer Services Division at 414-286-2240 for the current full payment amount due.
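As a small illustration of the ordering and the full-payment composition described above, the Python sketch below sorts hypothetical records by tax account number and then levy year, and sums the charge components. All field names and amounts are invented for the example; the actual file layout may differ.

```python
def full_payment_due(account):
    """Sum tax, accrued tax interest, penalty, and any judgment interest."""
    return round(
        account["tax"]
        + account["interest"]
        + account["penalty"]
        + account.get("judgment_interest", 0.0),  # only where a judgment exists
        2,
    )


# Hypothetical delinquent accounts (field names are assumptions).
accounts = [
    {"tax_account": "200-0002", "levy_year": 2021, "tax": 900.0, "interest": 45.0, "penalty": 22.5},
    {"tax_account": "100-0001", "levy_year": 2022, "tax": 1000.0, "interest": 20.0, "penalty": 10.0},
    {"tax_account": "100-0001", "levy_year": 2020, "tax": 800.0, "interest": 96.0, "penalty": 48.0,
     "judgment_interest": 30.0},
]

# Sorting on (account, levy year) lists a parcel's delinquent years contiguously.
ordered = sorted(accounts, key=lambda a: (a["tax_account"], a["levy_year"]))
```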
70.03 Definition of real property. (1) In chs. 70 to 76, 78, and 79, “real property,” “real estate,” and “land” include not only the land itself but all buildings and improvements thereon, and all fixtures and rights and privileges appertaining thereto, except as provided in sub. (2) and except that for the purpose of time-share property, as defined in s. 707.02 (32), real property does not include recurrent exclusive use and occupancy on a periodic basis or other rights, including, but not limited to, membership rights, vacation services, and club memberships. (2) “Real property” and “real estate” do not include any permit or license required to place, operate, or maintain at a specific location one or more articles of personal property described under s. 70.04 (3) or any value associated with the permit or license.
VITAL SIGNS INDICATOR List Rents (EC9)
FULL MEASURE NAME List Rents
LAST UPDATED October 2016
DESCRIPTION List rent refers to the advertised rents for available rental housing and serves as a measure of housing costs for new households moving into a neighborhood, city, county or region.
DATA SOURCE real Answers (1994 – 2015) no link
Zillow Metro Median Listing Price All Homes (2010-2016) http://www.zillow.com/research/data/
CONTACT INFORMATION vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator) List rents data reflects median rent prices advertised for available apartments rather than median rent payments; more information is available in the indicator definition above. Regional and local geographies rely on data collected by real Answers, a research organization and database publisher specializing in the multifamily housing market. real Answers focuses on collecting longitudinal data for individual rental properties through quarterly surveys. For the Bay Area, their database is comprised of properties with 40 to 3,000+ housing units. Median list prices most likely have an upward bias due to the exclusion of smaller properties. The bias may be most extreme in geographies where large rental properties represent a small portion of the overall rental market. A map of the individual properties surveyed is included in the Local Focus section.
Individual properties surveyed provided lower- and upper-bound ranges for the various types of housing available (studio, 1 bedroom, 2 bedroom, etc.). Median lower- and upper-bound prices are determined across all housing types for the regional and county geographies. The median list price represented in Vital Signs is the average of the median lower- and upper-bound prices for the region and counties. Median upper-bound prices are determined across all housing types for the city geographies. The median list price represented in Vital Signs is the median upper-bound price for cities. For simplicity, only the mean list rent is displayed for the individual properties. The metro area geography relies upon Zillow data, which is the median price for rentals listed through www.zillow.com during the month. Like the real Answers data, Zillow's median list prices most likely have an upward bias since small properties are underrepresented in Zillow's listings. The metro area data for the Bay Area cannot be compared to the regional Bay Area data. Due to the aforementioned data limitations, this data is suitable for analyzing the change in list rents over time but not necessarily comparisons of absolute list rents. Metro area boundaries reflect today’s metro area definitions by county for consistency, rather than historical metro area boundaries.
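The two aggregation rules above (regions and counties average the median lower and upper bounds; cities use the median upper bound) can be sketched in Python as follows. The data layout, helper names, and rents are hypothetical, and pooling every housing-type range across properties before taking the median is one reading of "across all housing types", not a confirmed detail of the method.

```python
from statistics import median


def region_list_rent(properties):
    """Average of the median lower-bound and median upper-bound rents."""
    lows = [low for p in properties for (low, _high) in p["ranges"].values()]
    highs = [high for p in properties for (_low, high) in p["ranges"].values()]
    return (median(lows) + median(highs)) / 2


def city_list_rent(properties):
    """Median upper-bound rent across all housing types."""
    return median([high for p in properties for (_low, high) in p["ranges"].values()])


# Hypothetical survey responses: (lower bound, upper bound) per housing type.
properties = [
    {"ranges": {"studio": (1500, 1700), "1 bedroom": (1900, 2300)}},
    {"ranges": {"1 bedroom": (2000, 2400), "2 bedroom": (2600, 3000)}},
]
regional = region_list_rent(properties)
city = city_list_rent(properties)
```

With these toy numbers the city figure sits above the regional one, which illustrates why list rents computed under the two rules are not directly comparable.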
Due to the limited number of rental properties surveyed, city-level data is unavailable for Atherton, Belvedere, Brisbane, Calistoga, Clayton, Cloverdale, Cotati, Fairfax, Half Moon Bay, Healdsburg, Hillsborough, Los Altos Hills, Monte Sereno, Moraga, Oakley, Orinda, Portola Valley, Rio Vista, Ross, San Anselmo, San Carlos, Saratoga, Sebastopol, Windsor, Woodside, and Yountville.
Inflation-adjusted data are presented to illustrate how rents have grown relative to overall price increases; that said, the use of the Consumer Price Index does create some challenges, given that housing represents a major share of the consumer goods bundle used to calculate CPI. This reflects a methodological tradeoff between precision and accuracy and is a common concern when working with any commodity that is a major component of CPI itself. Percent change in inflation-adjusted median is calculated with respect to the median price from the fourth quarter or December of the base year.
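The percent-change calculation can be illustrated with a short Python sketch. The CPI values and rents below are invented, and the helper names and the choice of target period for "real" dollars are assumptions; only the structure (deflate both medians, then compare against the base period) follows the text.

```python
def to_real(nominal, cpi, cpi_target):
    """Express a nominal rent in the dollars of the target period."""
    return nominal * cpi_target / cpi


def pct_change_real(nominal, cpi, base_nominal, base_cpi, cpi_target):
    """Percent change in the inflation-adjusted median versus the base period."""
    real = to_real(nominal, cpi, cpi_target)
    base_real = to_real(base_nominal, base_cpi, cpi_target)
    return (real - base_real) / base_real * 100


# Hypothetical figures: base-year Q4 median of 1000 at CPI 200, and a later
# median of 2400 at CPI 240, both expressed in the later period's dollars.
change = pct_change_real(
    nominal=2400, cpi=240.0, base_nominal=1000, base_cpi=200.0, cpi_target=240.0
)
```

Here the nominal rent rose 140 percent, but the inflation-adjusted rent rose 100 percent, since the base median of 1000 is worth 1200 in the later period's dollars.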
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Dataset contains counts for territorial authority local board area (TALB) of usual residence by TALB of usual residence address one year ago and five years ago, and by life cycle age group, for the census usually resident population count, 2023 Census.
This dataset compares usual residence at the 2023 Census with usual residence one and five years earlier to show population mobility and internal migration patterns of people within New Zealand.
‘Usual residence address’ is the address of the dwelling where a person considers that they usually live.
‘Usual residence one year ago address’ identifies an individual’s usual residence on 7 March 2022, which may be different to their current usual residence on census night 2023 (7 March 2023).
‘Usual residence five years ago address’ identifies an individual’s usual residence on 6 March 2018, which may be different to their current usual residence on census night 2023 (7 March 2023).
Note: This dataset only includes usual residence address information for individuals whose usual residence address one year ago and five years ago is available at TALB.
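One way to read mobility out of such origin-destination counts is to split them into people whose TALB is unchanged and people whose TALB differs. The Python sketch below assumes rows of (current TALB, TALB one year ago, count) with invented codes and counts; the actual column names are not given here. Note that "stayers" in this sense may still have changed address within the same TALB.

```python
def mobility_summary(flows):
    """Split counts into stayers (same TALB) and movers (different TALB)."""
    stayers = sum(n for now, before, n in flows if now == before)
    movers = sum(n for now, before, n in flows if now != before)
    return {
        "stayers": stayers,
        "movers": movers,
        "mover_share": movers / (stayers + movers),
    }


# Hypothetical flows: (TALB now, TALB one year ago, count).
flows = [
    ("A01", "A01", 900),
    ("A01", "B02", 60),
    ("B02", "B02", 600),
    ("B02", "A01", 40),
]
summary = mobility_summary(flows)
```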
Life cycle age groups are categorised as:
This dataset can be used in conjunction with the following spatial files by joining on the TALB code values:
Footnotes
Geographical boundaries
Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.
Subnational census usually resident population
The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.
Population counts
Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts.
Rows excluded from the dataset
Rows show TALB of usual residence by TALB of usual residence one year ago and five years ago, by life cycle age group. Cells with a number less than six have been confidentialised. Responses to categories unable to be mapped, such as response unidentifiable, not stated, and Auckland (not further defined), have also been excluded from this dataset.
About the 2023 Census dataset
For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.
Data quality
The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.
Quality rating of a variable
The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.
Age quality rating
Age is rated as very high quality.
Age – 2023 Census: Information by concept has more information, for example, definitions and data quality.
Census usually resident population quality rating
The census usually resident population count is rated as very high quality.
Census usually resident population count – 2023 Census: Information by concept has more information, for example, definitions and data quality.
Usual residence address quality rating
Usual residence address is rated as high quality.
Usual residence address – 2023 Census: Information by concept has more information, for example, definitions and data quality.
Usual residence one year ago quality rating
Usual residence one year ago area is rated as high quality.
Usual residence one year ago – 2023 Census: Information by concept has more information, for example, definitions and data quality.
Usual residence five years ago quality rating
Usual residence five years ago area is rated as high quality.
Usual residence five years ago – 2023 Census: Information by concept has more information, for example, definitions and data quality.
Using data for good
Stats NZ expects that anyone working with census data does so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga".
Confidentiality
The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
Symbol
-999 Confidential
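The rounding and suppression described under Confidentiality can be illustrated with a minimal Python sketch. It assumes the conventional random-rounding-to-base-3 rule (multiples of 3 are unchanged; other counts move to the nearer multiple with probability 2/3 and to the farther one with probability 1/3), and it makes the rounding "fixed" by seeding from a hypothetical cell key so that the same cell always rounds the same way. The function names, the cell key, the seeding scheme, and the choice to test the unrounded count against the threshold are all illustrative assumptions, not Stats NZ's implementation.

```python
import hashlib
import random

CONFIDENTIAL = -999  # symbol used for suppressed (confidential) cells


def frr3(count: int, cell_key: str) -> int:
    """Round a non-negative count to base 3, deterministically per cell_key.

    Assumed scheme: the RNG is seeded from a hash of the cell key, so the
    same cell always receives the same rounded value.
    """
    remainder = count % 3
    if remainder == 0:
        return count  # multiples of 3 are left unchanged
    digest = hashlib.sha256(cell_key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    lower = count - remainder
    upper = lower + 3
    # remainder 1 is nearer to the lower multiple, remainder 2 to the upper.
    nearer, farther = (lower, upper) if remainder == 1 else (upper, lower)
    return nearer if rng.random() < 2 / 3 else farther


def protect(count: int, cell_key: str, sensitive: bool = False) -> int:
    """Suppress sensitive counts below six, otherwise apply FRR3."""
    if sensitive and count < 6:
        return CONFIDENTIAL
    return frr3(count, cell_key)
```

Because each cell is rounded independently, rounded figures need not sum to the rounded total, which is why individual figures may not always sum to stated totals.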
Inconsistencies in definitions
Please note that there may be differences in definitions between census classifications and those used for other data collections.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de456291
Abstract (en): This survey is the first broad-based, systematic examination of the nature of civil litigation in state general jurisdiction trial courts. Data collection was carried out by the National Center for State Courts with assistance from the National Association of Criminal Justice Planners and the United States Bureau of the Census. The data collection produced two datasets. Part 1, Tort, Contract, and Real Property Rights Data, is a merged sample of approximately 30,000 tort, contract, and real property rights cases disposed during the 12-month period ending June 30, 1992. Part 2, Civil Jury Cases Data, is a sample of about 6,500 jury trial cases disposed over the same time period. Data collected include information about litigants, case type, disposition type, processing time, case outcome, and award amounts for civil jury cases. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks, standardized missing values, and checked for undocumented or out-of-range codes. Forty-five jurisdictions were chosen to represent the 75 most populous counties in the nation. The sample for this study was designed and selected by the United States Bureau of the Census. It was a two-stage stratified sample with 45 of the 75 most populous counties selected at the first stage. The top 75 counties account for about 37 percent of the United States population and about half of all civil filings. The 75 counties were divided into four strata based on aggregate civil disposition data for 1990 obtained through telephone interviews with court staffs in the general jurisdiction trial courts.
The sample consisted of tort, contract, and real property rights cases disposed between July 1, 1991, and June 30, 1992.
2011-11-02 All parts are being moved to restricted access and will be available only using the restricted access procedures.
2006-03-30 File CB6587.ALL.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads.
2005-11-04 On 2005-03-14 new files were added to one or more datasets. These files included additional setup files as well as one or more of the following: SAS program, SAS transport, SPSS portable, and Stata system files. The metadata record was revised 2005-11-04 to reflect these additions.
2004-06-01 The data have been updated by the principal investigator to include replicate weights and a few other variables. The codebook and SAS and SPSS data definition statements have been revised to reflect these changes.
2001-03-26 The data have been updated by the principal investigator to include replicate weights. The codebook and SAS and SPSS data definition statements have been revised to reflect these changes.
1997-07-29 The codebook was revised to correct errors documenting both data files.
Column location (and width) of variable WGHT "TOTAL WEIGHT" was incorrectly shown as 10.4 for Part 1, Tort, Contract, and Real Property Data. It was accurately shown in the data definition statements as 9.4. Variables listed after WGHT were inaccurately reported one column off in the codebook. Similarly, column location (and width) of variable WGHT "TOTAL WEIGHT" was incorrectly shown as 10.2 for Part 2, Civil Jury Data. It was accurately shown in the data definition statements as 9.2. Variables listed after WGHT were inaccurately reported one column off in the codebook. Fundi...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for Real Gross Domestic Product - Components - Current Local Currency Unit (CLU) for the country Hong Kong SAR, China. Indicator Definition: Real Private Sector Final Consumption Expenditure, Unadjusted, Domestic Currency. Indicator Definition: Real General Government Final Consumption Expenditure, Unadjusted, Domestic Currency. Indicator Definition: Real Gross Fixed Capital Formation, Unadjusted, Domestic Currency. Indicator Definition: Real Changes in Inventories, Unadjusted, Domestic Currency. Indicator Definition: Net Trade is defined as exports minus imports (measured in local currency units (LCU)).
A. SUMMARY These data represent hate crimes reported by the SFPD to the California Department of Justice. Read the detailed overview of this dataset here. What is a Hate Crime? A hate crime is a crime against a person, group, or property motivated by the victim's real or perceived protected social group. An individual may be the victim of a hate crime if they have been targeted because of their actual or perceived: (1) disability, (2) gender, (3) nationality, (4) race or ethnicity, (5) religion, (6) sexual orientation, and/or (7) association with a person or group with one or more of these actual or perceived characteristics. Hate crimes are serious crimes that may result in imprisonment or jail time. B. HOW THE DATASET IS CREATED How is a Hate Crime Processed? Not all prejudice incidents including the utterance of hate speech rise to the level of a hate crime. The U.S. Constitution allows hate speech if it does not interfere with the civil rights of others. While these acts are certainly hurtful, they do not rise to the level of criminal violations and thus may not be prosecuted. When a prejudice incident is reported, the reporting officer conducts a preliminary investigation and writes a crime or incident report. Bigotry must be the central motivation for an incident to be determined to be a hate crime. In that report, all facts such as verbatims or statements that occurred before or after the incident and characteristics such as the race, ethnicity, sex, religion, or sexual orientations of the victim and suspect (if known) are included. To classify a prejudice incident, the San Francisco Police Department’s Hate Crimes Unit of the Special Investigations Division conducts an analysis of the incident report to determine if the incident falls under the definition of a “hate crime” as defined by state law. California Penal Code 422.55 - Hate Crime Definition C. UPDATE PROCESS These data are updated monthly. D. 
HOW TO USE THIS DATASET This dataset includes the following information about each incident: the hate crime offense, bias type, location/time, and the number of hate crime victims and suspects. The data presented mirror data published by the California Department of Justice, albeit at a higher frequency. The publishing of these data meets requirements set forth in PC 13023. E. RELATED DATASETS California Department of Justice - Hate Crimes Info; California Department of Justice - Hate Crimes Data
File List
e001_arssnlvl0.csv (MD5: 75f21b9949b87c018f3499b5d2a093e7)
e001_arssnlvl3.csv (MD5: 02cefd1cdb16fc25968f4e7d96a43378)
e026_aslit.csv (MD5: eea09f91f81dc7e5801d8f67ccded5c1)
e054_arssprecip.csv (MD5: 1d62c9ab92a6a1fc4caa53787d2c0cf1)
e120_bmins.csv (MD5: e8699cfd2d2c9a11d43abd572b6f3cd8)
e120_invnit1_2.csv (MD5: 6c023c438eb4c6e2625f526fafd9c17d)
e120_invnit4_8.csv (MD5: a304786a833b5c6e53063472b97ef93d)
e120_invnit16.csv (MD5: bda824cec8f8fdc2992c8732e03aa109)
e120_nitbm.csv (MD5: 212637b7ab08d0cc2146f26602061b64)
find_number_of_observations.R (MD5: a6a097d6633ee66b9c3531676320b929)
multispatialCCM.zip (MD5: 64647cc7d14d2df5398a76af9ec73e2a)
Description
e001_arssnlvl0.csv is a comma-separated text file containing the data for Agropyron (Elymus) repens and Schizachyrium scoparium dynamics in unfertilized plots for experiment 001 at Cedar Creek. Column definitions are:
1. "index": concatenated text including the plot, field, and year sampled
2. "Exp": Cedar Creek experiment number
3. "Year": year data was sampled
4. "Field": ID for field that was sampled
5. "Plot": plot number for sample
6. "Ntrt": categorical fertilization treatment
7. "Nadd": g nitrogen added per square meter per year for each treatment
8. "NitrAdd": g nitrate added per square meter per year for each treatment
9. "Natm.Nadd": g nitrogen added per square meter per year for each treatment, including 1 g/m2/year atmospheric deposition
10. "fg": plant functional group: C3/C4 for grasses with C3/C4 photosynthetic pathway, F for non-legume forb, L for legume
11. "isspecies": binary indicator describing whether or not a row had plant species found in it that year (should be 1 for all rows)
12. "richness": species richness for all species found in the sample
13. "Agropyron repens": g dry aboveground biomass per square meter of A. repens
14. "Schizachyrium scoparium": g dry aboveground biomass per square meter of S. scoparium
15. "Miscellaneous litter": g dry aboveground biomass per square meter of leaf litter
16. "Ncat": fertilization intensity category, with 1 being the lowest and 3 being the highest
17. "FieldPlot": concatenated text including the field and plot
e001_arssnlvl3.csv is a comma-separated text file containing the data for Agropyron (Elymus) repens and Schizachyrium scoparium dynamics in heavily fertilized plots for experiment 001 at Cedar Creek. Column definitions are as described for e001_arssnlvl0.csv.
e026_aslit.csv is a comma-separated text file containing the data for Agrostis scabra and leaf litter dynamics in plots with varying soil fertility in experiment 026 at Cedar Creek. Column definitions are:
1. "monoculture": plant species grown in subplot (should always be A. scabra)
2. "litbiomass": g dry aboveground biomass per square meter of leaf litter
3. "year": year of sampling
4. "plot": plot sampled (soil N treatments vary among plots)
5. "subplot": subplot sampled
6. "abvbiomass": g dry aboveground biomass per square meter of A. scabra
7. "totaln": total soil nitrogen (in percent of soil by mass)
8. "exp": Cedar Creek experiment number
9. "yearest": year in which the experiment was established
10. "nlevel": categorical level for total soil nitrogen treatment
11. "plotsubplot": concatenated text including the plot and subplot
12. "Field": ID for field that was sampled
13. "FieldPlot": concatenated text including the field and plot
e054_arssprecip.csv is a comma-separated text file containing the data for A. repens, S. scoparium, leaf litter, and precipitation dynamics for experiment 054 at Cedar Creek. Column definitions are:
1. "index": concatenated text including the year, field, plot, and transect sampled
2. "Exp": experiment number
3. "Year": year of sample
4. "OldField": old field ID
5. "Plot": plot number for sample
6. "Transect": transect ID
7. "YearAb": year that the field was abandoned from agricultural use
8. "Agropyron repens": g dry aboveground biomass per square meter of A. repens
9. "Schizachyrium scoparium": g dry aboveground biomass per square meter of S. scoparium
10. "Miscellaneous litter": g dry aboveground biomass per square meter of leaf litter
11. "precipmm": total summer precipitation (June-August) in mm
12. "FieldPlot": concatenated text including the field and plot
e120_bmins.csv is a comma-separated text file describing plant biomass and insect dynamics for Cedar Creek experiment 120. Column definitions are:
1. "Exp": Cedar Creek experiment number
2. "Year": sampling year
3. "Month": sampling month
4. "Plot": plot sampled
5. "NumSp": number of species in treatment
6. "SpNum": number of species maintained in plot
7. "AbvBioAnnProd": g plant aboveground biomass harvested per square meter per year
8. "noh020tot": mg soil nitrate per kg soil, sampled in top 20 cm of soil
9. "insectcount": number of insect individuals in sweep net sample
10. "insectsp": number of insect species in sweep net sample
11. "Field": field ID
12. "FieldPlot": concatenated text including the field and plot
e120_invnit1_2.csv is a comma-separated text file describing invading plant species dynamics and soil nitrate dynamics in monoculture plots for experiment 120 at Cedar Creek. Column definitions are as described for e120_bmins.csv, except for:
9. "invrichness": number of non-planted "invading" plant species
e120_invnit4_8.csv is a comma-separated text file describing invading plant species dynamics and soil nitrate dynamics in 4 and 8 species mixture plots for experiment 120 at Cedar Creek. Column definitions are as described for e120_invnit1_2.csv.
e120_invnit16.csv is a comma-separated text file describing invading plant species dynamics and soil nitrate dynamics in 16 species mixture plots for experiment 120 at Cedar Creek. Column definitions are as described for e120_invnit1_2.csv.
e120_nitbm.csv is a comma-separated text file describing soil nitrate and aboveground plant biomass dynamics. Column definitions are as described for e120_bmins.csv.
find_number_of_observations.R is an R source code file that can be used to determine the number of sequential observations in subplots for all of the data sets listed above. The data should be in the working directory of R when the R code is run.
multispatialCCM.zip...
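The MD5 values in the file list can be used to check that downloaded files are intact. The following is a minimal Python sketch, assuming the files have been saved under their listed names in a single local directory; the function names and the directory argument are illustrative, and only the first two checksums are copied here, so the mapping would need to be extended with the remaining entries from the list.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list (extend with the remaining entries).
EXPECTED_MD5 = {
    "e001_arssnlvl0.csv": "75f21b9949b87c018f3499b5d2a093e7",
    "e001_arssnlvl3.csv": "02cefd1cdb16fc25968f4e7d96a43378",
}


def md5_of(path, chunk_size=1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected) -> dict:
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == checksum
    return results
```

Calling `verify("downloads", EXPECTED_MD5)` (with a hypothetical directory name) returns a dictionary in which a `False` entry marks a file that is missing or whose contents differ from the published checksum.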
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 Introduction
The Peatland Decomposition Database (PDD) stores data from published litterbag experiments related to peatlands. The database currently focuses on northern peatlands and on Sphagnum litter and peat, but it also contains data from some vascular plant litterbag experiments. It currently contains entries from 34 studies, 2,160 litterbag experiments, and 7,297 individual samples with 117,841 measurements for various attributes (e.g. relative mass remaining, N content, holocellulose content, mesh size). The aim is to provide a harmonized data source that can be used to re-analyse existing data and to plan future litterbag experiments.
The Peatland Productivity and Decomposition Parameter Database (PPDPD) (Bona et al. 2018) is similar to the Peatland Decomposition Database (PDD) in that both contain data from peatland litterbag experiments. The databases differ in three ways: they only partly overlap in the data they contain; PPDPD additionally contains information on vegetation productivity, which PDD does not; and PDD provides more information and metadata on litterbag experiments, as well as measurement errors.
2 Updates
Compared to version 1.0.0, this version has a new structure for table experimental_design_format, contains additional metadata on the experimental design (these were omitted in version 1.0.0), and contains the scripts that were used to import the data into the database.
3 Methods
3.1 Data collection
Data for the database was collected from published litterbag studies, by extracting published data from figures, tables, or other data sources, and by contacting the authors of the studies to obtain raw data. All data processing was done with R (R version 4.2.0 (2022-04-22)) (R Core Team 2022).
Studies were identified via a Scopus search with the search string (TITLE-ABS-KEY ( peat* AND ( "litter bag" OR "decomposition rate" OR "decay rate" OR "mass loss")) AND NOT ("tropic*")) (2022-12-17). These studies were further screened to exclude those which do not contain litterbag data or which recycle data from other studies that have already been considered. Additional studies with litterbag experiments in northern peatlands that we were aware of, but which were not identified in the literature search, were added to the list of publications. For studies not older than 10 years, authors were contacted to obtain raw data; however, this was successful in only a few cases. To date, the database focuses on Sphagnum litterbag experiments, and data from some of the studies identified by the literature search have not yet been included in the database.
Data from figures were extracted using the package ‘metaDigitise’ (1.0.1) (Pick, Nakagawa, and Noble 2018). Data from tables were extracted manually.
Data from the following studies are currently included: Farrish and Grigal (1985), Bartsch and Moore (1985), Farrish and Grigal (1988), Vitt (1990), Hogg, Lieffers, and Wein (1992), Sanger, Billett, and Cresser (1994), Hiroki and Watanabe (1996), Szumigalski and Bayley (1996), Prevost, Belleau, and Plamondon (1997), Arp, Cooper, and Stednick (1999), Robbert A. Scheffer and Aerts (2000), R. A. Scheffer, Van Logtestijn, and Verhoeven (2001), Limpens and Berendse (2003), Waddington, Rochefort, and Campeau (2003), Asada, Warner, and Banner (2004), Thormann, Bayley, and Currah (2001), Trinder, Johnson, and Artz (2008), Breeuwer et al. (2008), Trinder, Johnson, and Artz (2009), Bragazza and Iacumin (2009), Hoorens, Stroetenga, and Aerts (2010), Straková et al. (2010), Straková et al. (2012), Orwin and Ostle (2012), Lieffers (1988), Manninen et al. (2016), Johnson and Damman (1991), Bengtsson, Rydin, and Hájek (2018a), Bengtsson, Rydin, and Hájek (2018b), Asada and Warner (2005), Bengtsson, Granath, and Rydin (2017), Bengtsson, Granath, and Rydin (2016), Hagemann and Moroni (2015), Hagemann and Moroni (2016), B. Piatkowski et al. (2021), B. T. Piatkowski et al. (2021), Mäkilä et al. (2018), Golovatskaya and Nikonova (2017), Golovatskaya and Nikonova (2017).
4 Database records
The database is a ‘MariaDB’ database and the database schema was designed to store data and metadata following the Ecological Metadata Language (EML) (Jones et al. 2019). Descriptions of the tables are shown in Tab. 1.
The database contains general metadata relevant for litterbag experiments (e.g., geographical, temporal, and taxonomic coverage, mesh sizes, experimental design). However, it does not contain a detailed description of sample handling, sample preprocessing methods, or site descriptions, because there currently are no discipline-specific metadata and reporting standards.
Table 1: Description of the individual tables in the database.
Name Description
attributes Defines the attributes of the database and the values in column attribute_name in table data.
citations Stores bibtex entries for references and data sources.
citations_to_datasets Links entries in table citations with entries in table datasets.
custom_units Stores custom units.
data Stores measured values for samples, for example remaining masses.
datasets Lists the individual datasets.
experimental_design_format Stores information on the experimental design of litterbag experiments.
measurement_scales, measurement_scales_date_time, measurement_scales_interval, measurement_scales_nominal, measurement_scales_ordinal, measurement_scales_ratio Defines data value types.
missing_value_codes Defines how missing values are encoded.
samples Stores information on individual samples.
samples_to_samples Links samples to other samples, for example litter samples collected in the field to litter samples collected during the incubation of the litterbags.
units, unit_types Stores information on measurement units.
5 Attributes
Table 2: Definition of attributes in the Peatland Decomposition Database and entries in the column attribute_name in table data.
Name Definition Example value Unit Measurement scale Number type Minimum value Maximum value String format
4_hydroxyacetophenone_mass_absolute A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA
4_hydroxyacetophenone_mass_relative_mass A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA
4_hydroxybenzaldehyde_mass_absolute A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA
4_hydroxybenzaldehyde_mass_relative_mass A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA
4_hydroxybenzoic_acid_mass_absolute A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA
4_hydroxybenzoic_acid_mass_relative_mass A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA
abbreviation In table custom_units: A string representing an abbreviation for the custom unit. gC NA nominal NA NA NA NA
acetone_extractives_mass_absolute A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA
acetone_extractives_mass_relative_mass A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA
acetosyringone_mass_absolute A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA
acetosyringone_mass_relative_mass A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA
acetovanillone_mass_absolute A numeric value representing the content of acetovanillone, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA
acetovanillone_mass_relative_mass A numeric value representing the content of acetovanillone, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA
arabinose_mass_absolute A numeric value representing the content of arabinose, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA
arabinose_mass_relative_mass A numeric value representing the content of arabinose, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA
ash_mass_absolute A numeric value representing the content of ash (after burning at 550°C). 4 g ratio real 0 Inf NA
ash_mass_relative_mass A numeric value representing the content of ash (after burning at 550°C). 0.05 g/g ratio real 0 Inf NA
attribute_definition A free text field with a textual description of the meaning of attributes in the dpeatdecomposition database. NA NA nominal NA NA NA NA
attribute_name A string describing the names of the attributes in all tables of the dpeatdecomposition database. attribute_name NA nominal NA NA NA NA
bibtex A string representing the bibtex code used for a literature reference throughout the dpeatdecomposition database. Galka.2021 NA nominal NA NA NA NA
bounds_maximum A numeric value representing the minimum possible value for a numeric attribute. 0 NA interval real Inf Inf NA
bounds_minimum A numeric value representing the maximum possible value for a numeric attribute. INF NA interval real Inf Inf NA
bulk_density A numeric value representing the bulk density of the sample [g cm-3]. 0.2 g/cm^3 ratio real 0 Inf NA
C_absolute The absolute mass of C in the sample. 1 g ratio real 0 Inf NA
C_relative_mass The relative mass of C in the sample. 1 g/g ratio real 0 Inf NA
C_to_N A numeric value representing the C to N ratio of the sample. 35 g/g ratio real 0 Inf NA
C_to_P A numeric value representing the C to P ratio of the sample. 35 g/g ratio real 0 Inf NA
Ca_absolute The
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains original (historical) data for RÚIAN elements that have changed at any time in the past. Users can use it to reconstruct changes in RÚIAN data since 2012. Only descriptive data are provided for each element; the dataset contains no spatial location (polygons, definition lines, or centroids of RÚIAN elements). Files can be downloaded for the whole state territory or for a selected municipality only. The file covering the whole state territory contains the following elements: state, cohesion region, higher territorial self-governing entity (VÚSC), municipality with extended competence (ORP), authorized municipal office (POU), region (old ones, defined in 1960), county, municipality, municipality part, town district (MOMC), Prague city district (MOP), town district of Prague (SOP), cadastral units, and basic urban units (ZSJ). Files for a specified municipality contain the following elements: municipality, municipality part, MOMC (for territorially structured statutory cities), MOP (for Prague), SOP (for Prague), cadastral unit, ZSJ, streets, building objects, and address points. The dataset is provided as Open Data (licence CC-BY 4.0). Data are based on RÚIAN (Register of Territorial Identification, Addresses and Real Estates). Data are created once a month in the RÚIAN exchange format (VFR), which is based on XML and fulfils the GML 3.2.1 standard (according to ISO 19136:2007). The dataset is compressed (ZIP) for downloading. More information is available in Act No. 111/2009 Coll., on the Basic Registers, and in Decree No. 359/2011 Coll., on the Basic Register of Territorial Identification, Addresses and Real Estates.
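Because the files are delivered as ZIP archives containing XML, a first look at a download can be taken with the Python standard library alone. The sketch below is a generic illustration rather than a VFR-specific parser: it streams every XML member of an archive and counts elements by local tag name, stripping namespaces. The function name and the example file name are hypothetical, and no particular VFR tag names are assumed.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter


def count_elements(zip_source) -> Counter:
    """Count XML elements by local tag name across all .xml members of a ZIP.

    zip_source may be a file path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".xml"):
                continue  # skip non-XML members
            with archive.open(member) as stream:
                # iterparse streams the document, so large files are not
                # loaded into memory as a whole.
                for _event, element in ET.iterparse(stream, events=("end",)):
                    local_name = element.tag.rsplit("}", 1)[-1]
                    counts[local_name] += 1
                    element.clear()  # release content already counted
    return counts
```

For example, `count_elements("ruian_history.zip").most_common(10)` (with a hypothetical file name) lists the ten most frequent element types, which gives a quick overview of what a given monthly file contains.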
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Dataset for the maps accompanying the Housing in Aotearoa New Zealand: 2025 report. This dataset contains data for severe housing deprivation from the 2018 and 2023 Censuses.
Data is available by health district.
Severe housing deprivation has data for the census usually resident population from the 2018 and 2023 Censuses, including:
Map shows the estimated prevalence rate of severe housing deprivation (per 10,000 people) for the census usually resident population for the 2023 Census.
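As a worked illustration of a rate expressed per 10,000 people, the following minimal Python sketch divides a count by a population and scales the result. The function name and the figures in the usage note are hypothetical and are not taken from the dataset; the published estimates also reflect the confidentiality rules, so recomputed rates may differ slightly.

```python
def rate_per_10000(count: float, population: float) -> float:
    """Return count expressed as a rate per 10,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding error.
    return count * 10_000 / population
```

With hypothetical inputs, `rate_per_10000(150, 60000)` gives 25.0, that is, 25 people per 10,000.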
Download lookup file from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.
Footnotes
Geographical boundaries
Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.
Subnational census usually resident population
The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.
Population counts
Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts.
Caution using time series
Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).
Severe housing deprivation time series
The 2018 estimates of severe housing deprivation have been updated using the 2023 methodology for estimating severe housing deprivation. Severe housing deprivation (homelessness) estimates – updated methodology: 2023 Census has more information.
Severe housing deprivation
Figures in this map and geospatial file exclude Women’s refuge data, as well as estimates for children living in non-private dwellings. Severe housing deprivation (homelessness) estimates – updated methodology: 2023 Census has more information.
About the 2023 Census dataset
For information on the 2023 Census dataset, see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. Where we were confident that people who had not completed a census form should be counted, we added real data about those people to the dataset (this is known as admin enumeration). We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.
Data quality
The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.
Quality rating of a variable
The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.
Census usually resident population count concept quality rating
The census usually resident population count is rated as very high quality.
Census usually resident population count – 2023 Census: Information by concept has more information, for example, definitions and data quality.
Quality of severe housing deprivation data
Severe housing deprivation (homelessness) estimates – updated methodology: 2023 Census has more information on the data quality of this variable.
Using data for good
Stats NZ expects that anyone working with census data does so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that “data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.
Confidentiality
The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
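To make the rounding rule concrete, here is a minimal sketch of fixed random rounding to base 3. It is not Stats NZ's implementation. It assumes the commonly described scheme in which a count that is already a multiple of 3 is left unchanged and any other count moves to the nearer multiple of 3 with probability 2/3 and to the farther one with probability 1/3. The "fixed" part is modelled by deriving the random draw from a hypothetical cell key, so the same cell always rounds the same way. The function name, the `cell_key` parameter, and the choice to apply suppression to the unrounded count are all assumptions made for illustration.

```python
import hashlib


def frr3(count, cell_key, suppress_below=None):
    """Round a non-negative count to base 3 in a repeatable way.

    cell_key is a hypothetical identifier for the table cell; it seeds the
    pseudo-random draw so repeated calls give the same answer.
    suppress_below, if given, returns None for counts under that threshold
    (assumption: suppression is checked on the unrounded count).
    """
    if suppress_below is not None and count < suppress_below:
        return None  # suppressed 'sensitive' count

    remainder = count % 3
    if remainder == 0:
        return count  # already a multiple of 3, left unchanged

    # Deterministic draw in [0, 1) derived from the cell key.
    digest = hashlib.sha256(cell_key.encode("utf-8")).digest()
    draw = int.from_bytes(digest[:8], "big") / 2**64

    lower = count - remainder
    upper = lower + 3
    # A remainder of 1 is nearer the lower multiple; 2 is nearer the upper.
    nearest, farther = (lower, upper) if remainder == 1 else (upper, lower)
    return nearest if draw < 2 / 3 else farther


if __name__ == "__main__":
    parts = {"area-1": 7, "area-2": 8, "area-3": 4}
    rounded = {key: frr3(value, key) for key, value in parts.items()}
    total = frr3(sum(parts.values()), "total")
    print(rounded, "sum of parts:", sum(rounded.values()), "rounded total:", total)
```

Because each cell is rounded independently of the total, the rounded parts need not add up to the rounded total, which is why individual figures may not always sum to stated totals.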
Inconsistencies in definitions
Please note that there may be differences in definitions between census classifications and those used for other data collections.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
A dataset of 7077 labeled vocalizations made by non-speaking individuals. Each vocalization lasts approximately 0.5-4 seconds and is labeled with its affective or communicative meaning. Data were acquired in real-world settings (homes, schools, etc.) and were labeled in real-time by parents or caregivers who knew the non-speaking communicator well.
dataset_file_directory.csv provides the name of each vocalization file, the corresponding participant ID, and the vocalization meaning or label (delighted, frustrated, request, etc.).
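As an illustration of how the directory file might be used, the sketch below tallies vocalizations per label. The column names (`Filename`, `Participant`, `Label`) and the sample rows are hypothetical, since the actual header of dataset_file_directory.csv is not given here; adjust `label_column` to match the real file.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="Label"):
    """Count rows per label in an open CSV file object.

    label_column is an assumed column name, not confirmed by the dataset.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Hypothetical sample standing in for dataset_file_directory.csv.
sample = io.StringIO(
    "Filename,Participant,Label\n"
    "example_0001.wav,P01,delighted\n"
    "example_0002.wav,P01,frustrated\n"
    "example_0003.wav,P02,delighted\n"
)

if __name__ == "__main__":
    print(count_labels(sample))
```

With the real file, the same function could be called as `count_labels(open("dataset_file_directory.csv", newline=""))`, again assuming the label column name matches.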
If you use this dataset, please cite Johnson & Narain et al., "ReCANVo: A Database of Real-World Communicative and Affective Nonverbal Vocalizations". The authors are Jaya Narain, Kristina T. Johnson, Thomas Quatieri, Pattie Maes, and Rosalind Picard. This paper provides more information about the dataset, including data acquisition methodology, pre-processing procedures, and participant demographics.
Note: J.N. and K.T.J. are joint first authors on this project. Please include both names in attribution when possible (e.g., Johnson & Narain et al.).
This is the data dictionary/codebook for the official City of Rochester, NY Real Estate Inventory Parcel Dataset.
Data Dictionary for the Real Property CAMA information attached to parcel datasets. Supplemental information regarding the data values can be found here: https://myplace.cuyahogacounty.us/FieldDefinitions.html