72 datasets found
  1. n

    AirNow Air Quality Monitoring Data (Current) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). AirNow Air Quality Monitoring Data (Current) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/airnow-air-quality-monitoring-data-current
    Explore at:
    Dataset updated
    Feb 28, 2024
    Description

    This United States Environmental Protection Agency (US EPA) feature layer represents monitoring site data, updated hourly concentrations and Air Quality Index (AQI) values for the latest hour received from monitoring sites that report to AirNow.Map and forecast data are collected using federal reference or equivalent monitoring techniques or techniques approved by the state, local or tribal monitoring agencies. To maintain "real-time" maps, the data are displayed after the end of each hour. Although preliminary data quality assessments are performed, the data in AirNow are not fully verified and validated through the quality assurance procedures monitoring organizations used to officially submit and certify data on the EPA Air Quality System (AQS).This data sharing, and centralization creates a one-stop source for real-time and forecast air quality data. The benefits include quality control, national reporting consistency, access to automated mapping methods, and data distribution to the public and other data systems. The U.S. Environmental Protection Agency, National Oceanic and Atmospheric Administration, National Park Service, tribal, state, and local agencies developed the AirNow system to provide the public with easy access to national air quality information. State and local agencies report the Air Quality Index (AQI) for cities across the US and parts of Canada and Mexico. AirNow data are used only to report the AQI, not to formulate or support regulation, guidance or any other EPA decision or position.About the AQIThe Air Quality Index (AQI) is an index for reporting daily air quality. It tells you how clean or polluted your air is, and what associated health effects might be a concern for you. The AQI focuses on health effects you may experience within a few hours or days after breathing polluted air. EPA calculates the AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as particulate matter), carbon monoxide, sulfur dioxide, and nitrogen dioxide. For each of these pollutants, EPA has established national air quality standards to protect public health. Ground-level ozone and airborne particles (often referred to as "particulate matter") are the two pollutants that pose the greatest threat to human health in this country.A number of factors influence ozone formation, including emissions from cars, trucks, buses, power plants, and industries, along with weather conditions. Weather is especially favorable for ozone formation when it’s hot, dry and sunny, and winds are calm and light. Federal and state regulations, including regulations for power plants, vehicles and fuels, are helping reduce ozone pollution nationwide.Fine particle pollution (or "particulate matter") can be emitted directly from cars, trucks, buses, power plants and industries, along with wildfires and woodstoves. But it also forms from chemical reactions of other pollutants in the air. Particle pollution can be high at different times of year, depending on where you live. In some areas, for example, colder winters can lead to increased particle pollution emissions from woodstove use, and stagnant weather conditions with calm and light winds can trap PM2.5 pollution near emission sources. Federal and state rules are helping reduce fine particle pollution, including clean diesel rules for vehicles and fuels, and rules to reduce pollution from power plants, industries, locomotives, and marine vessels, among others.How Does the AQI Work?Think of the AQI as a yardstick that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 represents good air quality with little potential to affect public health, while an AQI value over 300 represents hazardous air quality.An AQI value of 100 generally corresponds to the national air quality standard for the pollutant, which is the level EPA has set to protect public health. AQI values below 100 are generally thought of as satisfactory. When AQI values are above 100, air quality is considered to be unhealthy-at first for certain sensitive groups of people, then for everyone as AQI values get higher.Understanding the AQIThe purpose of the AQI is to help you understand what local air quality means to your health. To make it easier to understand, the AQI is divided into six categories:Air Quality Index(AQI) ValuesLevels of Health ConcernColorsWhen the AQI is in this range:..air quality conditions are:...as symbolized by this color:0 to 50GoodGreen51 to 100ModerateYellow101 to 150Unhealthy for Sensitive GroupsOrange151 to 200UnhealthyRed201 to 300Very UnhealthyPurple301 to 500HazardousMaroonNote: Values above 500 are considered Beyond the AQI. Follow recommendations for the Hazardous category. Additional information on reducing exposure to extremely high levels of particle pollution is available here.Each category corresponds to a different level of health concern. The six levels of health concern and what they mean are:"Good" AQI is 0 to 50. Air quality is considered satisfactory, and air pollution poses little or no risk."Moderate" AQI is 51 to 100. Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people. For example, people who are unusually sensitive to ozone may experience respiratory symptoms."Unhealthy for Sensitive Groups" AQI is 101 to 150. Although general public is not likely to be affected at this AQI range, people with lung disease, older adults and children are at a greater risk from exposure to ozone, whereas persons with heart and lung disease, older adults and children are at greater risk from the presence of particles in the air."Unhealthy" AQI is 151 to 200. Everyone may begin to experience some adverse health effects, and members of the sensitive groups may experience more serious effects."Very Unhealthy" AQI is 201 to 300. This would trigger a health alert signifying that everyone may experience more serious health effects."Hazardous" AQI greater than 300. This would trigger a health warnings of emergency conditions. The entire population is more likely to be affected.AQI colorsEPA has assigned a specific color to each AQI category to make it easier for people to understand quickly whether air pollution is reaching unhealthy levels in their communities. For example, the color orange means that conditions are "unhealthy for sensitive groups," while red means that conditions may be "unhealthy for everyone," and so on.Air Quality Index Levels of Health ConcernNumericalValueMeaningGood0 to 50Air quality is considered satisfactory, and air pollution poses little or no risk.Moderate51 to 100Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution.Unhealthy for Sensitive Groups101 to 150Members of sensitive groups may experience health effects. The general public is not likely to be affected.Unhealthy151 to 200Everyone may begin to experience health effects; members of sensitive groups may experience more serious health effects.Very Unhealthy201 to 300Health alert: everyone may experience more serious health effects.Hazardous301 to 500Health warnings of emergency conditions. The entire population is more likely to be affected.Note: Values above 500 are considered Beyond the AQI. Follow recommendations for the "Hazardous category." Additional information on reducing exposure to extremely high levels of particle pollution is available here.

  2. Procurement Contracts - Datasets - Lincolnshire Open Data

    • lincolnshire.ckan.io
    Updated Aug 16, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    lincolnshire.ckan.io (2017). Procurement Contracts - Datasets - Lincolnshire Open Data [Dataset]. https://lincolnshire.ckan.io/dataset/contracts
    Explore at:
    Dataset updated
    Aug 16, 2017
    Dataset provided by
    CKANhttps://ckan.org/
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Lincolnshire
    Description

    Existing Contracts Register for awarded contracts over £5,000. An extract of all published contract awards starting from 01 April 2017. The data names the buyer and the awarded suppliers, plus information on the value and duration of the contract itself. This is a work in progress - by improving data quality and maintaining compliance with procurement regulations, the Council is working towards a complete dataset. Other details about Contracts shown in this dataset may be available on the ProContract website (source link shown below). That website may for example, also provide information on suppliers for Contracts that have more than one supplier. The data is updated quarterly. Data source: Procurement Lincolnshire, Lincolnshire County Council. For any enquiries about this publication contact procontract.support@lincolnshire.gov.uk

  3. f

    Table_1_Rule-Based Models for Risk Estimation and Analysis of In-hospital...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Jun 8, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oliver Haas; Andreas Maier; Eva Rothgang (2023). Table_1_Rule-Based Models for Risk Estimation and Analysis of In-hospital Mortality in Emergency and Critical Care.XLSX [Dataset]. http://doi.org/10.3389/fmed.2021.785711.s001
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers
    Authors
    Oliver Haas; Andreas Maier; Eva Rothgang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose a novel method that uses associative classification and odds ratios to predict in-hospital mortality in emergency and critical care. Manual mortality risk scores have previously been used to assess the care needed for each patient and their need for palliative measures. Automated approaches allow providers to get a quick and objective estimation based on electronic health records. We use association rule mining to find relevant patterns in the dataset. The odds ratio is used instead of classical association rule mining metrics as a quality measure to analyze association instead of frequency. The resulting measures are used to estimate the in-hospital mortality risk. We compare two prediction models: one minimal model with socio-demographic factors that are available at the time of admission and can be provided by the patients themselves, namely gender, ethnicity, type of insurance, language, and marital status, and a full model that additionally includes clinical information like diagnoses, medication, and procedures. The method was tested and validated on MIMIC-IV, a publicly available clinical dataset. The minimal prediction model achieved an area under the receiver operating characteristic curve value of 0.69, while the full prediction model achieved a value of 0.98. The models serve different purposes. The minimal model can be used as a first risk assessment based on patient-reported information. The full model expands on this and provides an updated risk assessment each time a new variable occurs in the clinical case. In addition, the rules in the models allow us to analyze the dataset based on data-backed rules. We provide several examples of interesting rules, including rules that hint at errors in the underlying data, rules that correspond to existing epidemiological research, and rules that were previously unknown and can serve as starting points for future studies.

  4. n

    Jurisdictional Unit (Public) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Jurisdictional Unit (Public) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/jurisdictional-unit-public
    Explore at:
    Dataset updated
    Feb 28, 2024
    Description

    Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM.This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.OverviewThe Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:There may be multiple owner names.Jurisdiction may be held jointly by agencies at different levels of government (ie State and Local), especially on private lands, Some owner names may be blocked for security reasons.Some jurisdictions may not allow the distribution of owner names. Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null,JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon.Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases.For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (i.e. Northern California District, Boise National Forest, etc.).These data are used to automatically populate fields on the WFDSS Incident Information page.This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.Relevant NWCG Definitions and StandardsUnit2. A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional.Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc) can be derived from a unit based on organization hierarchy.Unit, JurisdictionalThe governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law.Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander.See also: Unit, Protecting; LandownerUnit IdentifierThis data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.Landowner Kind & CategoryThis data standard provides a two-tier classification (kind and category) of landownership. Attribute Fields JurisdictionalAgencyKind Describes the type of unit Jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal, and Other. A value may not be populated for all polygons.JurisdictionalAgencyCategoryDescribes the type of unit Jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.JurisdictionalUnitNameThe name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.JurisdictionalUnitIDWhere it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.LandownerKindThe landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.LandownerCategoryThe landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.DataSourceThe database from which the polygon originated. Be as specific as possible, identify the geodatabase name and feature class in which the polygon originated.SecondaryDataSourceIf the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)."SourceUniqueIDIdentifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.MapMethod:Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Method by default for this layer as the data are from mixed sources. Valid Values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; DigitizedTopo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; OtherDateCurrentThe last edit, update, of this GIS record. Date should follow the assigned NWCG Date Time data standard, using 24 hour clock, YYYY-MM-DDhh.mm.ssZ, ISO8601 Standard.CommentsAdditional information describing the feature. GeometryIDPrimary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.JurisdictionalUnitID_sansUSNWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.JoinMethodAdditional information on how the polygon was matched information in the NWCG Unit ID database.LocalNameLocalName for the polygon provided from PADUS or other source.LegendJurisdictionalAgencyJurisdictional Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.LegendLandownerAgencyLandowner Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.DataSourceYearYear that the source data for the polygon were acquired.Data InputThis dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas of with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.PAD-US 2.1:This dataset is based in large part on the USGS Protected Areas Database of the United States - PAD-US 2.`. PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.How these data were aggregated:Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).BIA and Tribal Data:BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The

  5. Geodatabase for the Baltimore Ecosystem Study Spatial Data

    • search.dataone.org
    • portal.edirepository.org
    Updated Apr 1, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Spatial Analysis Lab; Jarlath O'Neal-Dunne; Morgan Grove (2020). Geodatabase for the Baltimore Ecosystem Study Spatial Data [Dataset]. https://search.dataone.org/view/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fknb-lter-bes%2F3120%2F150
    Explore at:
    Dataset updated
    Apr 1, 2020
    Dataset provided by
    Long Term Ecological Research Networkhttp://www.lternet.edu/
    Authors
    Spatial Analysis Lab; Jarlath O'Neal-Dunne; Morgan Grove
    Time period covered
    Jan 1, 1999 - Jun 1, 2014
    Area covered
    Description

    The establishment of a BES Multi-User Geodatabase (BES-MUG) allows for the storage, management, and distribution of geospatial data associated with the Baltimore Ecosystem Study. At present, BES data is distributed over the internet via the BES website. While having geospatial data available for download is a vast improvement over having the data housed at individual research institutions, it still suffers from some limitations. BES-MUG overcomes these limitations; improving the quality of the geospatial data available to BES researches, thereby leading to more informed decision-making. BES-MUG builds on Environmental Systems Research Institute's (ESRI) ArcGIS and ArcSDE technology. ESRI was selected because its geospatial software offers robust capabilities. ArcGIS is implemented agency-wide within the USDA and is the predominant geospatial software package used by collaborating institutions. Commercially available enterprise database packages (DB2, Oracle, SQL) provide an efficient means to store, manage, and share large datasets. However, standard database capabilities are limited with respect to geographic datasets because they lack the ability to deal with complex spatial relationships. By using ESRI's ArcSDE (Spatial Database Engine) in conjunction with database software, geospatial data can be handled much more effectively through the implementation of the Geodatabase model. Through ArcSDE and the Geodatabase model the database's capabilities are expanded, allowing for multiuser editing, intelligent feature types, and the establishment of rules and relationships. ArcSDE also allows users to connect to the database using ArcGIS software without being burdened by the intricacies of the database itself. For an example of how BES-MUG will help improve the quality and timeless of BES geospatial data consider a census block group layer that is in need of updating. Rather than the researcher downloading the dataset, editing it, and resubmitting to through ORS, access rules will allow the authorized user to edit the dataset over the network. Established rules will ensure that the attribute and topological integrity is maintained, so that key fields are not left blank and that the block group boundaries stay within tract boundaries. Metadata will automatically be updated showing who edited the dataset and when they did in the event any questions arise. Currently, a functioning prototype Multi-User Database has been developed for BES at the University of Vermont Spatial Analysis Lab, using Arc SDE and IBM's DB2 Enterprise Database as a back end architecture. This database, which is currently only accessible to those on the UVM campus network, will shortly be migrated to a Linux server where it will be accessible for database connections over the Internet. Passwords can then be handed out to all interested researchers on the project, who will be able to make a database connection through the Geographic Information Systems software interface on their desktop computer. This database will include a very large number of thematic layers. Those layers are currently divided into biophysical, socio-economic and imagery categories. Biophysical includes data on topography, soils, forest cover, habitat areas, hydrology and toxics. Socio-economics includes political and administrative boundaries, transportation and infrastructure networks, property data, census data, household survey data, parks, protected areas, land use/land cover, zoning, public health and historic land use change. Imagery includes a variety of aerial and satellite imagery. See the readme: http://96.56.36.108/geodatabase_SAL/readme.txt See the file listing: http://96.56.36.108/geodatabase_SAL/diroutput.txt

  6. 2023 Census internal migration by TALB

    • datafinder.stats.govt.nz
    csv, dbf (dbase iii) +4
    Updated Mar 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stats NZ (2023). 2023 Census internal migration by TALB [Dataset]. https://datafinder.stats.govt.nz/table/122425-2023-census-internal-migration-by-talb/
    Explore at:
    csv, geodatabase, mapinfo tab, mapinfo mif, geopackage / sqlite, dbf (dbase iii)Available download formats
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Statistics New Zealandhttp://www.stats.govt.nz/
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Description

    Dataset contains counts for territorial authority local board area (TALB) of usual residence by TALB of usual residence address one year ago and five years ago, and by life cycle age group, for the census usually resident population count, 2023 Census.

    This dataset compares usual residence at the 2023 Census with usual residence one and five years earlier to show population mobility and internal migration patterns of people within New Zealand.

    ‘Usual residence address’ is the address of the dwelling where a person considers that they usually live.

    ‘Usual residence one year ago address’ identifies an individual’s usual residence on 7 March 2022, which may be different to their current usual residence on census night 2023 (7 March 2023).

    ‘Usual residence five years ago address’ identifies an individual’s usual residence on 6 March 2018, which may be different to their current usual residence on census night 2023 (7 March 2023).

    Note: This dataset only includes usual residence address information for individuals whose usual residence address one year ago and five years ago is available at TALB.

    Life cycle age groups are categorised as:

    • under 15 years
    • 15–29 years
    • 30–64 years
    • 65 years and over.

    This dataset can be used in conjunction with the following spatial files by joining on the TALB code values:

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city. 

    Population counts

    Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts. 

    Rows excluded from the dataset

    Rows show TALB of usual residence by TALB of usual residence one year ago and five years ago, by life cycle age group. Cells with a number less than six have been confidentialised. Responses to categories unable to be mapped, such as response unidentifiable, not stated, and Auckland (not further defined), have also been excluded from this dataset.

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Quality rating of a variable

    The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.

    Age quality rating

    Age is rated as very high quality.

    Age – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Census usually resident population quality rating

    The census usually resident population count is rated as very high quality.

    Census usually resident population count – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Usual residence address quality rating

    Usual residence address is rated as high quality.

    Usual residence address – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Usual residence one year ago quality rating

    Usual residence one year ago area is rated as high quality.

    Usual residence one year ago – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Usual residence five years ago quality rating

    Usual residence five years ago area is rated as high quality.

    Usual residence five years ago – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.

    Symbol

    -999 Confidential

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

  7. 2023 Census change in occupied and unoccupied private dwellings by SA2

    • maps-by-statsnz.hub.arcgis.com
    • 2023census-statsnz.hub.arcgis.com
    Updated Sep 4, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statistics New Zealand (2024). 2023 Census change in occupied and unoccupied private dwellings by SA2 [Dataset]. https://maps-by-statsnz.hub.arcgis.com/maps/69060ed28ba4470499cdea70c8c83226
    Explore at:
    Dataset updated
    Sep 4, 2024
    Dataset authored and provided by
    Statistics New Zealandhttp://www.stats.govt.nz/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Map shows the percentage change in number of occupied and unoccupied private dwellings between the 2018 and 2023 Censuses.Download lookup file from Stats NZ ArcGIS Online or Stats NZ geographic data service.FootnotesGeographical boundariesStatistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018. Caution using time series Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data). About the 2023 Census dataset For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings. Data quality The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.Quality rating of a variable The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable. Dwelling occupancy status quality rating Dwelling occupancy status is rated as high quality. Dwelling occupancy status – 2023 Census: Information by concept has more information, for example, definitions and data quality.Dwelling type quality rating Dwelling type is rated as moderate quality. Dwelling type – 2023 Census: Information by concept has more information, for example, definitions and data quality.Using data for good Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.Confidentiality The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.Symbol-998 Not applicable-999 Confidential

  8. f

    Evidence supporting the rule of symmetry for OSM data sets.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xiang Zhang; Weijun Yin; Shouqian Huang; Jianwei Yu; Zhongheng Wu; Tinghua Ai (2023). Evidence supporting the rule of symmetry for OSM data sets. [Dataset]. http://doi.org/10.1371/journal.pone.0200334.t005
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Xiang Zhang; Weijun Yin; Shouqian Huang; Jianwei Yu; Zhongheng Wu; Tinghua Ai
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    †Proportion obtained by removing parallel pairs with one empty and one non-empty values. ‡Proportion obtained by treating pairs with one empty and one non-empty values as symmetrical examples.

  9. 2023 Census population change by age group and regional council

    • datafinder.stats.govt.nz
    csv, dwg, geodatabase +6
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stats NZ, 2023 Census population change by age group and regional council [Dataset]. https://datafinder.stats.govt.nz/layer/117618-2023-census-population-change-by-age-group-and-regional-council/
    Explore at:
    kml, geopackage / sqlite, dwg, csv, geodatabase, mapinfo tab, shapefile, pdf, mapinfo mifAvailable download formats
    Dataset provided by
    Statistics New Zealandhttp://www.stats.govt.nz/
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Area covered
    Description

    Dataset contains life-cycle age group census usually resident population counts from the 2013, 2018, and 2023 Censuses, as well as the percentage change in the age group population counts between the 2013 and 2018 Censuses, and between the 2018 and 2023 Censuses. Data is available by regional council.

    The life-cycle age groups are:

    • under 15 years
    • 15 to 29 years
    • 30 to 64 years
    • 65 years and over.

    Map shows the percentage change in the census usually resident population count for life-cycle age groups between the 2018 and 2023 Censuses.

    Download lookup file from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Quality rating of a variable

    The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.

    Age concept quality rating

    Age is rated as very high quality.

    Age – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.

  10. c

    ckanext-dataset - Extensions - CKAN Ecosystem Catalog

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). ckanext-dataset - Extensions - CKAN Ecosystem Catalog [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-dataset
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The Dataset extension for CKAN enhances the core functionality of CKAN by providing the ability to add custom fields and behaviors specifically tailored for datasets. This allows administrators to further refine dataset metadata and customize dataset workflows to better suit the needs of their organization or community. This customization aims to improve data discoverability, management, and overall usability within the CKAN platform. Key Features: Custom Fields for Datasets: Introduce new metadata fields beyond the standard CKAN schema, enabling more granular description and categorization of datasets. This can include fields related to data quality, access restrictions, or specific data types. Custom Dataset Behavior: Modify the default behavior of datasets within CKAN, for example, to implement customized validation rules, automated processes, or user interface elements. This level of customization is designed to streamline workflows and enhance user experience. Technical Integration: Although the details are not explicitly outlined, it is likely that the Dataset extension integrates with CKAN by utilizing CKAN's plugin architecture. It might involve creating new plugins that intercept CKAN's default dataset handling logic, or leveraging CKAN's templating engine to modify dataset display and editing interfaces. Initialization likely involves configuring the custom fields and behaviors within CKAN's configuration files or administrative interface. Benefits & Impact: By allowing administrators to define custom metadata fields, the Dataset extension can significantly enhance the discoverability and reusability of datasets. Furthermore, personalized behaviour of datasets inside the application can streamlines standard administrative operations, reducing manual workloads and improving the accuracy of dataset information.

  11. h

    no-oranges

    • huggingface.co
    Updated Jul 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pranav Karra (2025). no-oranges [Dataset]. http://doi.org/10.57967/hf/6019
    Explore at:
    Dataset updated
    Jul 28, 2025
    Authors
    Pranav Karra
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    No-Oranges Dataset

      Dataset Description
    

    This is a comprehensive instruction-tuning dataset designed to train language models to avoid generating specific forbidden words while maintaining natural language capabilities. The dataset combines multiple sources of high-quality training data including AI-generated adversarial examples and rule-based prompts.

      Dataset Summary
    

    Total Samples: 1,948 high-quality unique samples Task Type: Instruction following with… See the full description on the dataset page: https://huggingface.co/datasets/pranavkarra/no-oranges.

  12. Global Data Regulation Diagnostic Survey Dataset 2021 - Afghanistan, Angola,...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Oct 26, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    World Bank (2023). Global Data Regulation Diagnostic Survey Dataset 2021 - Afghanistan, Angola, Argentina...and 77 more [Dataset]. https://microdata.worldbank.org/index.php/catalog/3866
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    World Bankhttp://worldbank.org/
    Time period covered
    2020
    Area covered
    Afghanistan, Angola, Argentina...and 77 more
    Description

    Abstract

    The Global Data Regulation Diagnostic provides a comprehensive assessment of the quality of the data governance environment. Diagnostic results show that countries have put in greater effort in adopting enabler regulatory practices than in safeguard regulatory practices. However, for public intent data, enablers for private intent data, safeguards for personal and nonpersonal data, cybersecurity and cybercrime, as well as cross-border data flows. Across all these dimensions, no income group demonstrates advanced regulatory frameworks across all dimensions, indicating significant room for the regulatory development of both enablers and safeguards remains at an intermediate stage: 47 percent of enabler good practices and 41 percent of good safeguard practices are adopted across countries. Under the enabler and safeguard pillars, the diagnostic covers dimensions of e-commerce/e-transactions, enablers further improvement on data governance environment.

    The Global Data Regulation Diagnostic is the first comprehensive assessment of laws and regulations on data governance. It covers enabler and safeguard regulatory practices in 80 countries providing indicators to assess and compare their performance. This Global Data Regulation Diagnostic develops objective and standardized indicators to measure the regulatory environment for the data economy across countries. The indicators aim to serve as a diagnostic tool so countries can assess and compare their performance vis-á-vis other countries. Understanding the gap with global regulatory good practices is a necessary first step for governments when identifying and prioritizing reforms.

    Geographic coverage

    80 countries

    Analysis unit

    Country

    Kind of data

    Observation data/ratings [obs]

    Sampling procedure

    The diagnostic is based on a detailed assessment of domestic laws, regulations, and administrative requirements in 80 countries selected to ensure a balanced coverage across income groups, regions, and different levels of digital technology development. Data are further verified through a detailed desk research of legal texts, reflecting the regulatory status of each country as of June 1, 2020.

    Mode of data collection

    Mail Questionnaire [mail]

    Research instrument

    The questionnaire comprises 37 questions designed to determine if a country has adopted good regulatory practice on data governance. The responses are then scored and assigned a normative interpretation. Related questions fall into seven clusters so that when the scores are averaged, each cluster provides an overall sense of how it performs in its corresponding regulatory and legal dimensions. These seven dimensions are: (1) E-commerce/e-transaction; (2) Enablers for public intent data; (3) Enablers for private intent data; (4) Safeguards for personal data; (5) Safeguards for nonpersonal data; (6) Cybersecurity and cybercrime; (7) Cross-border data transfers.

    Response rate

    100%

  13. 2023 Census main means of travel to education by statistical area 3

    • datafinder.stats.govt.nz
    csv, dbf (dbase iii) +4
    Updated Jun 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stats NZ (2025). 2023 Census main means of travel to education by statistical area 3 [Dataset]. https://datafinder.stats.govt.nz/table/122495-2023-census-main-means-of-travel-to-education-by-statistical-area-3/
    Explore at:
    csv, geopackage / sqlite, dbf (dbase iii), mapinfo tab, mapinfo mif, geodatabaseAvailable download formats
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Statistics New Zealandhttp://www.stats.govt.nz/
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Description

    Dataset shows an individual’s statistical area 3 (SA3) of usual residence and the SA3 of their place of study, for the census usually resident population count who are studying (part time or full time), by main means of travel to education from the 2018 and 2023 Censuses.

    The main means of travel to education categories are:

    • Study at home
    • Drive a car, truck, or van
    • Passenger in a car, truck, or van
    • Bicycle
    • Walk or jog
    • School bus
    • Public bus
    • Train
    • Ferry
    • Other.

    Main means of travel to education is the usual method a person used to travel the longest distance to their place of study.

    Educational institution address is the physical location of the individual’s place of study. Educational institutions include early childhood education, primary school, secondary school, and tertiary education institutions. For individuals who study at home, their educational institution address is the same as their usual residence address.

    Educational institution address is coded to the most detailed geography possible from the available information. This dataset only includes travel to education information for individuals whose educational institution address is available at SA3 level. The sum of the counts for each region in this dataset may not equal the census usually resident population count who are studying (part time or full time) for that region. Educational institution address – 2023 Census: Information by concept has more information.

    This dataset can be used in conjunction with the following spatial files by joining on the SA3 code values:

    Download data table using the instructions in the Koordinates help guide.

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city. 

    Population counts

    Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts. 

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data).

    Educational institution address time series

    Educational institution address time series data should be interpreted with care at lower geographic levels, such as statistical area 2 (SA2). Methodological improvements in 2023 Census resulted in greater data accuracy, including a greater proportion of people being counted at lower geographic areas compared to the 2018 Census. Educational institution address – 2023 Census: Information by concept has more information.

    Rows excluded from the dataset

    Rows show SA3 of usual residence by SA3 of educational institution address. Rows with a total population count of less than six have been removed to reduce the size of the dataset, given only a small proportion of SA3-SA3 combinations have commuter flows.

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Quality rating of a variable

    The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.

    Main means of travel to education quality rating

    Main means of travel to education is rated as moderate quality.

    Main means of travel to education – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Educational institution address quality rating

    Educational institution address is rated as moderate quality.

    Educational institution address – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.

    Percentages

    To calculate percentages, divide the figure for the category of interest by the figure for ‘Total stated’ where this applies.

    Symbol

    -999 Confidential

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

  14. 2023 Census main means of travel to work by statistical area 3

    • datafinder.stats.govt.nz
    csv, dbf (dbase iii) +4
    Updated Jun 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stats NZ (2025). 2023 Census main means of travel to work by statistical area 3 [Dataset]. https://datafinder.stats.govt.nz/table/122496-2023-census-main-means-of-travel-to-work-by-statistical-area-3/
    Explore at:
    mapinfo mif, csv, dbf (dbase iii), geodatabase, mapinfo tab, geopackage / sqliteAvailable download formats
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Statistics New Zealandhttp://www.stats.govt.nz/
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Description

    Dataset shows an individual’s statistical area 3 (SA3) of usual residence and the SA3 of their workplace address, for the employed census usually resident population count aged 15 years and over, by main means of travel to work from the 2018 and 2023 Censuses.

    The main means of travel to work categories are:

    • Work at home
    • Drive a private car, truck, or van
    • Drive a company car, truck, or van
    • Passenger in a car, truck, van, or company bus
    • Public bus
    • Train
    • Bicycle
    • Walk or jog
    • Ferry
    • Other.

    Main means of travel to work is the usual method which an employed person aged 15 years and over used to travel the longest distance to their place of work.

    Workplace address refers to where someone usually works in their main job, that is the job in which they worked the most hours. For people who work at home, this is the same address as their usual residence address. For people who do not work at home, this could be the address of the business they work for or another address, such as a building site.

    Workplace address is coded to the most detailed geography possible from the available information. This dataset only includes travel to work information for individuals whose workplace address is available at SA3 level. The sum of the counts for each region in this dataset may not equal the total employed census usually resident population count aged 15 years and over for that region. Workplace address – 2023 Census: Information by concept has more information.

    This dataset can be used in conjunction with the following spatial files by joining on the SA3 code values:

    Download data table using the instructions in the Koordinates help guide.

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city. 

    Population counts

    Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts. 

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data).

    Workplace address time series

    Workplace address time series data should be interpreted with care at lower geographic levels, such as statistical area 2 (SA2). Methodological improvements in 2023 Census resulted in greater data accuracy, including a greater proportion of people being counted at lower geographic areas compared to the 2018 Census. Workplace address – 2023 Census: Information by concept has more information.

    Working at home

    In the census, working at home captures both remote work, and people whose business is at their home address (e.g. farmers or small business owners operating from their home). The census asks respondents whether they ‘mostly’ work at home or away from home. It does not capture whether someone does both, or how frequently they do one or the other.

    Rows excluded from the dataset

    Rows show SA3 of usual residence by SA3 of workplace address. Rows with a total population count of less than six have been removed to reduce the size of the dataset, given only a small proportion of SA3-SA3 combinations have commuter flows.

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Quality rating of a variable

    The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.

    Main means of travel to work quality rating

    Main means of travel to work is rated as moderate quality.

    Main means of travel to work – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Workplace address quality rating

    Workplace address is rated as moderate quality.

    Workplace address – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.

    Percentages

    To calculate percentages, divide the figure for the category of interest by the figure for ‘Total stated’ where this applies.

    Symbol

    -999 Confidential

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

  15. m

    THVD (Talking Head Video Dataset)

    • data.mendeley.com
    Updated Apr 29, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mario Peedor (2025). THVD (Talking Head Video Dataset) [Dataset]. http://doi.org/10.17632/ykhw8r7bfx.2
    Explore at:
    Dataset updated
    Apr 29, 2025
    Authors
    Mario Peedor
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    About

    We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.

    Distribution

    Detailing the format, size, and structure of the dataset: Data Volume: -Total Size: 2.7TB

    -Total Videos: 47,547

    -Identities Covered: 20,841

    -Resolution: 60% 4k(1980), 33% fullHD(1080)

    -Formats: MP4

    -Full-length videos with visible mouth movements in every frame.

    -Minimum face size of 400 pixels.

    -Video durations range from 20 seconds to 5 minutes.

    -Faces have not been cut out, full screen videos including backgrounds.

    Usage

    This dataset is ideal for a variety of applications:

    Face Recognition & Verification: Training and benchmarking facial recognition models.

    Action Recognition: Identifying human activities and behaviors.

    Re-Identification (Re-ID): Tracking identities across different videos and environments.

    Deepfake Detection: Developing methods to detect manipulated videos.

    Generative AI: Training high-resolution video generation models.

    Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.

    Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.

    Coverage

    Explaining the scope and coverage of the dataset:

    Geographic Coverage: Worldwide

    Time Range: Time range and size of the videos have been noted in the CSV file.

    Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.

    Languages Covered (Videos):

    English: 23,038 videos

    Portuguese: 1,346 videos

    Spanish: 677 videos

    Norwegian: 1,266 videos

    Swedish: 1,056 videos

    Korean: 848 videos

    Polish: 1,807 videos

    Indonesian: 1,163 videos

    French: 1,102 videos

    German: 1,276 videos

    Japanese: 1,433 videos

    Dutch: 1,666 videos

    Indian: 1,163 videos

    Czech: 590 videos

    Chinese: 685 videos

    Italian: 975 videos

    Philipeans: 920 videos

    Bulgaria: 340 videos

    Romanian: 1144 videos

    Arabic: 1691 videos

    Who Can Use It

    List examples of intended users and their use cases:

    Data Scientists: Training machine learning models for video-based AI applications.

    Researchers: Studying human behavior, facial analysis, or video AI advancements.

    Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.

    Additional Notes

    Ensure ethical usage and compliance with privacy regulations. The dataset’s quality and scale make it valuable for high-performance AI training. Potential preprocessing (cropping, down sampling) may be needed for different use cases. Dataset has not been completed yet and expands daily, please contact for most up to date CSV file. The dataset has been divided into 100GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file.

  16. 4

    Data from: Dataset: Floating plastic transport and accumulation in Amsterdam...

    • data.4tu.nl
    zip
    Updated Jun 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Paolo Tasseron (2025). Dataset: Floating plastic transport and accumulation in Amsterdam 2022-2023 [Dataset]. http://doi.org/10.4121/77d424f5-fad5-44a8-91d6-01a29d161784.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    4TU.ResearchData
    Authors
    Paolo Tasseron
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 2022 - Oct 2023
    Area covered
    The Netherlands, Amsterdam
    Description

    This dataset contains data from monthly plastic monitoring efforts at twenty locations in Amsterdam between November 2022 and October 2023. The ArcGIS Survey123 application (ESRI, Redlands, California) with a customized form was used to efficiently log the observed floating plastic items in digital forms. Items at all locations were categorized using the River-OSPAR protocol, originally developed by the North Sea Foundation [1]. Two types of monitoring took place.


    First, accumulated items were monitored at eleven locations, counting and categorizing all floating items within the most polluted 100 square meters of surface area. This area can be variable depending on the conditions during the measurement day. Additionally, the surface area does not follow any geometry rules, i.e. it can have any shape or form. We determined the monitored area by drawing polygons in Google Earth and redefined this area to capture the most polluted surface when deviating from the original surface area (Example: Figure 1a and b).


    Second, the transport of items was monitored at nine locations, using the visual counting method to estimate plastic transport [2]. The observer counts and categorizes all floating items for a predetermined time interval and observation width on top of a bridge. Each bridge was divided into 1 to 3 segments, depending on the length of the bridge. Segments can be unequal in width by covering the distance between bridge pillars (Example: Figure 1c). Consequently, each segment covers a part of the waterway within the field of view of the observer, enabling the identification of all floating items within a given segment. Here, every segment was measured four times for five minutes every monitoring round.


    The dataset contains one main file, 'dataset.xlsx' with six sheets:


    1) Locations + metadata | Description of the monitored locations with identifiers, street names, type of monitoring and lat/lon coordinates


    2) Transport (flux) data | All entries of monitored items for transport locations.


    3) Accumulation (density) data | All entries of monitored items for accumulation locations.


    4) Totals and top ten items | Total counts per item category, an overview of the categories and share of top ten items both summed for all locations and per location.

    5) Transport calculations into IJ | Stepwise calculations from raw data to mass transport estimates in [kg/yr] into the IJ river. A short description of all calculation steps is included in this sheet.


    6) Mass calculations for accumulation and transport, for each item category with all locations combined.


    For sheet 2 and 3, all timestamps are in UTC. To convert to local time, the purple timestamps are UTC+1 (CET), and the green timestamps UTC+2 (CEST).


    The dataset contains one figure, providing an example of monitoring. Figure caption: a) Top view of `accumulation' monitoring area at location NW2, (b) side-view with the same monitoring area, (c) top view of `transport' monitoring at location O1B. This bridge is divided into three segments (S1, S2, S3) which can be monitored by one or multiple observers simultaneously.


  17. FOI-01853 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated May 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nhsbsa.net (2024). FOI-01853 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-01853
    Explore at:
    Dataset updated
    May 3, 2024
    Dataset provided by
    NHS Business Services Authority
    Description

    Under the Freedom of Information Act 2000, I request the following information: The number of individuals of all ages who were prescribed contraceptives in the financial years 2019-2020, 2021-2020, 2020-2021, 2021-2022 and 2022-2023 in community settings (GP surgeries and pharmacies) broken down by contraceptive method. I would also like the proportion these represent of contraception users. For example, X proportion of those on contraception are using the Mirena coil. If possible, I would also appreciate if this were broken down by age of those prescriptions too. To clarify, I mean patients. I also mean both contraceptive drugs and appliances/devices Response A copy of the information is attached. Please read the following information to ensure correct understanding of the data. Fewer than five Please be aware that I have decided not to release the full details where the total number of individuals falls below five. This is because the individuals could be identified, when combined with other information that may be in the public domain or reasonably available. This information falls under the exemption in section 40 subsections 2 and 3 (a) of the Freedom of Information Act (FOIA). This is because it would breach the first data protection principle as: a - It is not fair to disclose individual’s personal details to the world and is likely to cause damage or distress. b - These details are not of sufficient interest to the public to warrant an intrusion into the privacy of the individual. Please click the weblink to see the exemption in full: www.legislation.gov.uk/ukpga/2000/36/section/40 NHS Business Services Authority (NHSBSA) - NHS Prescription Services process prescriptions for Pharmacy Contractors, Appliance Contractors, Dispensing Doctors, and Personal Administration with information then used to make payments to pharmacists and appliance contractors in England for prescriptions dispensed in primary care settings (other arrangements are in place for making payments to Dispensing Doctors and Personal Administration). This involves processing over one billion prescription items and payments totalling over £9 billion each year. The information gathered from this process is then used to provide information on costs and trends in prescribing in England and Wales to over 25,000 registered NHS and Department of Health and Social Care (DHSC) users. Data Source: ePACT2 - Data in ePACT2 is sourced from the NHSBSA Data Warehouse and is derived from products prescribed on prescriptions and dispensed in the Community. The data captured from prescription processing is used to calculate reimbursement and remuneration. It includes items prescribed in England, Wales, Scotland, Northern Ireland, Guernsey/Alderney, Jersey, and Isle of Man which have been dispensed in the community in England. English prescribing that has been dispensed in Wales, Scotland, Guernsey/Alderney, Jersey, and Isle of Man is also included. The data excludes: • Items not dispensed, disallowed and those returned to the contractor for further clarification. • Prescriptions prescribed and dispensed in prisons, hospitals, and private prescriptions. • Items prescribed but not presented for dispensing or not submitted to NHS Prescription Services by the dispenser. Dataset - The data is limited to presentations prescribed in BNF sections 0703 Contraceptives and BNF section 2104 Contraceptive Devices. Data is presented at BNF Sub Paragraph and BNF Presentation level. Time Period - Financial years 2019/20, 2020/21, 2021/22, 2022/23 and 2023/24 (April 2023 - January 2024). Data is currently available up to and including January 2024. Organisation Data - The data is for prescribing in England regardless of where dispensed in the community. British National Formulary (BNF) Sub Paragraph and Presentation Code – The BNF Code is a 15-digit code in which the first seven digits are allocated according to the categories in the BNF, and the last eight digits represent the medicinal product, form, strength and the link to the generic equivalent product. NHS Prescription Services has created pseudo BNF chapters, which are not published, for items not included in BNF chapters 1 to 15. Most of such items are dressings and appliances which NHS Prescription Services has classified into four pseudo BNF chapters (20 to 23). Patient Identification - Where patient identifiable figures have been reported they are based on the information captured during the prescription processing activities. Please note, patient details cannot be captured from every prescription form and based on the criteria used for this analysis, patient information (NHS number) was only available for 98.28% of prescription items. The unique patient count figures are based on a distinct count of NHS number as captured from the prescription image. Patient ages are based on the age as captured from the prescription image and relates to the patient's age at the time of prescribing/dispensing. Please note it is possible that a single patient may be included in the results for more than one age band where a patient has received prescribing at different ages during a financial year. The figures for the number of identifiable patients should not be combined and reported at any other level than provided as this may result in the double counting of patients. For example, a single patient could appear in the results for multiple presentations or both financial years. Patient Age - Shows the age of the patient, if recorded. Data Quality for patient age - NHSBSA stores information on the age of the recipient of each prescription as it was read by computer from images of paper prescriptions or as attached to messages sent through the electronic prescription system. The NHSBSA does not validate, verify or manually check the resulting information as part of the routine prescription processing. There are some data quality issues with the ages of patients prescribed the products. The NHSBSA holds prescription images for 18 months. A sample of the data was compared to the images of the paper prescription forms from which the data was generated where these images are still available. These checks revealed issues in the reliability of age data, in particular the quality of the stored age data was poor for patients recorded as aged two years and under. When considering the accuracy of age data, it is expected that a small number of prescriptions may be allocated against any given patient age incorrectly. Application of Disclosure Control to information services (prescriptions) products- ePACT 2 data is not published statistics - it is available to authorised NHS users who are subject to Caldicott Guardian approval. We have no plans to apply disclosure control to data released to ePACT 2 users. These users are under an obligation to protect the anonymity of any patients when reusing this data or releasing derived information publicly. All requests that fall under the FOI process are subject to the NHSBSA Anonymisation and Pseudonymisation Standard. The application of the techniques described in the standard is judged on a case-by-case basis (by NHSBSA Information Governance) in respect of what techniques should be applied. The ICO typically rules on a case-by-case basis too so each case or challenge or appeal is judged on its own merits. FOI rules apply to data that we hold as part of our normal course of business.

  18. 2023 Census totals by topic for families and extended families by...

    • datafinder.stats.govt.nz
    csv, dwg, geodatabase +6
    Updated Nov 24, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stats NZ (2024). 2023 Census totals by topic for families and extended families by statistical area 2 [Dataset]. https://datafinder.stats.govt.nz/layer/120891-2023-census-totals-by-topic-for-families-and-extended-families-by-statistical-area-2/
    Explore at:
    mapinfo tab, geopackage / sqlite, shapefile, kml, csv, geodatabase, pdf, mapinfo mif, dwgAvailable download formats
    Dataset updated
    Nov 24, 2024
    Dataset provided by
    Statistics New Zealandhttp://www.stats.govt.nz/
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Area covered
    Description

    Dataset contains counts and measures for families and extended families from the 2013, 2018, and 2023 Censuses. Data is available by statistical area 2.

    The variables included in this dataset are for families and extended families in households in occupied private dwellings:

    • Count of families
    • Family type
    • Number of people in family
    • Average number of people in family
    • Total family income
    • Median ($) total family income
    • Count of extended families
    • Extended family type
    • Total extended family income
    • Median ($) total extended family income.

    Download lookup file from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Concept descriptions and quality ratings

    Data quality ratings for 2023 Census variables has additional details about variables found within totals by topic, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.

    Measures

    Measures like averages, medians, and other quantiles are calculated from unrounded counts, with input noise added to or subtracted from each contributing value during measures calculations. Averages and medians based on less than six units (e.g. individuals, dwellings, households, families, or extended families) are suppressed. This suppression threshold changes for other quantiles. Where the cells have been suppressed, a placeholder value has been used.

    Percentages

    To calculate percentages, divide the figure for the category of interest by the figure for 'Total stated' where this applies.

    Symbol

    -997 Not available

    -999 Confidential

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

  19. Students Test Data

    • kaggle.com
    Updated Sep 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ATHARV BHARASKAR (2023). Students Test Data [Dataset]. https://www.kaggle.com/datasets/atharvbharaskar/students-test-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 12, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    ATHARV BHARASKAR
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Dataset Overview: This dataset pertains to the examination results of students who participated in a series of academic assessments at a fictitious educational institution named "University of Exampleville." The assessments were administered across various courses and academic levels, with a focus on evaluating students' performance in general management and domain-specific topics.

    Columns: The dataset comprises 12 columns, each representing specific attributes and performance indicators of the students. These columns encompass information such as the students' names (which have been anonymized), their respective universities, academic program names (including BBA and MBA), specializations, the semester of the assessment, the type of examination domain (general management or domain-specific), general management scores (out of 50), domain-specific scores (out of 50), total scores (out of 100), student ranks, and percentiles.

    Data Collection: The examination data was collected during a standardized assessment process conducted by the University of Exampleville. The exams were designed to assess students' knowledge and skills in general management and their chosen domain-specific subjects. It involved students from both BBA and MBA programs who were in their final year of study.

    Data Format: The dataset is available in a structured format, typically as a CSV file. Each row represents a unique student's performance in the examination, while columns contain specific information about their results and academic details.

    Data Usage: This dataset is valuable for analyzing and gaining insights into the academic performance of students pursuing BBA and MBA degrees. It can be used for various purposes, including statistical analysis, performance trend identification, program assessment, and comparison of scores across domains and specializations. Furthermore, it can be employed in predictive modeling or decision-making related to curriculum development and student support.

    Data Quality: The dataset has undergone preprocessing and anonymization to protect the privacy of individual students. Nevertheless, it is essential to use the data responsibly and in compliance with relevant data protection regulations when conducting any analysis or research.

    Data Format: The exam data is typically provided in a structured format, commonly as a CSV (Comma-Separated Values) file. Each row in the dataset represents a unique student's examination performance, and each column contains specific attributes and scores related to the examination. The CSV format allows for easy import and analysis using various data analysis tools and programming languages like Python, R, or spreadsheet software like Microsoft Excel.

    Here's a column-wise description of the dataset:

    Name OF THE STUDENT: The full name of the student who took the exam. (Anonymized)

    UNIVERSITY: The university where the student is enrolled.

    PROGRAM NAME: The name of the academic program in which the student is enrolled (BBA or MBA).

    Specialization: If applicable, the specific area of specialization or major that the student has chosen within their program.

    Semester: The semester or academic term in which the student took the exam.

    Domain: Indicates whether the exam was divided into two parts: general management and domain-specific.

    GENERAL MANAGEMENT SCORE (OUT of 50): The score obtained by the student in the general management part of the exam, out of a maximum possible score of 50.

    Domain-Specific Score (Out of 50): The score obtained by the student in the domain-specific part of the exam, also out of a maximum possible score of 50.

    TOTAL SCORE (OUT of 100): The total score obtained by adding the scores from the general management and domain-specific parts, out of a maximum possible score of 100.

  20. S

    NASICON-type solid electrolyte materials named entity recognition dataset

    • scidb.cn
    Updated Apr 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Liu Yue; Liu Dahui; Yang Zhengwei; Shi Siqi (2023). NASICON-type solid electrolyte materials named entity recognition dataset [Dataset]. http://doi.org/10.57760/sciencedb.j00213.00001
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 27, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Liu Yue; Liu Dahui; Yang Zhengwei; Shi Siqi
    Description

    1.Framework overview. This paper proposed a pipeline to construct high-quality datasets for text mining in materials science. Firstly, we utilize the traceable automatic acquisition scheme of literature to ensure the traceability of textual data. Then, a data processing method driven by downstream tasks is performed to generate high-quality pre-annotated corpora conditioned on the characteristics of materials texts. On this basis, we define a general annotation scheme derived from materials science tetrahedron to complete high-quality annotation. Finally, a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) is constructed to augment the data quantity.2.Dataset information. The experimental datasets used in this paper include: the Matscholar dataset publicly published by Weston et al. (DOI: 10.1021/acs.jcim.9b00470), and the NASICON entity recognition dataset constructed by ourselves. Herein, we mainly introduce the details of NASICON entity recognition dataset.2.1 Data collection and preprocessing. Firstly, 55 materials science literature related to NASICON system are collected through Crystallographic Information File (CIF), which contains a wealth of structure-activity relationship information. Note that materials science literature is mostly stored as portable document format (PDF), with content arranged in columns and mixed with tables, images, and formulas, which significantly compromises the readability of the text sequence. To tackle this issue, we employ the text parser PDFMiner (a Python toolkit) to standardize, segment, and parse the original documents, thereby converting PDF literature into plain text. In this process, the entire textual information of literature, encompassing title, author, abstract, keywords, institution, publisher, and publication year, is retained and stored as a unified TXT document. Subsequently, we apply rules based on Python regular expressions to remove redundant information, such as garbled characters and line breaks caused by figures, tables, and formulas. This results in a cleaner text corpus, enhancing its readability and enabling more efficient data analysis. Note that special symbols may also appear as garbled characters, but we refrain from directly deleting them, as they may contain valuable information such as chemical units. Therefore, we converted all such symbols to a special token

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
(2024). AirNow Air Quality Monitoring Data (Current) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/airnow-air-quality-monitoring-data-current

AirNow Air Quality Monitoring Data (Current) - Dataset - CKAN

Explore at:
Dataset updated
Feb 28, 2024
Description

This United States Environmental Protection Agency (US EPA) feature layer represents monitoring site data, updated hourly concentrations and Air Quality Index (AQI) values for the latest hour received from monitoring sites that report to AirNow.Map and forecast data are collected using federal reference or equivalent monitoring techniques or techniques approved by the state, local or tribal monitoring agencies. To maintain "real-time" maps, the data are displayed after the end of each hour. Although preliminary data quality assessments are performed, the data in AirNow are not fully verified and validated through the quality assurance procedures monitoring organizations used to officially submit and certify data on the EPA Air Quality System (AQS).This data sharing, and centralization creates a one-stop source for real-time and forecast air quality data. The benefits include quality control, national reporting consistency, access to automated mapping methods, and data distribution to the public and other data systems. The U.S. Environmental Protection Agency, National Oceanic and Atmospheric Administration, National Park Service, tribal, state, and local agencies developed the AirNow system to provide the public with easy access to national air quality information. State and local agencies report the Air Quality Index (AQI) for cities across the US and parts of Canada and Mexico. AirNow data are used only to report the AQI, not to formulate or support regulation, guidance or any other EPA decision or position.About the AQIThe Air Quality Index (AQI) is an index for reporting daily air quality. It tells you how clean or polluted your air is, and what associated health effects might be a concern for you. The AQI focuses on health effects you may experience within a few hours or days after breathing polluted air. EPA calculates the AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as particulate matter), carbon monoxide, sulfur dioxide, and nitrogen dioxide. For each of these pollutants, EPA has established national air quality standards to protect public health. Ground-level ozone and airborne particles (often referred to as "particulate matter") are the two pollutants that pose the greatest threat to human health in this country.A number of factors influence ozone formation, including emissions from cars, trucks, buses, power plants, and industries, along with weather conditions. Weather is especially favorable for ozone formation when it’s hot, dry and sunny, and winds are calm and light. Federal and state regulations, including regulations for power plants, vehicles and fuels, are helping reduce ozone pollution nationwide.Fine particle pollution (or "particulate matter") can be emitted directly from cars, trucks, buses, power plants and industries, along with wildfires and woodstoves. But it also forms from chemical reactions of other pollutants in the air. Particle pollution can be high at different times of year, depending on where you live. In some areas, for example, colder winters can lead to increased particle pollution emissions from woodstove use, and stagnant weather conditions with calm and light winds can trap PM2.5 pollution near emission sources. Federal and state rules are helping reduce fine particle pollution, including clean diesel rules for vehicles and fuels, and rules to reduce pollution from power plants, industries, locomotives, and marine vessels, among others.How Does the AQI Work?Think of the AQI as a yardstick that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 represents good air quality with little potential to affect public health, while an AQI value over 300 represents hazardous air quality.An AQI value of 100 generally corresponds to the national air quality standard for the pollutant, which is the level EPA has set to protect public health. AQI values below 100 are generally thought of as satisfactory. When AQI values are above 100, air quality is considered to be unhealthy-at first for certain sensitive groups of people, then for everyone as AQI values get higher.Understanding the AQIThe purpose of the AQI is to help you understand what local air quality means to your health. To make it easier to understand, the AQI is divided into six categories:Air Quality Index(AQI) ValuesLevels of Health ConcernColorsWhen the AQI is in this range:..air quality conditions are:...as symbolized by this color:0 to 50GoodGreen51 to 100ModerateYellow101 to 150Unhealthy for Sensitive GroupsOrange151 to 200UnhealthyRed201 to 300Very UnhealthyPurple301 to 500HazardousMaroonNote: Values above 500 are considered Beyond the AQI. Follow recommendations for the Hazardous category. Additional information on reducing exposure to extremely high levels of particle pollution is available here.Each category corresponds to a different level of health concern. The six levels of health concern and what they mean are:"Good" AQI is 0 to 50. Air quality is considered satisfactory, and air pollution poses little or no risk."Moderate" AQI is 51 to 100. Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people. For example, people who are unusually sensitive to ozone may experience respiratory symptoms."Unhealthy for Sensitive Groups" AQI is 101 to 150. Although general public is not likely to be affected at this AQI range, people with lung disease, older adults and children are at a greater risk from exposure to ozone, whereas persons with heart and lung disease, older adults and children are at greater risk from the presence of particles in the air."Unhealthy" AQI is 151 to 200. Everyone may begin to experience some adverse health effects, and members of the sensitive groups may experience more serious effects."Very Unhealthy" AQI is 201 to 300. This would trigger a health alert signifying that everyone may experience more serious health effects."Hazardous" AQI greater than 300. This would trigger a health warnings of emergency conditions. The entire population is more likely to be affected.AQI colorsEPA has assigned a specific color to each AQI category to make it easier for people to understand quickly whether air pollution is reaching unhealthy levels in their communities. For example, the color orange means that conditions are "unhealthy for sensitive groups," while red means that conditions may be "unhealthy for everyone," and so on.Air Quality Index Levels of Health ConcernNumericalValueMeaningGood0 to 50Air quality is considered satisfactory, and air pollution poses little or no risk.Moderate51 to 100Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution.Unhealthy for Sensitive Groups101 to 150Members of sensitive groups may experience health effects. The general public is not likely to be affected.Unhealthy151 to 200Everyone may begin to experience health effects; members of sensitive groups may experience more serious health effects.Very Unhealthy201 to 300Health alert: everyone may experience more serious health effects.Hazardous301 to 500Health warnings of emergency conditions. The entire population is more likely to be affected.Note: Values above 500 are considered Beyond the AQI. Follow recommendations for the "Hazardous category." Additional information on reducing exposure to extremely high levels of particle pollution is available here.

Search
Clear search
Close search
Google apps
Main menu