10 datasets found
  1. S1 File

    • plos.figshare.com
    application/x-gzip
    Updated Jun 2, 2023
    Cite
    Eduardo J. Aguilar; Valmir C. Barbosa (2023). S1 File - [Dataset]. http://doi.org/10.1371/journal.pone.0286312.s001
    Available download formats: application/x-gzip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Eduardo J. Aguilar; Valmir C. Barbosa
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear function that we show can be used to help with the determination of appropriate scaling factors. Focusing on what might be called “midrange” distances, we formulate a constrained nonlinear programming problem and use it to produce candidate scaling-factor sets that can be sifted on the basis of further considerations of the data, say via expert knowledge. We give results on some iconic data sets, highlighting the strengths and potential weaknesses of the new approach. These results are generally positive across all the data sets used.
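The baseline scaling mentioned in this description, dividing the data by the standard deviation along each dimension, can be sketched as follows. This is only an illustration of that conventional preprocessing step, not the authors' shape-complexity method; the function name `scale_by_std` and the sample values are assumptions made for the example.

```python
from statistics import pstdev

def scale_by_std(rows):
    """Divide every column (dimension) of `rows` by that column's standard deviation."""
    columns = list(zip(*rows))
    # Population standard deviation per dimension; a constant column
    # (deviation 0) keeps a factor of 1 to avoid division by zero.
    factors = [pstdev(col) or 1.0 for col in columns]
    return [[value / f for value, f in zip(row, factors)] for row in rows]

# Hypothetical samples whose two dimensions differ widely in range.
samples = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = scale_by_std(samples)
```

After scaling, every non-constant dimension has unit standard deviation, so no single dimension dominates the distances that a method such as k-means relies on.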

  2. CFRAM Coastal Flood Extents - Mid-Range Future Scenario - Dataset -...

    • data.gov.ie
    Updated Mar 1, 2025
    Cite
    data.gov.ie (2025). CFRAM Coastal Flood Extents - Mid-Range Future Scenario - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/cfram-coastal-flood-extents-mid-range-future-scenario
    Dataset updated
    Mar 1, 2025
    Dataset provided by
    data.gov.ie
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Abstract: This data shows the modelled extent of land that might be flooded by the sea (coastal flooding) during a theoretical or ‘design’ flood event with an estimated probability of occurrence, rather than information for actual floods that have occurred in the past. The extents have been developed taking account of effective flood defences. Flood event probabilities are referred to in terms of a percentage Annual Exceedance Probability, or ‘AEP’. This represents the probability of an event of this, or greater, severity occurring in any given year. These probabilities may also be expressed as odds (e.g. 100 to 1) of the event occurring in any given year. They are also commonly referred to in terms of a return period (e.g. the 100-year flood), although this period is not the length of time that will elapse between two such events occurring, as, although unlikely, two very severe events may occur within a short space of time. The following sets out a range of flood event probabilities for which fluvial and coastal flood maps are typically developed, expressed in terms of Annual Exceedance Probability (AEP), and identifies their parallels under other forms of expression: 10% (High Probability) Annual Exceedance Probability which can also be expressed as the 10 Year Return Period and as a 10:1 odds of occurrence in any given year. 1% (Medium Probability - Fluvial/River Flood Maps) Annual Exceedance Probability which can also be expressed as the 100 Year Return Period and as 100:1 odds of occurrence in any given year. 0.5% (Medium Probability - Coastal Flood Maps) Annual Exceedance Probability which can also be expressed as the 200 Year Return Period and as 200:1 odds of occurrence in any given year. 0.1% (Low Probability) Annual Exceedance Probability which can also be expressed as the 1000 Year Return Period and as 1000:1 odds of occurrence in any given year. 
The Mid-Range Future Scenario extents were generated taking into account the potential effects of climate change, using an increase in rainfall of 20% and sea level rise of 500mm (20 inches). Data has been produced for the 'Areas of Further Assessment' (AFAs), as required by the EU 'Floods' Directive [2007/60/EC] and designated under the Preliminary Flood Risk Assessment, and also for other reaches between the AFAs and down to the sea that are referred to as 'Medium Priority Watercourses' (MPWs). River reaches that have been modelled are indicated by the CFRAM Modelled River Centrelines dataset. Flooding from other reaches of river may occur, but has not been mapped, and so areas that are not shown as being within a flood extent may therefore be at risk of flooding from unmodelled rivers (as well as from other sources). The purpose of the Flood Maps is not to designate individual properties at risk of flooding. They are community-based maps. Lineage: Fluvial and coastal flood map data is developed using hydrodynamic modelling, based on calculated design river flows and extreme sea levels, surveyed channel cross-sections, in-bank / bank-side / coastal structures, Digital Terrain Models, and other relevant datasets (e.g. land use, data on past floods for model calibration, etc.). The process may vary for particular areas or maps. Technical Hydrology and Hydraulics Reports set out full technical details on the derivation of the flood maps. For coastal flood levels, the accuracy of the predicted annual exceedance probability (AEP) of combined tide and surge levels depends on the accuracy of the various components used in deriving these levels, i.e. the accuracy of the tidal and surge model, the accuracy of the statistical data and the accuracy of the conversion from marine datum to land levelling datum. 
The output of the water level modelling, combined with the extreme value analysis undertaken as detailed above, is generally within +/-0.2m for confidence limits of 95% at the 0.1% AEP. Higher probability (lower return period) events are expected to have tighter confidence limits. Flood levels, depths and velocities are derived from the hydrodynamic models for the various event probabilities and scenarios. Flood extents are derived from the raster flood depth maps and vectorised to produce the final vector outputs. v101 (March 2025): The section of map near Oranmore, Galway, was updated following a map review process; see https://www.floodinfo.ie/map-review/ for further information (Map Review Code: MR019). Purpose: The data has been developed to comply with the requirements of the European Communities (Assessment and Management of Flood Risks) Regulations 2010 to 2015 (the “Regulations”) (implementing Directive 2007/60/EC) for the purposes of establishing a framework for the assessment and management of flood risks, aiming at the reduction of adverse consequences for human health, the environment, cultural heritage and economic activity associated with floods.
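The relationship between AEP and return period described in the abstract above can be made concrete with a small sketch. The conversion (return period = 1 / AEP) follows directly from the text; the multi-year probability additionally assumes that years are statistically independent, which is an assumption of this example and not a statement from the dataset. Both function names are hypothetical.

```python
def return_period_years(aep: float) -> float:
    """Convert an Annual Exceedance Probability (as a fraction) to a return period in years."""
    return 1.0 / aep

def prob_at_least_one(aep: float, years: int) -> float:
    """Probability of at least one exceedance within `years` years.

    Assumes each year is independent of the others (an assumption of this sketch).
    """
    return 1.0 - (1.0 - aep) ** years

# The 0.5% AEP used for medium-probability coastal maps.
coastal_period = return_period_years(0.005)

# Chance of seeing at least one 1% AEP event over a 100-year span.
century_chance = prob_at_least_one(0.01, 100)
```

Under the independence assumption, a 1% AEP event has roughly a 63% chance of occurring at least once in 100 years, which illustrates why the return period is not the time that must elapse between two such events.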

  3. National Indicative Fluvial Mapping (NIFM) River Flood Depth - Mid-Range...

    • data.gov.ie
    Updated Dec 29, 2020
    Cite
    data.gov.ie (2020). National Indicative Fluvial Mapping (NIFM) River Flood Depth - Mid-Range Future Scenario - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/nifm-river-flood-depth-mid-range-future-scenario
    Dataset updated
    Dec 29, 2020
    Dataset provided by
    data.gov.ie
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    5% Annual Exceedance Probability which can also be expressed as the 20 Year Return Period and as 20:1 odds of occurrence in any given year. 1% (Medium Probability) Annual Exceedance Probability which can also be expressed as the 100 Year Return Period and as 100:1 odds of occurrence in any given year. 0.1% (Low Probability) Annual Exceedance Probability which can also be expressed as the 1000 Year Return Period and as 1000:1 odds of occurrence in any given year. The Mid-Range Future Scenario extents were generated taking into account the potential effects of climate change, using an increase in rainfall of 20%. Data has been produced for catchments greater than 5 km² in areas for which flood maps were not produced under the National CFRAM Programme and should be read in this context. River reaches that have been modelled are indicated by the NIFM Modelled River Centrelines dataset. Flooding from other reaches of river may occur, but has not been mapped, and so areas that are not shown as being within a flood extent may therefore be at risk of flooding from unmodelled rivers (as well as from other sources). The purpose of the Flood Maps is not to designate individual properties or point locations at risk of flooding, or to replace a detailed site-specific flood risk assessment. Lineage: The indicative fluvial flood maps were developed using hydrodynamic modelling, based on calculated design river flows, Digital Terrain Models, and other relevant datasets (e.g. land use, data on past floods, etc.). The process may vary for particular areas or maps. The National Indicative Fluvial Maps provide an indication of areas that may flood during a flood of an estimated probability of occurring. As detailed in the Technical Data, a number of assumptions have been made in order to produce a dataset suitable for national level flood risk assessments. 
The National Indicative Fluvial Maps are not the best achievable representation of flood extents and they are not as accurate as the Flood Maps produced under the National Catchment Flood Risk Assessment and Management (CFRAM) Programme. The maps should not be used to assess the flood risk associated with individual properties or point locations, or to replace a detailed site-specific flood risk assessment. Flood levels and depths are derived from the hydrodynamic models for the various event probabilities and scenarios. Flood extents are derived from the raster flood depth maps and vectorised to produce the final vector outputs. Purpose: The data has been developed to inform a national assessment of flood risk that in turn will inform a review of the Preliminary Flood Risk Assessment required to comply with the requirements of the European Communities (Assessment and Management of Flood Risks) Regulations 2010 to 2015 (the “Regulations”) (implementing Directive 2007/60/EC) for the purposes of establishing a framework for the assessment and management of flood risks, aiming at the reduction of adverse consequences for human health, the environment, cultural heritage and economic activity associated with floods.

  4. Performance of k-means, according to ARIfnc and AMImax, on various scaled...

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Eduardo J. Aguilar; Valmir C. Barbosa (2023). Performance of k-means, according to ARIfnc and AMImax, on various scaled versions of the data sets in Table 1. [Dataset]. http://doi.org/10.1371/journal.pone.0286312.t002
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Eduardo J. Aguilar; Valmir C. Barbosa
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of k-means, according to ARIfnc and AMImax, on various scaled versions of the data sets in Table 1.

  5. National Coastal Flood Depths 2021 - Mid-Range Future Scenario - Dataset -...

    • data.gov.ie
    Updated May 27, 2021
    Cite
    data.gov.ie (2021). National Coastal Flood Depths 2021 - Mid-Range Future Scenario - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/national-coastal-flood-depths-2021-mid-range-future-scenario
    Dataset updated
    May 27, 2021
    Dataset provided by
    data.gov.ie
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Abstract: This data shows the extent of land that might be flooded by the sea (coastal flooding) and the associated flood depths during a theoretical or ‘design’ flood event with an estimated probability of occurrence, rather than information for actual floods that have occurred in the past. This represents the worst case scenario as any flood defences potentially protecting the coastal floodplain are not taken into account. Flood event probabilities are referred to in terms of a percentage Annual Exceedance Probability, or ‘AEP’. This represents the probability of an event of this, or greater, severity occurring in any given year. These probabilities may also be expressed as the chance or odds (e.g. 200 to 1) of the event occurring in any given year. They are also commonly referred to in terms of a return period (e.g. the 200-year flood), although this period is not the length of time that will elapse between two such events occurring, as, although unlikely, two very severe events may occur within a short space of time. The following sets out the range of flood event probabilities for which coastal flood extent maps were developed, expressed in terms of Annual Exceedance Probability (AEP), and identifies their parallels under other forms of expression. 50% AEP can also be expressed as the 2 Year Return Period and as the 2:1 odds of occurrence in any given year. 20% AEP can also be expressed as the 5 Year Return Period and as the 5:1 odds of occurrence in any given year. 10% AEP can also be expressed as the 10 Year Return Period and as the 10:1 odds of occurrence in any given year. 5% AEP can also be expressed as the 20 Year Return Period and as the 20:1 odds of occurrence in any given year. 2% AEP can also be expressed as the 50 Year Return Period and as the 50:1 odds of occurrence in any given year. 1% AEP can also be expressed as the 100 Year Return Period and as the 100:1 odds of occurrence in any given year. 
0.5% AEP can also be expressed as the 200 Year Return Period and as the 200:1 odds of occurrence in any given year. 0.1% AEP can also be expressed as the 1000 Year Return Period and as the 1000:1 odds of occurrence in any given year. The Mid-Range Future Scenario (MRFS) maps represent a projected future scenario for the end of century (circa 2100) and include allowances for projected future changes in sea levels and glacial isostatic adjustment (GIA). The maps include an increase of 500mm in sea levels above the current scenario estimations. An allowance of -0.5mm/year for GIA was included for the southern part of the national coastline only (Dublin to Galway and south of this). Flooding from other sources may occur and areas that are not shown as being within a flood extent may therefore be at risk of flooding from other sources. The flood extent and depth maps are suitable for the assessment of flood risk at a strategic scale only, and should not be used to assess the flood hazard and risk associated with individual properties or point locations, or to replace a detailed flood risk assessment. Lineage: The National Coastal Flood Hazard Maps (NCFHM) 2021 are ‘predictive’ flood maps, as they provide predicted flood extent and depth information for a ‘design’ flood event that has an estimated probability of occurrence (e.g. the 0.5% AEP event), rather than information for floods that have occurred in the past. The maps have been produced at a strategic level to provide an overview of coastal flood hazard in Ireland, and minor or local features may not have been included in their preparation. A Digital Terrain Model (DTM) was used to generate the maps, which is a ‘bare-earth’ model of the ground surface with the digital removal of human-made and natural landscape features such as vegetation, buildings and bridges. 
This methodology can result in some of these human-made features, such as bridges and embankments, being shown within a flood extent, when in reality they do not flood. It should be noted that the flood extent maps indicate the predicted maximum extent of flooding, and flooding in some areas, such as near the edge of the floodplain area, might be very shallow. The predicted depth of flooding at a given location is indicated on the flood depth maps. The flood depth is displayed as a constant depth over grid squares with a 5m resolution, whereas in reality depths may vary within a given grid square. No post-processing of the flood extent and depth map datasets has been undertaken to remove small areas of flooding that are remote and isolated, small islands within the flooded area, etc. Local factors such as flood defence schemes, structures in or around river channels (e.g. bridges), buildings and other local influences, which might affect coastal flooding, have not been accounted for. Detailed explanations of the methods of derivation, data used, etc. are provided in the NCFHM 2021 Flood Mapping Methodology Report. Users of the maps should familiarise themselves fully with the contents of this report in advance of the use of the maps. Purpose: The data has been developed to inform a national assessment of flood risk that in turn will inform a review of the Preliminary Flood Risk Assessment required to comply with the requirements of the European Communities (Assessment and Management of Flood Risks) Regulations 2010 to 2015 (the “Regulations”) (implementing Directive 2007/60/EC) for the purposes of establishing a framework for the assessment and management of flood risks, aiming at the reduction of adverse consequences for human health, the environment, cultural heritage and economic activity associated with floods.

  6. Chehalis Basin 100-year Floodplain - Mid-range Climate Change Projection

    • data-wutc.opendata.arcgis.com
    Updated May 27, 2021
    Cite
    Washington State Department of Ecology (2021). Chehalis Basin 100-year Floodplain - Mid-range Climate Change Projection [Dataset]. https://data-wutc.opendata.arcgis.com/datasets/waecy::chehalis-basin-100-year-floodplain-mid-range-climate-change-projection
    Dataset updated
    May 27, 2021
    Dataset authored and provided by
    Washington State Department of Ecology
    Description

    These 100-year floodplain boundaries are intended to provide information about future inundation extents throughout the Chehalis River basin based on mid-range climate change projections. These data are not officially endorsed by Federal or Washington State agencies, and are intended for planning use only. This shapefile contains 100-year floodplain boundaries generated from a combination of hydraulic model results and topographic data geoprocessing. The boundaries were derived using the University of Washington Climate Impact Group's estimate of the moderate range of anticipated change to rainfall in 2080 (a 26% increase in precipitation), using down-scaled climate model data for the Chehalis Basin region. Details of the methods used to derive these floodplain extents are documented in a technical memorandum available from Watershed Science and Engineering (2021).

  7. Data from: Indoor and ambient air pollution dataset using a multi-instrument...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Indoor and ambient air pollution dataset using a multi-instrument approach and total event monitoring [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-14697454?locale=da
    Available download formats: unknown (21853)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises 19 subsets. Each subset measures a different parameter or is produced by a different sensor provider. The measurement period for this dataset was from October 11, 2024, to October 31, 2024, and the measurement interval depends on the type of parameter being measured, ranging from 1 second to 15 minutes. The dataset includes six indoor low-cost sensor providers with their respective measuring sensors. Three of these providers had only one sensor at the location, while one had 16 sensors, and the other two had 4 and 2 sensors, respectively. Human presence was monitored using a camera and a motion detection sensor. Window and door opening and closing were monitored using Xiaomi Door/Window sensors. In addition to the indoor low-cost sensors, the location was equipped with reference sensing units that were calibrated to the measuring station. Furthermore, outdoor low-cost sensors were also used. Specifically, one was a low-cost sensor, and the other was a mid-range sensor in terms of pricing. This dataset also includes black carbon data and CPC data. A camera was set up on the balcony to monitor the road in front of the house, so traffic data is also included in the dataset. Additionally, on-site measuring data from the Croatian Meteorological and Hydrological Service was made available in this dataset, sourced from the two nearest locations to the measuring site, as well as satellite data from the Climate Data Store. Every single parameter is detailed in the Data.xlsx file, which is integrated into the data.zip archive.

  8. National Indicative Fluvial Mapping (NIFM) River Flood Extents - Mid-Range...

    • data.gov.ie
    Updated Apr 29, 2021
    Cite
    data.gov.ie (2021). National Indicative Fluvial Mapping (NIFM) River Flood Extents - Mid-Range Future Scenario [Dataset]. https://data.gov.ie/dataset/nifm-river-flood-extents-mid-range-future-scenario
    Dataset updated
    Apr 29, 2021
    Dataset provided by
    data.gov.ie
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Abstract: This data shows the modelled extent of land that might be flooded by rivers (fluvial flooding) during a theoretical or ‘design’ flood event with an estimated probability of occurrence, rather than information for actual floods that have occurred in the past.

  9. Scaling factors used in Figs 3 (Iris) and 4 (BNA-DR3).

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Eduardo J. Aguilar; Valmir C. Barbosa (2023). Scaling factors used in Figs 3 (Iris) and 4 (BNA-DR3). [Dataset]. http://doi.org/10.1371/journal.pone.0286312.t003
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Eduardo J. Aguilar; Valmir C. Barbosa
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The αk’s are the ones leading to the highest values of ARIfnc in the intervals on the rightmost column of Table 2.

  10. Socio-economic data on grid level (SUF 7.1). Car segments Sozioökonomische...

    • b2find.eudat.eu
    Updated Jul 31, 2025
    Cite
    (2025). Socio-economic data on grid level (SUF 7.1). Car segments Sozioökonomische Daten auf Rasterebene (SUF 7.1). PKW-Segmente - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/a276ccc8-a78e-55ee-9563-256bc24dd927
    Dataset updated
    Jul 31, 2025
    Description

    Due to the increasing variety of car manufacturers' product lines, brand manufacturers supply cars in almost every segment. The brand of a car alone does not allow for conclusions about the owner's socio-economic status. For the car segments, cars have been aggregated to classes that do allow for this kind of conclusion. In addition to the capability of the car, car segments provide information on its intended use. The dataset comprises 12 car segments: mini cars, compact cars, lower mid-range cars, mid-range cars, upper mid-range cars, top-of-the-range cars, ATVs, cabriolets, estate cars, vans, utility vehicles, other vehicles (microm 2016, p. 96). The following classes are available.

    Mini cars: Mini cars used to be included in the segment of small cars, but now constitute a segment of their own due to an increased market share. These vehicles are characterized by an exceptionally small size. Sometimes they provide only two full seats (examples: Renault Twingo, Ford Ka, VW Up, Peugeot 107 and smart fortwo) (microm 2016, p. 97).

    Compact cars: The definition of a compact car is controversial and has changed over time. The general idea is that of a cheap way to be mobile on four wheels with a roof, often with compromises regarding space and comfort (examples: VW Polo, Opel Corsa, Ford Fiesta, Fiat Punto, Peugeot 207 and Renault Clio) (microm 2016, p. 97).

    RWI-GEO-GRID Other

    For data privacy reasons, houses within a residential environment are combined into a "virtual" micro-geographic segment (a so-called micro-cell), which on average comprises eight, but at least five, households. Houses in which at least five households live become a distinct micro-cell, while houses with fewer than five households are combined with similar houses on the same street so that the micro-cell comprises at least five households. Combined houses are as close as possible in spatial terms. Structural indicators are aggregated on the micro-cell level, and household-level averages are subsequently computed (microm 2016, p. 8). Wherever possible, the calculated data is made consistent with other data sources, such as official data available at a higher level of aggregation (microm 2014, p. 2). Additionally, due to the cooperation with SOEP, it is possible to validate the small-scale regional data of microm (microm 2016, p. 8).

    The dataset is based on the variable group microm-Basis, which comprises four categories: number of households, number of business enterprises, number of houses (including those purely used for business), and number of residential houses (excluding those purely used for business) (cf. microm 2016, p. 26). The number of houses on the street segment level is the basis for all aggregations to other regional levels. Based on business registers, the number of enterprises in each house is determined.

    Microm uses more than a billion individual data points for the aggregation of the microm dataset. These are anonymised and stem from various data sources. The data points are available for all 40.9 million households in Germany, while the final data product contains information on approximately 20 million houses (microm 2016, p. 8).
