5 datasets found
  1. U.S. Community Water Systems Service Boundaries

    • redivis.com
    application/jsonl +7
    Updated Jul 11, 2022
    Cite
    Environmental Impact Data Collaborative (2022). U.S. Community Water Systems Service Boundaries [Dataset]. https://redivis.com/datasets/zzz6-3nt04xxnc
    Available download formats: parquet, sas, stata, csv, avro, application/jsonl, arrow, spss
    Dataset provided by
    Redivis Inc.
    Authors
    Environmental Impact Data Collaborative
    Description

    Abstract

    This is a layer of water service boundaries for 44,786 community water systems that deliver tap water to 307.1 million people in the US. This amounts to 97% of the population reportedly served by active community water systems and 91% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b. In v1.0.0, Tier 2b reflects overlapping boundaries for multiple systems. In v2.0.0, Tier 2b is removed through a "best match" algorithm that assigns one water system to one TIGER place. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3).
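
The tier hierarchy described above amounts to a fallback chain. A minimal sketch in Python, assuming each system is summarized by three hypothetical inputs (whether a state-provided polygon exists, the name of a matched Census Place if any, and whether a centroid is available). The function and parameter names are illustrative and are not taken from the wsb repository:

```python
from typing import Optional


def assign_tier(
    has_explicit_boundary: bool,
    matched_place: Optional[str],
    has_centroid: bool,
) -> Optional[str]:
    """Return a tier label for one water system, or None if no boundary can be built.

    Hypothetical helper: the inputs stand in for whatever the real pipeline
    uses to decide which data source is available for a system.
    """
    if has_explicit_boundary:
        return "Tier 1"   # state-provided polygon takes priority
    if matched_place is not None:
        return "Tier 2a"  # one-to-one Census Place match
    if has_centroid:
        return "Tier 3"   # modeled radius around a centroid
    return None           # no geometry available


print(assign_tier(False, "Springfield", True))
```

The order of the checks encodes the fidelity ranking: a system only falls through to a lower tier when the higher-fidelity source is missing.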

  2. U.S. Community Water Systems Service Boundaries, v1.0.0

    • dataone.org
    • hydroshare.org
    • +1 more
    Updated Dec 30, 2023
    Cite
    SimpleLab; EPIC (2023). U.S. Community Water Systems Service Boundaries, v1.0.0 [Dataset]. https://dataone.org/datasets/sha256%3Ac09976eeb7291398a35b301f91b8fc0f5e7e72e1dda8195578747d5dbfef00dc
    Dataset provided by
    Hydroshare
    Authors
    SimpleLab; EPIC
    Description

    This is a layer of water service boundaries for 44,919 community water systems that deliver tap water to 306.88 million people in the US. This amounts to 97.22% of the population reportedly served by active community water systems and 90.85% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b. Tier 2b reflects overlapping boundaries for multiple systems. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a or Tier 2b), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3).
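
The Tier 2a / Tier 2b distinction above depends only on how many systems share a matched place. A small sketch, assuming the matching step has already produced a mapping from a system identifier to a place identifier (both identifier schemes, and the function name, are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List


def label_tier2(matches: Dict[str, str]) -> Dict[str, str]:
    """Label each matched system as Tier 2a (unique match) or Tier 2b (shared place).

    `matches` maps a system id to the id of the place it matched.
    """
    systems_by_place: Dict[str, List[str]] = defaultdict(list)
    for system_id, place_id in matches.items():
        systems_by_place[place_id].append(system_id)

    labels: Dict[str, str] = {}
    for systems in systems_by_place.values():
        label = "Tier 2a" if len(systems) == 1 else "Tier 2b"
        for system_id in systems:
            labels[system_id] = label
    return labels


# Illustrative ids only: two systems share place "P1", one system has "P2" alone.
example = {"SYS_A": "P1", "SYS_B": "P1", "SYS_C": "P2"}
print(label_tier2(example))
```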

    Several limitations to this data exist, and the layer should be used with these in mind. First, assigning one Census Place TIGER polygon to multiple systems inaccurately attributes the same exact area to each of them; we hope to resolve Tier 2b systems into Tier 2a or Tier 3 in a future iteration. Second, the matching algorithms that assign Census Place boundaries require additional validation and iteration. Third, Tier 3 boundaries have modeled radii stemming from a lat/long centroid of a water system facility, but the underlying centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude the data from the layer. Fourth, missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but rely on the accuracy of state data collection and maintenance.
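
One way to act on the geometry-quality caveat is to separate low-fidelity Tier 3 records before analysis. The sketch below is an assumption-heavy illustration: the field names `tier` and `geometry_quality`, and the label strings "county centroid" and "state centroid", are hypothetical stand-ins and should be checked against the actual column names and values in the downloaded layer.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed label values; the real column may spell these differently.
LOW_FIDELITY_LABELS = {"county centroid", "state centroid"}


def split_by_geometry_quality(
    records: Iterable[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split records into (kept, flagged); only Tier 3 records can be flagged."""
    kept: List[Dict[str, str]] = []
    flagged: List[Dict[str, str]] = []
    for record in records:
        quality = record.get("geometry_quality", "").strip().lower()
        if record.get("tier") == "Tier 3" and quality in LOW_FIDELITY_LABELS:
            flagged.append(record)
        else:
            kept.append(record)
    return kept, flagged


rows = [
    {"id": "SYS_A", "tier": "Tier 1", "geometry_quality": "County Centroid"},
    {"id": "SYS_B", "tier": "Tier 3", "geometry_quality": "County Centroid"},
    {"id": "SYS_C", "tier": "Tier 3", "geometry_quality": "Address Matching"},
]
kept, flagged = split_by_geometry_quality(rows)
print(len(kept), len(flagged))
```

Flagging rather than dropping mirrors the choice described above, where low-fidelity records stay in the layer and the decision is left to the user.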

    All data, methods, documentation, and contributions are open-source and available here: https://github.com/SimpleLab-Inc/wsb.

  3. U.S. Community Water Systems Service Boundaries, v3.0.0

    • search.dataone.org
    • hydroshare.org
    Updated Dec 30, 2023
    Cite
    SimpleLab; EPIC (2023). U.S. Community Water Systems Service Boundaries, v3.0.0 [Dataset]. https://search.dataone.org/view/sha256%3A255543c63c010760308ca94aa16ebe1f52294c7381d1a50f50c94fb984519805
    Dataset provided by
    Hydroshare
    Authors
    SimpleLab; EPIC
    Description

    This is a layer of water service boundaries for 45,973 community water systems that deliver tap water to 307.7 million people in the US. This amounts to 97% of the population reportedly served by active community water systems and 93% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When multiple water systems match to the same TIGER boundary, we employ a "best match" algorithm that assigns one water system to one TIGER place based on features like population served and other locational information about the water system. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3). Water system centroids are taken from the ECHO database; however, where a system centroid is labeled as a county or state centroid, we take several steps to assign a better centroid (using sources like UCMR or TIGER). A summary of the systems and population assigned to each tier follows:

    Population coverage rates per tier, for systems with population reported:
    - Tier 1: 49.3% of population covered (155,869,771 people)
    - Tier 2: 35.13% of population covered (111,074,087 people)
    - Tier 3: 12.9% of population covered (40,771,645 people)

    Active community water system coverage rates per tier:
    - Tier 1: 35.7% of systems covered (17,645 systems)
    - Tier 2: 22.42% of systems covered (11,079 systems)
    - Tier 3: 34.9% of systems covered (17,249 systems)
    - No Tier/Geometry: 6.98% of systems (3,451 systems)
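
The per-tier system percentages can be checked from the counts alone. The sketch below copies the four system counts from the list above and assumes the denominator is their sum, which is an inference about how the percentages were derived rather than something the description states:

```python
# System counts per tier, copied from the v3.0.0 summary above.
counts = {
    "Tier 1": 17645,
    "Tier 2": 11079,
    "Tier 3": 17249,
    "No Tier/Geometry": 3451,
}

# Assumed denominator: all active community water systems = sum of the four groups.
total = sum(counts.values())

# Percentage share of each group, rounded to two decimals.
shares = {tier: round(100 * n / total, 2) for tier, n in counts.items()}

# Systems that received a boundary in any tier.
with_boundary = total - counts["No Tier/Geometry"]

for tier, share in shares.items():
    print(f"{tier}: {share}%")
print(f"Systems with a boundary: {with_boundary} of {total}")
```

Under that assumption the three tiers together account for 45,973 systems, which agrees with the system count and the 93% figure quoted in the description.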

    Several limitations to this data exist, and the layer should be used with these in mind. The assignment of a Census Place TIGER polygon to the "best match" water system, first introduced in v2.0.0, requires further validation. Tier 3 boundaries have modeled radii stemming from a lat/long centroid of a water system facility, but the underlying centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude the data from the layer. Since v2.0.0, we have reduced the share of Tier 3 geometries that rely on state or county centroids from 50% to 30% of Tier 3 boundaries. Missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but rely on the accuracy of state data collection and maintenance.

    Changelog:

    3.0.0 (2022-10-31)

  4. U.S. Community Water Systems Service Boundaries, v2.0.0

    • beta.hydroshare.org
    • hydroshare.org
    • +1 more
    zip
    Updated Jul 5, 2022
    Cite
    HydroShare (2022). U.S. Community Water Systems Service Boundaries, v2.0.0 [Dataset]. https://beta.hydroshare.org/resource/20b908d73a784fc1a097a3b3f2b58bfb/
    Available download formats: zip (637.9 MB)
    Dataset provided by
    HydroShare
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This is a layer of water service boundaries for 44,786 community water systems that deliver tap water to 307.1 million people in the US. This amounts to 97% of the population reportedly served by active community water systems and 91% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b. In v1.0.0, Tier 2b reflects overlapping boundaries for multiple systems. In v2.0.0, Tier 2b is removed through a "best match" algorithm that assigns one water system to one TIGER place. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3).
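
The description does not spell out how the "best match" is scored, so the following is only a sketch of the general idea of reducing several candidate systems to one per place. It assumes, purely for illustration, that the winner is the system whose population served is closest to the place's population; the function names, identifiers, and scoring rule are all hypothetical and are not the authors' method.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, int]  # (system id, population served)


def best_match(candidates: List[Candidate], place_population: int) -> str:
    """Pick the candidate whose population served is closest to the place population."""
    system_id, _ = min(candidates, key=lambda c: abs(c[1] - place_population))
    return system_id


def resolve_places(
    candidates_by_place: Dict[str, List[Candidate]],
    place_populations: Dict[str, int],
) -> Dict[str, str]:
    """Return one winning system id per place."""
    return {
        place: best_match(candidates, place_populations[place])
        for place, candidates in candidates_by_place.items()
    }


# Illustrative numbers only.
candidates = {"P1": [("SYS_A", 1200), ("SYS_B", 9800)], "P2": [("SYS_C", 400)]}
populations = {"P1": 10000, "P2": 450}
print(resolve_places(candidates, populations))
```

Systems that do not win a place would then need another source of geometry, for example a modeled Tier 3 boundary.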

    Several limitations to this data exist, and the layer should be used with these in mind. The assignment of a Census Place TIGER polygon to the "best match" water system in v2.0.0 requires further validation. As a consequence of this assignment, many systems were then assigned to Tier 3. Tier 3 boundaries have modeled radii stemming from a lat/long centroid of a water system facility, but the underlying centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude the data from the layer. Future iterations plan to improve geometry quality for modeled systems. Missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but rely on the accuracy of state data collection and maintenance.

    All data, methods, documentation, and contributions are open-source and available here: https://github.com/SimpleLab-Inc/wsb.

  5. U.S. Community Water Systems Service Boundaries, v2.4.0

    • hydroshare.org
    zip
    Updated Sep 28, 2022
    Cite
    HydroShare (2022). U.S. Community Water Systems Service Boundaries, v2.4.0 [Dataset]. https://www.hydroshare.org/resource/b11b8982eebd4843833932f085f71d92
    Available download formats: zip (650.9 MB)
    Dataset provided by
    HydroShare
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This is a layer of water service boundaries for 46,014 community water systems that deliver tap water to 307.7 million people in the US. This amounts to 97% of the population reportedly served by active community water systems and 91% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When multiple water systems match to the same TIGER boundary, we employ a "best match" algorithm that assigns one water system to one TIGER place based on features like population served and other locational information about the water system. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3). Water system centroids are taken from the ECHO database; however, where a system centroid is labeled as a county or state centroid, we take several steps to assign a better centroid (using sources like UCMR or TIGER). A summary of the systems and population assigned to each tier follows:

    Population coverage rates per tier, for systems with population reported:
    - Tier 1: 45.6% of population covered (140,302,401 people)
    - Tier 2: 39.98% of population covered (123,028,626 people)
    - Tier 3: 14.42% of population covered (44,372,326 people)

    Active community water system coverage rates per tier:
    - Tier 1: 35.61% of systems covered (17,600 systems)
    - Tier 2: 22.49% of systems covered (11,117 systems)
    - Tier 3: 35% of systems covered (17,297 systems)
    - No Tier/Geometry: 6.9% of systems (3,410 systems)

    Several limitations to this data exist, and the layer should be used with these in mind. The assignment of a Census Place TIGER polygon to the "best match" water system, first introduced in v2.0.0, requires further validation. Tier 3 boundaries have modeled radii stemming from a lat/long centroid of a water system facility, but the underlying centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude the data from the layer. Since v2.0.0, we have reduced the share of Tier 3 geometries that rely on state or county centroids from 50% to 30% of Tier 3 boundaries. Missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but rely on the accuracy of state data collection and maintenance.
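
A Tier 3 boundary is a modeled radius drawn around a centroid, so its shape follows entirely from those two inputs. The sketch below builds an approximate ring of (longitude, latitude) vertices using only the standard library. The radius is treated as a given input (the statistical model that estimates it is not reproduced here), the local flat-earth approximation is a simplification chosen for illustration, and the function name and sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for a rough degree conversion


def circle_ring(
    lat: float, lon: float, radius_m: float, n_points: int = 32
) -> List[Tuple[float, float]]:
    """Approximate a circle around (lat, lon) as a closed ring of (lon, lat) vertices.

    Uses a local equirectangular approximation, which is reasonable for small
    radii away from the poles; it is not a geodesic buffer.
    """
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / math.cos(math.radians(lat))
    ring = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        ring.append((lon + dlon * math.cos(theta), lat + dlat * math.sin(theta)))
    ring.append(ring[0])  # close the ring
    return ring


ring = circle_ring(lat=40.0, lon=-105.0, radius_m=2000.0)
print(len(ring), ring[0])
```

Because the ring is centered on the supplied centroid, any error in that centroid (for example a county or state centroid) shifts the whole boundary, which is why the geometry-quality caveat above matters most for this tier.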


U.S. Community Water Systems Service Boundaries

8 scholarly articles cite this dataset.