9 datasets found
  1. Datasets for manuscript "A data engineering framework for chemical flow...

    • catalog.data.gov
    • gimi9.com
    Updated Nov 7, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
    Dataset updated
    Nov 7, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The EPA GitHub repository PAU4Chem, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The file Chemicals_in_categories.csv contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data was obtained by means of data engineering using different publicly available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to gather, in bulk, information about exposure limits and physical properties from different publicly available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). Also, all GitHub repositories describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).

  2. World's Air Quality and Water Pollution Dataset

    • kaggle.com
    zip
    Updated Oct 30, 2023
    Cite
    VICTOR AHAJI (2023). World's Air Quality and Water Pollution Dataset [Dataset]. https://www.kaggle.com/datasets/victorahaji/worlds-air-quality-and-water-pollution-dataset/discussion
    Available download formats: zip (59538 bytes)
    Dataset updated
    Oct 30, 2023
    Authors
    VICTOR AHAJI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    World
    Description

    The dataset "World's Air Quality and Water Pollution" was obtained from Jack Jae Hwan Kim's Kaggle page. It comprises 5 columns: "City", "Region", "Country", "Air Quality", and "Water Pollution". The last two columns hold values from 0 to 100. Air Quality ranges from 0 (bad quality) to 100 (top quality). Water Pollution ranges from 0 (no pollution) to 100 (extreme pollution).
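
    The two score columns run in opposite directions (a high Air Quality value is good, a high Water Pollution value is bad), which is easy to get wrong when ranking cities. A minimal sketch of reading such a file with the standard library follows. The sample rows are invented for illustration, and the exact header spelling in the real CSV is an assumption.

```python
import csv
import io

# Hypothetical sample rows; the real file's header spelling and values may differ.
SAMPLE = """City,Region,Country,Air Quality,Water Pollution
Exampleville,Example Region,Exampleland,80.5,20.0
Sampletown,Sample Region,Exampleland,35.0,72.5
"""


def load_scores(text):
    """Parse CSV text and keep rows whose two scores lie in [0, 100]."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        air = float(record["Air Quality"])
        water = float(record["Water Pollution"])
        if 0 <= air <= 100 and 0 <= water <= 100:
            rows.append({"city": record["City"], "air": air, "water": water})
    return rows


def cleanest_air(rows):
    """Return the row with the highest air-quality score (100 is best)."""
    return max(rows, key=lambda row: row["air"])


def least_polluted_water(rows):
    """Return the row with the lowest water-pollution score (0 is best)."""
    return min(rows, key=lambda row: row["water"])


rows = load_scores(SAMPLE)
print(cleanest_air(rows)["city"], least_polluted_water(rows)["city"])
```

    To use it on the downloaded file, pass the file's text to load_scores instead of SAMPLE.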

  3. Eaton Fire Resident's United: Pre-Remediation Indoor Contamination Test...

    • zenodo.org
    bin, csv, pdf +2
    Updated Aug 24, 2025
    Cite
    Mark Wronkiewicz; Nicole M. G. Maccalla; Efram Potelle; Michelle Botica; Serina Diniega; Laura Pearlman; Cathelin Huang; Jane Lawton Potelle (2025). Eaton Fire Resident's United: Pre-Remediation Indoor Contamination Test Results [Dataset]. http://doi.org/10.5281/zenodo.16762192
    Available download formats: bin, text/x-python, pdf, csv, txt
    Dataset updated
    Aug 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mark Wronkiewicz; Nicole M. G. Maccalla; Efram Potelle; Michelle Botica; Serina Diniega; Laura Pearlman; Cathelin Huang; Jane Lawton Potelle
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Scope of this Dataset

    The Eaton Fire in Southern California contaminated homes across Altadena, Pasadena, and Sierra Madre with hazardous ash and soot. Residents feared serious health risks, prompting Eaton Fire Residents United (EFRU), a grassroots coalition of community members, to begin collecting, compiling, and publicly sharing residential contamination testing data taken by professional industrial hygienists.

    This dataset contains anonymized, professionally collected contamination test results from over 200 affected homes. Researchers and advocates can leverage this dataset to study contamination patterns and support evidence-based policy improvements related to wildfire recovery and public health.

    More information about EFRU as well as a live version of this information is available at www.efru.la

    Contents

    Datasheet for Dataset

    The included Datasheet for Dataset provides comprehensive details about the dataset's contents, methods of collection, data anonymization practices, and suggested use-cases.

    Standard Operating Procedures

    The SOP document contains detailed procedures for processing the original resident-provided test reports to anonymize and compile the data.

    Code Examples

    A basic Python notebook is included for loading, exploring, and visualizing the dataset. It should help researchers get started with the dataset.

    Data

    The dataset includes contaminant levels measured inside 201 homes in CSV format. Each row in the CSV provides:

    • Anonymized location (nearest cross-street)
    • Peak measured concentrations of contaminants and other elements, including wildfire debris (ash, soot, char), asbestos, lead, arsenic, antimony, barium, beryllium, cadmium, chromium, cobalt, copper, mercury, molybdenum, nickel, selenium, silver, thallium, vanadium, and zinc.
    • User-reported proximity to burned structures
    • Damage and remediation status at testing time

    Updates

    The dataset will be updated periodically as additional residential testing results become available. Further releases as well as minor corrections are expected as this is an ongoing effort by community volunteers.

    Acknowledgements

    We express our profound gratitude to community members who have voluntarily shared their reports and helped us compile this dataset, all in pursuit of rebuilding a healthier, safer community. We also acknowledge Jennifer Cotton, Jordan Boye, and Dawn Fanning for their contributions as well as the entire EFRU community.

  4. GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over...

    • zenodo.org
    nc, pdf, zip
    Updated May 23, 2025
    + more versions
    Cite
    Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu (2025). GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over Land (2017–Present) [Dataset]. http://doi.org/10.5281/zenodo.10800980
    Available download formats: nc, zip, pdf
    Dataset updated
    May 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Apr 11, 2022
    Description

    GlobalHighPM2.5 is part of a series of long-term, seamless, global, high-resolution, and high-quality datasets of air pollutants over land (i.e., GlobalHighAirPollutants, GHAP). It is generated from big data sources (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution.

    This dataset contains the input data, analysis code, and generated dataset used for the following article. If you use the GlobalHighPM2.5 dataset in your scientific research, please cite the following reference (Wei et al., NC, 2023):

    Input Data

    Relevant raw data for each figure (compiled into a single sheet within an Excel document) in the manuscript.

    Code

    Relevant Python scripts for replicating and plotting the analysis results in the manuscript, as well as code for converting data formats.

    Generated Dataset

    Here is the first big data-derived seamless (spatial coverage = 100%) daily, monthly, and yearly 1 km (i.e., D1K, M1K, and Y1K) global ground-level PM2.5 dataset over land from 2017 to the present. This dataset exhibits high quality, with cross-validation coefficients of determination (CV-R²) of 0.91, 0.97, and 0.98, and root-mean-square errors (RMSEs) of 9.20, 4.15, and 2.77 µg m⁻³ on the daily, monthly, and annual bases, respectively.
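
    For readers unfamiliar with the two quality metrics quoted above, here is a minimal sketch of how a coefficient of determination and an RMSE are commonly computed from paired observed and predicted values. It assumes R² is defined as 1 − SS_res/SS_tot. The authors' exact definition and cross-validation scheme are not stated here, and the sample numbers are invented.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented held-out PM2.5 values in µg m⁻³, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(observed, predicted), 4))       # 2.1213
print(round(r_squared(observed, predicted), 4))  # 0.964
```

    In a cross-validation setting, the predicted values would come from models that did not see the corresponding observations during training.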

    Due to data volume limitations,

    all (including daily) data for the year 2022 is accessible at: GlobalHighPM2.5 (2022)

    all (including daily) data for the year 2021 is accessible at: GlobalHighPM2.5 (2021)

    all (including daily) data for the year 2020 is accessible at: GlobalHighPM2.5 (2020)

    all (including daily) data for the year 2019 is accessible at: GlobalHighPM2.5 (2019)

    all (including daily) data for the year 2018 is accessible at: GlobalHighPM2.5 (2018)

    all (including daily) data for the year 2017 is accessible at: GlobalHighPM2.5 (2017)

    continuously updated...

    More GHAP datasets for different air pollutants are available at: https://weijing-rs.github.io/product.html

  5. Spectral inclusion and pollution for a class of dissipative perturbations -...

    • research-data.cardiff.ac.uk
    zip
    Updated Sep 18, 2024
    Cite
    Alexei Stepanenko (2024). Spectral inclusion and pollution for a class of dissipative perturbations - data [Dataset]. http://doi.org/10.17035/d.2021.0125587613
    Available download formats: zip
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    Cardiff University
    Authors
    Alexei Stepanenko
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dissipative barriers are a class of perturbations for differential operators that can be used to help compute eigenvalues numerically. This dataset comprises numerical computations of the eigenvalues of three examples of differential operators perturbed by dissipative barriers. These were used in the paper "Spectral inclusion and pollution for a class of dissipative perturbations" (DOI: 10.1063/5.0028440, freely available at arXiv:2006.10097) as illustrations of theoretical results. Please see that paper, in particular Section 5, for more precise information on the contents of the dataset. The datasets are NumPy arrays (Python programming language) saved as pickle files and can be opened using the pickle package (see https://docs.python.org/3/library/pickle.html).
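
    Opening a pickle file takes only a few lines. The sketch below uses a stand-in file written on the spot, because the file names inside the archive are not listed here. The name eigenvalues_demo.pkl and its contents are hypothetical. Loading the real files additionally requires NumPy to be installed, since they hold NumPy arrays.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Unpickle one object from path. Only use this on files you trust,
    because unpickling can execute arbitrary code."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip demo: write a stand-in list of complex "eigenvalues" (made-up
# values), then read it back the same way a dataset file would be read.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "eigenvalues_demo.pkl"  # hypothetical file name
    with open(demo_path, "wb") as handle:
        pickle.dump([1 + 0.5j, 2 - 0.25j], handle)
    eigenvalues = load_pickle(demo_path)

print(eigenvalues)
```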

  6. Indian-Air-pollution-Analysis-using-Python

    • kaggle.com
    zip
    Updated May 6, 2025
    Cite
    Soumendu Ray (2025). Indian-Air-pollution-Analysis-using-Python [Dataset]. https://www.kaggle.com/datasets/soumenduray99/indian-air-pollution-analysis-using-python
    Available download formats: zip (8703457 bytes)
    Dataset updated
    May 6, 2025
    Authors
    Soumendu Ray
    Area covered
    India
    Description

    🌫️ Air Pollution Trends in Indian Cities - Data Analysis Project

    🎯 Objective

    The primary objective of this project is to analyze air pollution trends across Indian cities by studying various pollutant concentrations, their correlations, and seasonal as well as geographic variations. The analysis is intended to identify high-risk areas and emphasize the need for targeted interventions to improve public health and air quality.

    📊 Analytical Insights

    1️⃣ Date & Time-Based Analysis

    • PM2.5 & PM10 Trends 📅: PM2.5 levels remained relatively stable with a slight rise in 2024 (125.848). PM10 peaked in 2018 (217.195) and hit a low in 2015 (213.597).
    • AQI Concentration Across Months 🌤️: October recorded the highest average AQI (56.39), followed by July and April. December had the lowest AQI (54.60).
    • NO2 Concentration in the Last 5 Years 🔴: Highest in November 2024 (206.69). Significant peaks in May 2021 and March 2020.
    • SO2 Emissions Over the Last 5 Years 🌫️: Highest in December (202.69). Lowest in February (198.62).
    • Weekday vs. Weekend Pollution Levels 📆: East India saw higher AQI on weekends. North, South, and West India had higher AQI on weekdays.

    2️⃣ State & Location-Based Analysis

    • Most Polluted State (AQI) 🚨: Uttar Pradesh had the highest average AQI (162.61).
    • SO2 Levels - North vs. South 🏭: South India recorded slightly higher SO2 emissions than the North.
    • Top 5 Most Polluted Locations (PM10) 🌆: Amritsar (Punjab) - 217.64; Jodhpur (Rajasthan) - 217.41; Surat (Gujarat) - 217.01; Vadodara (Gujarat) - 216.92; New Delhi (Delhi) - 216.61.
    • Cleanest Locations (PM2.5 Levels) 🌱: Vadodara (122.47), Udaipur (123.32), Dwarka (123.76).

    3️⃣ Pollutant-Specific Insights

    • Most Drastic Increase in PM2.5 🚩: Punjab (Industrial Area) recorded 127.10 PM2.5.
    • Highest SO2 Levels ☣️: Gujarat (Sensitive Area): 812.39.
    • Biggest Contributor to Air Pollution 🏭: Maharashtra topped in SO2, CO, NO2, and PM10 emissions.
    • Correlation Between Pollutants 🔗: SO2 & NO2: slight negative correlation (-0.0027); PM10 & PM2.5: near-zero correlation (-0.0013).

    4️⃣ Seasonal & Comparative Trends

    • Most Polluted Season 🍂: Winter had the highest PM10 levels. Rainy season saw the highest PM2.5 levels.
    • CO Pollution by Season 💨: Winter recorded the highest CO concentration (200.49), followed by Autumn (200.40).
    • Quarterly Pollution Trends 🏭: Q4 (Oct–Dec) saw the highest pollution levels for SO2, NO2, and PM10.
    • Impact of COVID on Pollution 🦠: SO2 levels dropped from 799.05 to 796.50 during lockdown periods. NO2 levels also showed a minor reduction.

    🧰 Tech Stack

    • Data Processing: Python (Pandas, NumPy)
    • Data Visualization: Matplotlib, Seaborn, Power BI
    • Query Language: SQL (for pollutant dataset extraction)
    • Version Control: Git & GitHub

  7. Data for: Bioindication of Radioactive Contamination by Honey Bees in the...

    • data.mendeley.com
    Updated Aug 18, 2025
    Cite
    Philip Sorokin (2025). Data for: Bioindication of Radioactive Contamination by Honey Bees in the Bryansk and Rostov Regions: Foraging Dynamics of ¹³⁷Cs and ⁴⁰K in the Plant–Bee–Bee Product Pathway [Dataset]. http://doi.org/10.17632/nb2w7w9zc2.4
    Dataset updated
    Aug 18, 2025
    Authors
    Philip Sorokin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains the following supporting resources:

    1. Source code: Python script for calculating landscape-type areas used to determine ¹³⁷Cs activity weighting factors.
    2. Operational map: Satellite map for spatial analysis of contamination heterogeneity.
    3. Pollen analysis protocols: Standardized procedures for melissopalynological examination (focused on elevated ¹³⁷Cs in plant species within honey).

    Generated for the study:

    Bioindication of radioactive contamination by honey bees in the Bryansk and Rostov regions: Foraging dynamics of ¹³⁷Cs and ⁴⁰K in the plant–bee–bee product pathway (Sorokin, 2025).

  8. Data from: LIGHT RAIN CHARACTERIZATION IN PIRACICABA, SÃO PAULO STATE,...

    • scielo.figshare.com
    tiff
    Updated Jun 10, 2023
    Cite
    Fabio T. Johanson; Asdrubal J. Farias-Ramirez; Marco A. Jacomazzi; Sergio N. Duarte; Maria A. Moreno-Pizani (2023). LIGHT RAIN CHARACTERIZATION IN PIRACICABA, SÃO PAULO STATE, BRAZIL [Dataset]. http://doi.org/10.6084/m9.figshare.22187844.v1
    Available download formats: tiff
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Fabio T. Johanson; Asdrubal J. Farias-Ramirez; Marco A. Jacomazzi; Sergio N. Duarte; Maria A. Moreno-Pizani
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Brazil, Piracicaba, State of São Paulo
    Description

    ABSTRACT: Disorderly soil occupation without the necessary conservation practices impacts local hydrology and induces the pollution of water resources. This pollution may come from more urbanized areas due to the amount of pollutants drained during rains. Even moderate precipitation is one of the main factors that define pollutant runoff on the surface. These rains have recently been called light rains. Light rains have a lower precipitation height and a higher frequency than the classic rains of drainage projects, so they must be defined according to the rain-frequency patterns of each region. This study aimed to characterize light rain in the municipality of Piracicaba to establish statistical standards for the frequency of certain precipitation heights. A database provided by the ESALQ/USP automatic weather station, which provides precipitation measurements every 15 minutes, was used in the present study. Light rain heights reached 40.3, 41.4, and 42.7 mm for 100, 90, and 80% frequencies, respectively, which implies the use of return periods of 1.00, 1.11, and 1.25 years, respectively.
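
    The return periods quoted in the abstract match the usual reciprocal relation between annual frequency and return period, T = 100 / f with f in percent. The short check below assumes that relation. The abstract does not spell out the formula, and the function name is illustrative.

```python
def return_period_years(frequency_percent):
    """Return period in years for an annual frequency given in percent,
    assuming the reciprocal relation T = 100 / f."""
    if not 0 < frequency_percent <= 100:
        raise ValueError("frequency must be in the interval (0, 100]")
    return 100.0 / frequency_percent


# Frequencies quoted in the abstract, paired with the reported rain heights.
for frequency, height_mm in ((100, 40.3), (90, 41.4), (80, 42.7)):
    period = round(return_period_years(frequency), 2)
    print(f"{frequency}% -> {period} years ({height_mm} mm)")
```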

  9. OpenAQ

    • kaggle.com
    zip
    Updated Dec 1, 2017
    + more versions
    Cite
    Open AQ (2017). OpenAQ [Dataset]. https://www.kaggle.com/datasets/open-aq/openaq
    Available download formats: zip (0 bytes)
    Dataset updated
    Dec 1, 2017
    Dataset authored and provided by
    Open AQ
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.

    Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).

    Update Frequency: Weekly

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.openaq.[TABLENAME]. Fork this kernel to get started.
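
    As a rough sketch of what such a query might look like with the google-cloud-bigquery client: the helper names build_query and run_query are illustrative, the table name is left as the same placeholder used above, and running the query assumes the package is installed and credentials are configured.

```python
def build_query(table_name, limit=10):
    """Build a SQL string that selects a few rows from an OpenAQ public table."""
    # Allow only letters, digits, and underscores to avoid malformed SQL.
    if not table_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table_name!r}")
    return (
        f"SELECT * FROM `bigquery-public-data.openaq.{table_name}` "
        f"LIMIT {int(limit)}"
    )


def run_query(sql):
    """Run sql and return the rows as dictionaries.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    The import is deferred so build_query stays usable without the package.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]


# "TABLENAME" is a placeholder; substitute a real table from the dataset.
sql = build_query("TABLENAME")
print(sql)
```

    Calling run_query(sql) with a real table name would return the first ten rows. Nothing is sent to BigQuery until that call is made.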

    Acknowledgements

    Dataset Source: openaq.org

    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source and is provided "AS IS" without any warranty, express or implied.

