9 datasets found

Datasets for manuscript "A data engineering framework for chemical flow...
catalog.data.gov
gimi9.com
Updated Nov 7, 2021
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
Dataset updated
Nov 7, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
The EPA GitHub repository PAU4Chem, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The file Chemicals_in_categories.csv contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data were obtained by means of data engineering using different publicly available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to gather, in bulk, information about exposure limits and physical properties from different publicly available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All GitHub repositories also describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).
World's Air Quality and Water Pollution Dataset
kaggle.com
zip
Updated Oct 30, 2023
Cite
VICTOR AHAJI (2023). World's Air Quality and Water Pollution Dataset [Dataset]. https://www.kaggle.com/datasets/victorahaji/worlds-air-quality-and-water-pollution-dataset/discussion
Available download formats: zip (59538 bytes)
Dataset updated
Oct 30, 2023
Authors
VICTOR AHAJI
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Area covered
World
Description
The dataset "World's Air Quality and Water Pollution" was obtained from Jack Jae Hwan Kim's Kaggle page. It comprises 5 columns: "City", "Region", "Country", "Air Quality", and "Water Pollution". The last two columns hold values from 0 to 100. Air Quality: 0 (bad quality) to 100 (top quality). Water Pollution: 0 (no pollution) to 100 (extreme pollution).
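As a rough illustration of the column layout described above, the sketch below parses CSV text with those five headers and flags any score outside the stated 0 to 100 range. The sample rows, city names, and function names are made up for this example; the actual file name and its exact header formatting are not given in this listing, so adjust accordingly.

```python
import csv
import io

# Hypothetical sample rows; real values and the real file name are not shown in this listing.
SAMPLE = """City,Region,Country,Air Quality,Water Pollution
Exampleville,Example Region,Exampleland,80.0,20.0
Sampletown,Sample Region,Exampleland,35.5,70.25
"""

SCORE_COLUMNS = ("Air Quality", "Water Pollution")


def load_rows(text):
    """Parse CSV text and cast the two score columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column in SCORE_COLUMNS:
            row[column] = float(row[column])
        rows.append(row)
    return rows


def out_of_range(rows):
    """Return rows with a score outside the documented 0-100 range."""
    return [
        row for row in rows
        if any(not 0 <= row[column] <= 100 for column in SCORE_COLUMNS)
    ]


rows = load_rows(SAMPLE)
print(len(rows), "rows;", len(out_of_range(rows)), "out of range")
```

Note that the two scales run in opposite directions: a high Air Quality score is good, while a high Water Pollution score is bad.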
Eaton Fire Resident's United: Pre-Remediation Indoor Contamination Test...
zenodo.org
bin, csv, pdf +2
Updated Aug 24, 2025
Cite
Mark Wronkiewicz; Nicole M. G. Maccalla; Efram Potelle; Michelle Botica; Serina Diniega; Laura Pearlman; Cathelin Huang; Jane Lawton Potelle (2025). Eaton Fire Resident's United: Pre-Remediation Indoor Contamination Test Results [Dataset]. http://doi.org/10.5281/zenodo.16762192
Available download formats: bin, text/x-python, pdf, csv, txt
Unique identifier
https://doi.org/10.5281/zenodo.16762192
Dataset updated
Aug 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mark Wronkiewicz; Nicole M. G. Maccalla; Efram Potelle; Michelle Botica; Serina Diniega; Laura Pearlman; Cathelin Huang; Jane Lawton Potelle
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Scope of this Dataset

The Eaton Fire in Southern California contaminated homes across Altadena, Pasadena, and Sierra Madre with hazardous ash and soot. Residents feared serious health risks, prompting Eaton Fire Residents United (EFRU), a grassroots coalition of community members, to begin collecting, compiling, and publicly sharing residential contamination testing data taken by professional industrial hygienists.

This dataset contains anonymized, professionally collected contamination test results from over 200 affected homes. Researchers and advocates can leverage this dataset to study contamination patterns and support evidence-based policy improvements related to wildfire recovery and public health.

More information about EFRU as well as a live version of this information is available at www.efru.la

Contents

Datasheet for Dataset

The included Datasheet for Dataset provides comprehensive details about the dataset's contents, methods of collection, data anonymization practices, and suggested use-cases.

Standard Operating Procedures

The SOP document contains detailed procedures for processing the original resident-provided test reports to anonymize and compile the data.

Code Examples

A basic Python notebook is included for loading, exploring, and visualizing the dataset. It should help researchers get started with the dataset.

Data

The dataset includes contaminant levels measured inside 201 homes in CSV format. Each row in the CSV provides:

Anonymized location (nearest cross-street)

Peak measured concentrations of contaminants and other elements, including wildfire debris (ash, soot, char), asbestos, lead, arsenic, antimony, barium, beryllium, cadmium, chromium, cobalt, copper, mercury, molybdenum, nickel, selenium, silver, thallium, vanadium, and zinc.

User-reported proximity to burned structures

Damage and remediation status at testing time

Updates

The dataset will be updated periodically as additional residential testing results become available. Further releases as well as minor corrections are expected as this is an ongoing effort by community volunteers.

Acknowledgements

We express our profound gratitude to community members who have voluntarily shared their reports and helped us compile this dataset, all in pursuit of rebuilding a healthier, safer community. We also acknowledge Jennifer Cotton, Jordan Boye, and Dawn Fanning for their contributions as well as the entire EFRU community.
GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over...
zenodo.org
nc, pdf, zip
Updated May 23, 2025
+ more versions
Cite
Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu (2025). GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over Land (2017–Present) [Dataset]. http://doi.org/10.5281/zenodo.10800980
Available download formats: nc, zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.10800980
Dataset updated
May 23, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Apr 11, 2022
Description
GlobalHighPM₂.₅ is part of a series of long-term, seamless, global, high-resolution, and high-quality datasets of air pollutants over land (i.e., GlobalHighAirPollutants, GHAP). It is generated from big data sources (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution.

This dataset contains the input data, analysis code, and generated dataset used for the following article. If you use the GlobalHighPM₂.₅ dataset in your scientific research, please cite the following reference (Wei et al., NC, 2023):

Wei, J., Li, Z., Lyapustin, A., Wang, J., Dubovik, O., Schwartz, J., Sun, L., Li, C., Liu, S., and Zhu, T. First close insight into global daily gapless 1 km PM₂.₅ pollution, variability, and health impact. Nature Communications, 2023, 14, 8349. https://doi.org/10.1038/s41467-023-43862-3

Input Data

Relevant raw data for each figure (compiled into a single sheet within an Excel document) in the manuscript.

Code

Relevant Python scripts for replicating and plotting the analysis results in the manuscript, as well as code for converting data formats.

Generated Dataset

Here is the first big data-derived seamless (spatial coverage = 100%) daily, monthly, and yearly 1 km (i.e., D1K, M1K, and Y1K) global ground-level PM₂.₅ dataset over land from 2017 to the present. This dataset exhibits high quality, with cross-validation coefficients of determination (CV-R²) of 0.91, 0.97, and 0.98, and root-mean-square errors (RMSEs) of 9.20, 4.15, and 2.77 µg m⁻³ on the daily, monthly, and annual bases, respectively.

Due to data volume limitations,

all (including daily) data for the year 2022 is accessible at: GlobalHighPM2.5 (2022)

all (including daily) data for the year 2021 is accessible at: GlobalHighPM2.5 (2021)

all (including daily) data for the year 2020 is accessible at: GlobalHighPM2.5 (2020)

all (including daily) data for the year 2019 is accessible at: GlobalHighPM2.5 (2019)

all (including daily) data for the year 2018 is accessible at: GlobalHighPM2.5 (2018)

all (including daily) data for the year 2017 is accessible at: GlobalHighPM2.5 (2017)

continuously updated...

More GHAP datasets for different air pollutants are available at: https://weijing-rs.github.io/product.html
Spectral inclusion and pollution for a class of dissipative perturbations -...
research-data.cardiff.ac.uk
zip
Updated Sep 18, 2024
Cite
Alexei Stepanenko (2024). Spectral inclusion and pollution for a class of dissipative perturbations - data [Dataset]. http://doi.org/10.17035/d.2021.0125587613
Available download formats: zip
Unique identifier
https://doi.org/10.17035/d.2021.0125587613
Dataset updated
Sep 18, 2024
Dataset provided by
Cardiff University
Authors
Alexei Stepanenko
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dissipative barriers are a class of perturbations for differential operators that can be used to help compute eigenvalues numerically. This dataset comprises numerical computations of the eigenvalues of three examples of differential operators perturbed by dissipative barriers. These were used in the paper "Spectral inclusion and pollution for a class of dissipative perturbations" (DOI: 10.1063/5.0028440, freely available at arXiv:2006.10097) as illustrations of theoretical results. Please see that paper, in particular Section 5, for more precise information on the contents of the dataset. The datasets are NumPy arrays (Python programming language) saved as pickle files and can be opened using the pickle package (see https://docs.python.org/3/library/pickle.html).
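Opening a pickle file as described above might look like the following minimal sketch. The file name and the values are stand-ins created on the fly so the example runs by itself; the archive's actual file names are not listed here. The real files hold NumPy arrays, so NumPy would need to be installed for them to unpickle, whereas this stand-in uses a plain list of complex numbers.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load one pickled object. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Write a stand-in file, then read it back the same way a real file would be read.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "eigenvalues_demo.pkl"  # hypothetical file name
    demo_values = [complex(1.0, 0.5), complex(2.0, 0.5)]  # made-up values
    with open(demo_path, "wb") as handle:
        pickle.dump(demo_values, handle)
    loaded = load_pickle(demo_path)

print(type(loaded).__name__, len(loaded))
```

Unpickling can execute arbitrary code, so it is worth confirming that the archive came from the DOI above before loading it.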
Indian-Air-pollution-Analysis-using-Python
kaggle.com
zip
Updated May 6, 2025
Cite
Soumendu Ray (2025). Indian-Air-pollution-Analysis-using-Python [Dataset]. https://www.kaggle.com/datasets/soumenduray99/indian-air-pollution-analysis-using-python
Available download formats: zip (8703457 bytes)
Dataset updated
May 6, 2025
Authors
Soumendu Ray
Area covered
India
Description
🌫️ Air Pollution Trends in Indian Cities - Data Analysis Project

🎯 Objective

The primary objective of this project is to analyze air pollution trends across Indian cities by studying various pollutant concentrations, their correlations, and seasonal as well as geographic variations. The analysis is intended to identify high-risk areas and emphasize the need for targeted interventions to improve public health and air quality.

📊 Analytical Insights

1️⃣ Date & Time-Based Analysis

PM2.5 & PM10 Trends 📅: PM2.5 levels remained relatively stable with a slight rise in 2024 (125.848). PM10 peaked in 2018 (217.195) and hit a low in 2015 (213.597).

AQI Concentration Across Months 🌤️: October recorded the highest average AQI (56.39), followed by July and April. December had the lowest AQI (54.60).

NO2 Concentration in the Last 5 Years 🔴: Highest in November 2024 (206.69). Significant peaks in May 2021 and March 2020.

SO2 Emissions Over the Last 5 Years 🌫️: Highest in December (202.69). Lowest in February (198.62).

Weekday vs. Weekend Pollution Levels 📆: East India saw higher AQI on weekends. North, South, and West India had higher AQI on weekdays.

2️⃣ State & Location-Based Analysis

Most Polluted State (AQI) 🚨: Uttar Pradesh had the highest average AQI (162.61).

SO2 Levels - North vs. South 🏭: South India recorded slightly higher SO2 emissions than the North.

Top 5 Most Polluted Locations (PM10) 🌆: Amritsar (Punjab) - 217.64; Jodhpur (Rajasthan) - 217.41; Surat (Gujarat) - 217.01; Vadodara (Gujarat) - 216.92; New Delhi (Delhi) - 216.61.

Cleanest Locations (PM2.5 Levels) 🌱: Vadodara (122.47), Udaipur (123.32), Dwarka (123.76).

3️⃣ Pollutant-Specific Insights

Most Drastic Increase in PM2.5 🚩: Punjab (Industrial Area) recorded 127.10 PM2.5.

Highest SO2 Levels ☣️: Gujarat (Sensitive Area): 812.39.

Biggest Contributor to Air Pollution 🏭: Maharashtra topped in SO2, CO, NO2, and PM10 emissions.

Correlation Between Pollutants 🔗: SO2 & NO2: slight negative correlation (-0.0027). PM10 & PM2.5: near-zero correlation (-0.0013).

4️⃣ Seasonal & Comparative Trends

Most Polluted Season 🍂: Winter had the highest PM10 levels. Rainy season saw the highest PM2.5 levels.

CO Pollution by Season 💨: Winter recorded the highest CO concentration (200.49), followed by Autumn (200.40).

Quarterly Pollution Trends 🏭: Q4 (Oct–Dec) saw the highest pollution levels for SO2, NO2, and PM10.

Impact of COVID on Pollution 🦠: SO2 levels dropped from 799.05 to 796.50 during lockdown periods. NO2 levels also showed a minor reduction.

🧰 Tech Stack

Data Processing: Python (Pandas, NumPy)

Data Visualization: Matplotlib, Seaborn, Power BI

Query Language: SQL (for pollutant dataset extraction)

Version Control: Git & GitHub
Data for: Bioindication of Radioactive Contamination by Honey Bees in the...
data.mendeley.com
Updated Aug 18, 2025
Cite
Philip Sorokin (2025). Data for: Bioindication of Radioactive Contamination by Honey Bees in the Bryansk and Rostov Regions: Foraging Dynamics of ¹³⁷Cs and ⁴⁰K in the Plant–Bee–Bee Product Pathway [Dataset]. http://doi.org/10.17632/nb2w7w9zc2.4
Unique identifier
https://doi.org/10.17632/nb2w7w9zc2.4
Dataset updated
Aug 18, 2025
Authors
Philip Sorokin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains supporting resources for:

Source code: Python script for calculating landscape-type areas used to determine ¹³⁷Cs activity weighting factors.

Operational map: Satellite map for spatial analysis of contamination heterogeneity.

Pollen analysis protocols: Standardized procedures for melissopalynological examination (focused on elevated ¹³⁷Cs in plant species within honey).

Generated for the study:

Bioindication of radioactive contamination by honey bees in the Bryansk and Rostov regions: Foraging dynamics of ¹³⁷Cs and ⁴⁰K in the plant–bee–bee product pathway (Sorokin, 2025).
Data from: LIGHT RAIN CHARACTERIZATION IN PIRACICABA, SÃO PAULO STATE,...
scielo.figshare.com
tiff
Updated Jun 10, 2023
Cite
Fabio T. Johanson; Asdrubal J. Farias-Ramirez; Marco A. Jacomazzi; Sergio N. Duarte; Maria A. Moreno-Pizani (2023). LIGHT RAIN CHARACTERIZATION IN PIRACICABA, SÃO PAULO STATE, BRAZIL [Dataset]. http://doi.org/10.6084/m9.figshare.22187844.v1
Available download formats: tiff
Unique identifier
https://doi.org/10.6084/m9.figshare.22187844.v1
Dataset updated
Jun 10, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Fabio T. Johanson; Asdrubal J. Farias-Ramirez; Marco A. Jacomazzi; Sergio N. Duarte; Maria A. Moreno-Pizani
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Brazil, Piracicaba, State of São Paulo
Description
ABSTRACT The disorderly soil occupation without the necessary conservationist practices leads to impacts on the local hydrology and induces the pollution of water resources. This pollution may come from more urbanized areas due to the amount of pollutants drained during the rains. Even moderate precipitations constitute one of the main factors that define pollutant runoff on the surface. These rains have recently been called light rains. Light rains have a lower precipitation height and a higher frequency compared to classic rains of drainage projects, being necessary to define them according to patterns of rain frequency for each region. This study aimed to characterize light rain in the municipality of Piracicaba to establish statistical standards for the frequency of certain precipitation heights. A database provided by the ESALQ/USP automatic weather station, which provides precipitation measurements every 15 minutes, was used in the present study. Light rain heights reached 40.3, 41.4, and 42.7 mm for 100, 90, or 80% frequencies, respectively, which implies the use of return periods of 1.00, 1.11, and 1.25 years, respectively.
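The return periods quoted at the end of the abstract are consistent with the usual reciprocal relation between frequency and return period, T = 1 / p, with p expressed as a fraction. The sketch below assumes that relation (the abstract does not state the formula explicitly) and uses a hypothetical function name.

```python
def return_period_years(frequency_percent):
    """Return period in years, assuming T = 1 / p with p as a fraction."""
    if not 0 < frequency_percent <= 100:
        raise ValueError("frequency must be in the interval (0, 100]")
    return 100.0 / frequency_percent


# The three frequencies quoted in the abstract.
for frequency in (100, 90, 80):
    print(frequency, round(return_period_years(frequency), 2))
```

Rounded to two decimals, this gives 1.0, 1.11, and 1.25 years for the 100, 90, and 80% frequencies, matching the values in the abstract.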
OpenAQ
kaggle.com
zip
Updated Dec 1, 2017
+ more versions
Cite
Open AQ (2017). OpenAQ [Dataset]. https://www.kaggle.com/datasets/open-aq/openaq
Available download formats: zip (0 bytes)
Dataset updated
Dec 1, 2017
Dataset authored and provided by
Open AQ
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.

Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).

Update Frequency: Weekly

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.openaq.[TABLENAME]. Fork this kernel to get started.
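A query against these tables with the BigQuery Python client library could be sketched roughly as follows. The table name is left as a parameter because the listing only gives the `[TABLENAME]` placeholder; `build_query` and `run_query` are hypothetical helper names, and running the query assumes the `google-cloud-bigquery` package is installed and credentials are configured.

```python
DATASET = "bigquery-public-data.openaq"


def build_query(table_name, limit=10):
    """Build a small preview query; the caller supplies the table name."""
    if not table_name.replace("_", "").isalnum():
        raise ValueError("unexpected characters in table name")
    return f"SELECT * FROM `{DATASET}.{table_name}` LIMIT {int(limit)}"


def run_query(sql):
    """Run the query and return rows as dictionaries (needs credentials)."""
    # Imported lazily so build_query works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]


# Replace TABLENAME with an actual table from the dataset before running.
print(build_query("TABLENAME", limit=5))
```

Keeping query construction separate from execution makes it easy to inspect the SQL before anything is sent to BigQuery.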

Acknowledgements

Dataset Source: openaq.org

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source and is provided "AS IS" without any warranty, express or implied.