Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This indicator shows how many days per year were assessed to have air quality that was worse than “moderate” in Champaign County, according to the U.S. Environmental Protection Agency’s (U.S. EPA) Air Quality Index Reports. The period of analysis is 1980-2024, and the U.S. EPA’s air quality ratings analyzed here are as follows, from best to worst: “good,” “moderate,” “unhealthy for sensitive groups,” “unhealthy,” “very unhealthy,” and “hazardous.”[1]
In 2024, the number of days rated to have air quality worse than moderate was 0. This is a significant decrease from the 13 days in 2023 in the same category, the highest total in the 21st century. The 2023 figure is likely due to the air pollution created by the unprecedented Canadian wildfire smoke in summer 2023.
While there has been no consistent year-to-year trend in the number of days per year rated to have air quality worse than moderate, the number of days in peak years decreased from 2000 through 2022. Where peak years before 2000 had between one and two dozen days with air quality worse than moderate (e.g., 1983, 18 days; 1988, 23 days; 1994, 17 days; 1999, 24 days), the year with the greatest number of days with air quality worse than moderate from 2000-2022 was 2002, with 10 days. There were several years between 2006 and 2022 that had no days with air quality worse than moderate.
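As a rough illustration of how the indicator could be computed, the sketch below sums the days in every category worse than “moderate” for each year. It assumes the annual report can be read as a count of days per category; the dictionary layout, the function name, and the sample numbers are hypothetical and are not taken from the EPA reports.

```python
# AQI categories from best to worst, as listed in the indicator description.
CATEGORIES = [
    "good",
    "moderate",
    "unhealthy for sensitive groups",
    "unhealthy",
    "very unhealthy",
    "hazardous",
]

# Every category that ranks after "moderate" counts toward the indicator.
WORSE_THAN_MODERATE = CATEGORIES[CATEGORIES.index("moderate") + 1:]


def days_worse_than_moderate(category_counts):
    """Sum the days in all categories worse than "moderate".

    category_counts maps a category name to a number of days
    (an assumed layout); missing categories count as zero days.
    """
    return sum(category_counts.get(name, 0) for name in WORSE_THAN_MODERATE)


# Made-up counts for two hypothetical years, for illustration only.
sample_years = {
    "year A": {
        "good": 300,
        "moderate": 60,
        "unhealthy for sensitive groups": 4,
        "unhealthy": 1,
    },
    "year B": {"good": 340, "moderate": 25},
}

indicator = {
    year: days_worse_than_moderate(counts)
    for year, counts in sample_years.items()
}
print(indicator)
```

A year with no days in the four worst categories yields 0, which is how a year such as 2024 would appear in the indicator.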
This data is sourced from the U.S. EPA’s Air Quality Index Reports. The reports are released annually, and our period of analysis is 1980-2024. The Air Quality Index Report website cautions that "[a]ir pollution levels measured at a particular monitoring site are not necessarily representative of the air quality for an entire county or urban area," and recommends that data users not compare air quality between different locations.[2]
[1] Environmental Protection Agency. (1980-2024). Air Quality Index Reports. (Accessed 13 June 2025).
[2] Ibid.
Source: Environmental Protection Agency. (1980-2024). Air Quality Index Reports. https://www.epa.gov/outdoor-air-quality-data/air-quality-index-report. (Accessed 13 June 2025).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Toxicological Effect and Risk Assessment (TERA) Knowledge Graph is based on chemical effect data from U.S. EPA ECOTOX. This data is aligned to non-proprietary identifiers using ontology alignment tools and external sources (e.g., Wikidata). This enables the use of external chemical knowledge graphs (e.g., ChEBI, PubChem). The data set also includes an aggregation (into a knowledge graph) of the NCBI taxonomy and Encyclopedia of Life traits data.
Linking ECOTOX to external sources enables the extrapolation of effect data, which can extend the reach of ecological risk assessment and limit laboratory experiments.
The construction code and the APIs that facilitate access can be found at: https://gitlab.com/Erik-BM/rappt
A promising application of the knowledge graph is chemical effect prediction. This work can be found here: https://github.com/Erik-BM/NIVAUC
This tool gives you access to greenhouse gas data reported to EPA by large facilities and suppliers in the United States through EPA's Greenhouse Gas Reporting Program. The tool allows you to view data in several formats, including maps, tables, charts, and graphs, for individual facilities or groups of facilities. You can search the data set for individual facilities by name or location, or filter the data set by state or county, industry sectors and sub-sectors, annual facility emission thresholds, and greenhouse gas type. For more information on the GHG Reporting Program and this data, please visit https://www.epa.gov/ghgreporting
This is a 1:2,000,000 coverage of streams for the conterminous United States. This coverage was intended for use as a background display for the National Water Summary program.
The stream layer was extracted from the 1:2,000,000 Digital Line Graph files. Originally, each state was stored as a separate coverage. In this version, the individual state coverages all have been appended.
[Summary provided by EPA]
The State Authorization Tracking System (StATS) is an information management system designed to document the progress of each state and territory in establishing and maintaining RCRA-authorized hazardous waste management programs. StATS tracks the status of each state with regard to changes made to the federal hazardous waste regulations. The pages listed at the website show state authorization and adoption information for RCRA Subtitle C hazardous waste rules. Adoption information is based on data received from EPA regional offices. Currently, state authorization and adoption percentages are based on the required rules promulgated through RCRA Cluster XXII. Published Federal Register notices are the only legal mechanism by which EPA grants authorization to the states. If any of the information contained in the StATS database conflicts with information stated in the Federal Register, the Federal Register information will take precedence. We strongly recommend that the regulated community contact their state government office for hazardous waste regulatory information.
ADAM-Data-Repository This repository contains all the data needed to run the case studies for the ADAM manuscript. Biogas production The directory "biogas" contains all data for the biogas production case studies (Figs 13 and 14). Specifically, "biogas/biogas_x" contains the data files for the scenario where "x" is the corresponding Renewable Energy Certificates (RECs) value. Plastic waste recycling The directory "plastic_waste" contains all data for the plastic waste recycling case studies (Figs 15 and 16). Different scenarios share the same supply, technology site, and technology candidate data, as specified by the "csv" files under "plastic_waste". Each scenario has a different demand data file, which is contained in "plastic_waste/Elec_price" and "plastic_waste/PET_price". How to run the case studies To run the case studies, one can create a new model in ADAM and upload the appropriate CSV file at each step (e.g., upload biogas/biogas_0/supplydata197.csv in step 2, where supply data are specified). This dataset is associated with the following publication: Hu, Y., W. Zhang, P. Tominac, M. Shen, D. Göreke, E. Martín-Hernández, M. Martín, G.J. Ruiz-Mercado, and V.M. Zavala. ADAM: A web platform for graph-based modeling and optimization of supply chains. COMPUTERS AND CHEMICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 165: 107911, (2022).
Students are introduced to measuring and identifying sources of air pollution, as well as how environmental engineers try to control and limit the amount of air pollution. In Part 1, students are introduced to nitrogen dioxide as an air pollutant and how it is quantified. Major sources are identified, using EPA bar graphs. Students identify major cities and determine their latitudes and longitudes. They estimate NO2 values from color maps showing monthly NO2 averages from two sources: a NASA satellite and the WSU forecast model AIRPACT. In Part 2, students continue to estimate NO2 values from color maps and use Excel to calculate differences and ratios to determine the model's performance. They gain experience working with very large numbers written in scientific notation, as well as spreadsheet application capabilities.
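The difference and ratio calculations that students carry out in Excel in Part 2 can also be sketched in a few lines of Python. The values below are made up for illustration, the variable and function names are hypothetical, and the sketch assumes both maps report NO2 in the same units.

```python
def compare_model_to_satellite(model_value, satellite_value):
    """Return (difference, ratio) of a model estimate against a satellite estimate."""
    difference = model_value - satellite_value
    ratio = model_value / satellite_value
    return difference, ratio


# Made-up NO2 estimates read from two color maps, written in scientific notation.
satellite_no2 = 4.0e15
model_no2 = 5.0e15

difference, ratio = compare_model_to_satellite(model_no2, satellite_no2)

# A ratio near 1 and a difference near 0 suggest the model tracks the satellite well.
print(f"difference = {difference:.2e}, ratio = {ratio:.2f}")
```

With these sample values the ratio is 1.25, meaning the model estimate is 25% higher than the satellite estimate.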
Hydrology Graphs This repository contains the code for the manuscript "A Graph Formulation for Tracing Hydrological Pollutant Transport in Surface Waters." There are three main folders containing code and data, and these are outlined below. We call the framework for building a graph of these hydrological systems "Hydrology Graphs". Several of the data files for building this framework are large and cannot be stored on GitHub. To conserve space, the notebook get_and_unpack_data.ipynb or the script get_and_unpack_data.py can be used to download the data from the Watershed Boundary Dataset (WBD), the National Hydrography Dataset (NHDPlusV2), and the agricultural land dataset for the state of Wisconsin. The files WILakes.df and WIRivers.df mentioned in section 1 below are contained within the WI_lakes_rivers.zip folder, and the files for the 24k Hydro Waterbodies dataset are contained in a zip file under the directory DNR_data/Hydro_Waterbodies. These files can also be unpacked by running the corresponding cells in the notebook get_and_unpack_data.ipynb or get_and_unpack_data.py. 1. graph_construction This folder contains the data and code for building a graph of the watershed-river-waterbody hydrological system. It uses data from the Watershed Boundary Dataset (link here) and the National Hydrography Dataset (link here) as a basis and builds a list of directed edges. We use NetworkX to build and visualize the list as a graph. case_studies This folder contains three .ipynb files for three separate case studies. These three case studies focus on how "Hydrology Graphs" can be used to analyze pollutant impacts in surface waters. Details of these case studies can be found in the manuscript above. DNR_data This folder contains data from the Wisconsin Department of Natural Resources (DNR) on water quality in several Wisconsin lakes. The data was obtained from here using the file Web_scraping_script.py. The original downloaded reports are found in the folder original_lake_reports. 
These reports were then cleaned and reformatted using the script DNR_data_filter.ipynb. The resulting cleaned reports are found in the Lakes folder. Each subfolder of the Lakes folder contains data for a single lake. The .csv file lake_index_WBIC.csv contains an index showing which lake each numbered subfolder corresponds to. In addition, we added the corresponding COMID in lake_index_WBIC_COMID.csv by matching the NHDPlusV2 data to the Wisconsin DNR's 24k Hydro Waterbodies dataset, which we downloaded from here. The DNR's reported data only matches lakes to a waterbody identification code (WBIC), so we use HYDROLakes (indexed by WBIC) to match to the COMID. This is done in the DNR_data_filter.ipynb script as well. Python Versions The .py files in graph_construction/ were run using Python version 3.9.7. The scripts used the following packages and version numbers: geopandas (0.10.2), shapely (1.8.1.post1), tqdm (4.63.0), networkx (2.7.1), pandas (1.4.1), numpy (1.21.2). This dataset is associated with the following publication: Cole, D.L., G.J. Ruiz-Mercado, and V.M. Zavala. A graph-based modeling framework for tracing hydrological pollutant transport in surface waters. COMPUTERS AND CHEMICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 179: 108457, (2023).
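The idea of tracing pollutant transport over a list of directed edges can be illustrated with a minimal sketch. The repository itself uses NetworkX; the version below deliberately uses plain Python dictionaries so that it runs without extra packages, and the node names, edge list, and function names are hypothetical rather than taken from the repository.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an adjacency list from (upstream, downstream) directed edges."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)
    return graph


def downstream_of(graph, source):
    """Return every node reachable from source by following edge directions."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Made-up watershed-river-waterbody connections, for illustration only.
edges = [
    ("watershed_1", "river_a"),
    ("watershed_2", "river_b"),
    ("river_a", "lake_x"),
    ("river_b", "lake_x"),
    ("lake_x", "river_c"),
]

graph = build_graph(edges)
reached = downstream_of(graph, "watershed_1")
print(sorted(reached))
```

A pollutant released in the hypothetical watershed_1 can reach river_a, lake_x, and river_c, but not river_b, because edges are followed only in the downstream direction.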
https://edg.epa.gov/EPA_Data_License.htm
The Quick Facts and Trends module is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Quick Facts and Trends module provides charts and graphs depicting national trends in emissions and heat input. The user can view, for example, data pertaining to the top annual and ozone season emitters of a selected pollutant, the number of units and facilities in a particular state, and trends in sulfur dioxide, nitrogen oxide and carbon dioxide emissions.
EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).
Data corresponding to the graphs and some of the tables in the paper. This dataset is associated with the following publication: Magnuson, M., T. Stilman, S. Serre, J. Archer, R. James, X. Xiaoyan, M. Lawrence, E. Tamargo, H. Raveh-Amit, and A. Sharon. Part 2: Stabilization/Containment of Radiological Particle Contamination to Enhance First Responder, Early Phase Worker, and Public Safety. Applied Sciences. MDPI, Basel, SWITZERLAND, 12(8): 3861, (2022).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is used in the Exploratory Graphs lesson of Course 4 of the Data Science Specialization from Johns Hopkins Bloomberg School of Public Health at Coursera. It contains annual average PM2.5 levels, averaged over the period 2008 through 2010, in the USA. Data provided by the U.S. Environmental Protection Agency (EPA).
These are the PFAS concentrations in the various samples analyzed. These concentrations were used to create the various graphs and tables in the associated manuscript. This dataset is associated with the following publication: Chen, Y., H. Zhang, Y. Liu, J. Bowden, T. Tolaymat, T. Townsend, and H. Solo-Gabriele. Evaluation of Per- and Polyfluoroalkyl Substances (PFAS) in Leachate, Gas Condensate, Stormwater and Groundwater at Landfills. CHEMOSPHERE. Elsevier Science Ltd, New York, NY, USA, 318: 137903, (2023).
The dataset includes all data used in the creation of figures and graphs in the paper: "Scenarios for low carbon and low water electric power plant operations: implications for upstream water use." Data includes regional electricity mixes, full life cycle water use, and water use for each life cycle stage. These encompass a range of scenarios out to 2050, and should not be used as predictions, forecasts or official baselines. The scenarios and results are for research purposes only, and do not represent current or future U.S. EPA policies or regulations. This dataset is associated with the following publication: Dodder, R., J. Barnwell, and W. Yelverton. Scenarios for low carbon and low water electric power plant operations: implications for upstream water use. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 50(21): 11460-11470, (2016).
The dataset contains the raw data for the graphs in the paper. This dataset is associated with the following publication: Smith, M., S. Stuntz, Y. Xing, M. Magnuson, R. Phillips, and W.F. Harper. Functional resilience of activated sludge exposed to Bacillus globigii and bacteriophage MS2. Water and Environment Journal. John Wiley & Sons, Inc., Hoboken, NJ, USA, 35(3): 930-936, (2021).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Quantitative structure–activity relationships (QSAR) are introduced to predict acute oral toxicity (AOT), by using the QuBiLS-MAS (acronym for quadratic, bilinear and N-Linear maps based on graph-theoretic electronic-density matrices and atomic weightings) framework for the molecular encoding. Three training sets were employed to build the models: EPA training set (5931 compounds), EPA-full training set (7413 compounds), and Zhu training set (10 152 compounds). Additionally, the EPA test set (1482 compounds) was used for the validation of the QSAR models built on the EPA training set, while the ProTox (425 compounds) and T3DB (284 compounds) external sets were employed for the assessment of all the models. The k-nearest neighbor, multilayer perceptron, random forest, and support vector machine procedures were employed to build several base (individual) models. The base models with R_EPA–training ≥ 0.75 (R = correlation coefficient) and MAE_EPA–training ≤ 0.5 (MAE = mean absolute error) were retained to build consensus models. As a result, two consensus models based on the minimum operator and denoted as M19 and M22, as well as a consensus model based on the weighted average operator and denoted as M24, were selected as the best ones for each training set considered. According to the applicability domain (AD) analysis performed, model M19 (built on the EPA training set) has MAE_test–AD = 0.4044, MAE_ProTox–AD = 0.4067 and MAE_T3DB–AD = 0.2586 on the EPA test set, ProTox external set, and T3DB external set, respectively; whereas model M22 (built on the EPA-full set) and model M24 (built on the Zhu set) present MAE_ProTox–AD = 0.3992 and MAE_T3DB–AD = 0.2286, and MAE_ProTox–AD = 0.3773 and MAE_T3DB–AD = 0.2471 on the two external sets accounted for, respectively. These outcomes were compared and statistically validated with respect to 14 QSAR methods (e.g., admetSAR, ProTox-II) from the literature. As a result, model M22 presents the best overall performance. 
In addition, a retrospective study of 261 drugs withdrawn due to their toxic/side effects was performed, to assess the usefulness of prospectively using the proposed QSAR models in the labeling of chemicals. A comparison with the methods from the literature was also made. As a result, model M22 has the best ability to label a compound as toxic according to the globally harmonized system of classification and labeling of chemicals. Therefore, it can be concluded that the proposed models, especially model M22, constitute prominent tools for studying AOT, as they provide the best results among all the methods examined. Freely available software was also developed to be used in virtual screening tasks (http://tomocomd.com/apps/ptoxra).
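To make the consensus idea concrete, here is a minimal sketch of combining base-model predictions and scoring them with the mean absolute error (MAE). The abstract does not spell out how the minimum and weighted average operators are applied, so this sketch assumes they act compound by compound across the base-model predictions; the function names and all numbers are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def consensus_minimum(predictions):
    """Assumed minimum operator: smallest base-model prediction per compound."""
    return [min(values) for values in zip(*predictions)]


def consensus_weighted_average(predictions, weights):
    """Assumed weighted average operator: weighted mean per compound."""
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values)) / total
        for values in zip(*predictions)
    ]


# Made-up predictions from two base models for three compounds.
base_predictions = [
    [2.0, 3.0, 1.0],
    [2.5, 2.0, 1.5],
]
observed = [2.0, 2.5, 1.0]

min_consensus = consensus_minimum(base_predictions)
avg_consensus = consensus_weighted_average(base_predictions, [1.0, 1.0])

mae_min = mean_absolute_error(observed, min_consensus)
mae_avg = mean_absolute_error(observed, avg_consensus)
print(min_consensus, avg_consensus, mae_min, mae_avg)
```

Lower MAE indicates predictions closer to the observed values, which is the sense in which the reported MAE figures rank the models.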
The dataset contains the raw data for the graphs in the paper. This dataset is associated with the following publication: Phillips, R., R. James, and M. Magnuson. Functional categories of microbial toxicity resulting from three advanced oxidation process treatments during management and disposal of contaminated water. CHEMOSPHERE. Elsevier Science Ltd, New York, NY, USA, 238: 124550, (2020).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Detecting water contamination in community housing is crucial for protecting public health. Early detection enables timely action to prevent waterborne diseases and ensures equitable access to safe drinking water. Traditional methods recommended by the Environmental Protection Agency (EPA) rely on collecting water samples and conducting lab tests, which can be both time-consuming and costly. Methods: To address these limitations, this study introduces a Graph Attention Network (GAT) to predict lead contamination in drinking water. The GAT model leverages publicly available municipal records and housing information to model interactions between homes and identify contamination patterns. Each house is represented as a node, and relationships between nodes are analyzed to provide a clearer understanding of contamination risks within the community. Results: Using data from Flint, Michigan, the model demonstrated higher performance compared to traditional methods. Specifically, the GAT achieved an accuracy of 0.80, precision of 0.71, and recall of 0.93, outperforming XGBoost, a classical machine learning algorithm, which had an accuracy of 0.70, precision of 0.66, and recall of 0.67. Discussion: In addition to its predictive capabilities, the GAT model identifies key factors contributing to lead contamination, enabling more precise targeting of at-risk areas. This approach offers a practical tool for policymakers and public health officials to assess and mitigate contamination risks, ultimately improving community health and safety.
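For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, and recall follow from the counts of a binary confusion matrix. The counts are made up for illustration and are not the study's results; the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp: contaminated homes correctly flagged
    fp: safe homes incorrectly flagged
    fn: contaminated homes that were missed
    tn: safe homes correctly left unflagged
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Made-up counts for 100 hypothetical homes.
accuracy, precision, recall = classification_metrics(tp=45, fp=15, fn=5, tn=35)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```

High recall means few contaminated homes are missed, which is often the priority in a screening setting, while precision measures how many flagged homes are truly contaminated.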
The dataset contains the raw data for the graphs in the paper. This dataset is associated with the following publication: Rauglas, E., S. Martin, K. Bailey, C. Starr, M. Magnuson, R. Phillips, and W. Harper. The Effect of Malathion on the Activity, Performance, and Microbial Ecology of Activated Sludge. JOURNAL OF ENVIRONMENTAL MANAGEMENT. Elsevier Science Ltd, New York, NY, USA, 220-228, (2016).
The dataset contains the raw data for the graphs in the paper. This dataset is associated with the following publication: Haupert, L., and M. Magnuson. Numerical Model for Decontamination of Organic Contaminants in Polyethylene Drinking Water Pipes in Premise Plumbing by Flushing. JOURNAL OF ENVIRONMENTAL ENGINEERING. American Society of Civil Engineers (ASCE), Reston, VA, USA, 145(7): 1-22, (2019).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The DMDT can be used to quantify and weigh the environmental, social and economic aspects of the dredged material management options the user is considering. Once the options are evaluated, the tool can be used to communicate the evaluation process and results to decision makers and the community via spreadsheets and bar graphs. Download the DMDT and supplementary files; Internet Explorer or Firefox is recommended for download.