Water quality data. This dataset is associated with the following publication: Papenfus, M., B. Schaeffer, A. Pollard, and K. Loftin. Exploring the potential value of satellite remote sensing to monitor chlorophyll-a for U.S. lakes and reservoirs. ENVIRONMENTAL MONITORING AND ASSESSMENT. Springer, New York, NY, USA, 192: 808, (2020).
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Long-term freshwater quality data from federal and federal-provincial sampling sites throughout Canada's aquatic ecosystems are included in this dataset. Measurements regularly include physical-chemical parameters such as temperature, pH, alkalinity, major ions, nutrients and metals. Collection includes data from active sites, as well as historical sites that have a period of record suitable for trend analysis. Sampling frequencies vary according to monitoring objectives. The number of sites in the network varies slightly from year-to-year, as sites are adjusted according to a risk-based adaptive management framework. The Great Lakes are sampled on a rotation basis and not all sites are sampled every year. Data are collected to meet federal commitments related to transboundary watersheds (rivers and lakes crossing international, inter-provincial and territorial borders) or under authorities such as the Department of the Environment Act, the Canada Water Act, the Canadian Environmental Protection Act, 1999, the Federal Sustainable Development Strategy, or to meet Canada's commitments under the 1969 Master Agreement on Apportionment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Waterbase is the generic name given to the EEA's databases on the status and quality of Europe's rivers, lakes, groundwater bodies and transitional, coastal and marine waters, on the quantity of Europe's water resources, and on the emissions to surface waters from point and diffuse sources of pollution. The dataset contains time series of nutrients, organic matter, hazardous substances and other chemicals in rivers, lakes and groundwater, as well as data on biological quality elements (BQEs) such as phytobenthos and macroinvertebrates in rivers and lakes. A list of monitoring site identifiers with selected attributes, reported through WFD and WISE Spatial data reporting, is added to the dataset as a spatial reference. The data have been compiled and processed by the EEA. Please refer to the metadata for additional information.
The ZTRAX data is a national database of property sales collected by Zillow. The data is available to researchers who submit a research proposal to Zillow. Portions of this dataset are inaccessible because: Not publicly available. They can be accessed through the following means: Requires a data sharing agreement with Zillow. Format: National property sales database https://www.zillow.com/research/ztrax/. This dataset is associated with the following publication: Mamun, S., A. Castillo, K. Swedberg, J. Zhang, K.J. Boyle, D. Cardoso, C.L. King, C. Nolte, M. Papenfus, D. Phaneuf, and S. Polasky. Valuing water quality in the United States using a national dataset on property values. PNAS (PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES). National Academy of Sciences, WASHINGTON, DC, USA, 120(5): e2210417120, (2023).
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The California Department of Water Resources (DWR) discrete (vs. continuous) water quality dataset contains DWR-collected, current and historical, chemical and physical parameters found in routine environmental monitoring, regulatory compliance monitoring, and special studies throughout the state.
Attribution-NonCommercial 2.0 (CC BY-NC 2.0) https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
This dataset provides water quality data from stations in the Republic of Korea.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset comprises a collection of records detailing the water quality across different cities, regions, and countries. Each entry contains information regarding the city, region, and country where the water sample was taken. Additionally, the dataset records various water quality parameters, including air quality (AirQuality), water pollution (WaterPollution), pH level (ph), water hardness (Hardness), soluble solids content (Solids), chloramines concentration (Chloramines), sulfate levels (Sulfate), conductivity (Conductivity), organic carbon content (Organic_carbon), trihalomethanes concentration (Trihalomethanes), turbidity (Turbidity), and potability status (Potability) of the water. By analyzing this dataset, we can explore the relationships between various factors and draw valuable conclusions regarding water quality and potability across different locations worldwide.
This dataset offers a detailed look at drinking water contaminant levels for 4,446 major water utilities across the U.S., with each entry including geolocation and population served.
The analysis focuses on ten key contaminants, mainly tracking Disinfection Byproducts (DBPs), which are common issues in municipal water systems.
The full list of contaminants covered in the dataset includes:
Columns related to contaminants indicate how often levels exceed allowable limits in tap water. 💧🚰
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Context: Water is a fundamental resource for life, and its quality is a critical factor for public health, environmental sustainability, and economic development. In a country as vast and diverse as India, monitoring the quality of water bodies is a monumental yet essential task. This dataset provides a valuable snapshot of the state of various water bodies across India, offering insights into the levels of different pollutants and key water quality parameters.
The data is sourced from the Central Pollution Control Board (CPCB) of India, which is the national organization responsible for the prevention and control of water and air pollution. This dataset is a part of their ongoing efforts to monitor the health of India's water resources under the National Water Quality Monitoring Programme (NWMP).
Content: This dataset contains water quality data collected from various monitoring stations across 17 different states in India between the years 2021 and 2023. The data covers different types of water bodies, primarily Rivers and Drains.
The dataset includes a rich set of parameters that are crucial for assessing water quality, including:
Physical Parameters: Temperature
Chemical Parameters: pH, Conductivity, Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), and Nitrate-N
Biological Parameters: Fecal Coliform and Total Coliform
Each row in the dataset represents a specific monitoring location and provides the minimum and maximum recorded values for these parameters over a given year.
Inspiration and Acknowledgements: This dataset was compiled out of a need for comprehensive and recent water quality data for a project focused on water quality analysis and building a recommendation system. The goal is to make this valuable data more accessible to the data science community for analysis, visualization, and the development of predictive models.
Find data on drinking water quality in Massachusetts. This dataset shows drinking water exceedances for lead by Community Water System and year of exceedance in Massachusetts.
This dataset, titled Water Quality Analysis, is composed of 7,999 records and 21 columns. It offers a comprehensive set of parameters relevant to water quality assessment, making it ideal for environmental studies, public health research, and machine learning applications related to water safety and treatment processes.
Rows: 7,999
Columns: 21
Features:
Chemical concentrations: Includes various chemicals like aluminium, ammonia, arsenic, barium, cadmium, chloramine, chromium, copper, flouride, lead, nitrates, nitrites, mercury, perchlorate, radium, selenium, silver, and uranium.
Biological parameters: bacteria and viruses, indicating the presence of biological contaminants.
Safety Indicator: is_safe (1 for safe, 0 for not safe), indicating whether the water quality is considered safe for consumption based on the measured parameters.
This dataset is suitable for:
Predictive modeling to determine water safety.
Analysis of correlations between various contaminants.
Environmental and public health research.
Educational purposes in chemistry and environmental sciences.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides an extensive record of water quality sampling activities conducted in early 2025. It can be used for environmental monitoring, regulatory compliance assessment, pollution analysis, and machine learning models related to water quality prediction. The dataset includes over 81,000 samples, covering various determinands with diverse concentration ranges. The most common sampling source is rivers/running surface water, and ammonia levels are a key monitored parameter.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Description
Water Quality Parameters: Ammonia, BOD, DO, Orthophosphate, pH, Temperature, Nitrogen, Nitrate.
Countries/Regions: United States, Canada, Ireland, England, China.
Years Covered: 1940-2023.
Data Records: 2.82 million.
Definition of Columns
Country: Name of the water-body region.
Area: Name of the area in the region.
Waterbody Type: Type of the water-body source.
Date: Date of the sample collection (dd-mm-yyyy).
Ammonia (mg/l): Ammonia concentration.
Biochemical Oxygen Demand (BOD) (mg/l): Oxygen demand measurement.
Dissolved Oxygen (DO) (mg/l): Concentration of dissolved oxygen.
Orthophosphate (mg/l): Orthophosphate concentration.
pH (pH units): pH level of water.
Temperature (°C): Temperature in Celsius.
Nitrogen (mg/l): Total nitrogen concentration.
Nitrate (mg/l): Nitrate concentration.
CCME_Values: Calculated water quality index values using the CCME WQI model.
CCME_WQI: Water Quality Index classification based on CCME_Values.
Data Directory Description
Category 1: Dataset
Combined Data: This folder contains two files: Combined_dataset.csv and Summary.xlsx. The Combined_dataset.csv file includes all eight water quality parameter readings across five countries, with additional data for initial preprocessing steps like missing value handling, outlier detection, and other operations. It also contains the CCME Water Quality Index calculation for empirical analysis and ML-based research. The Summary.xlsx file provides a brief description of the datasets, including data distributions (e.g., maximum, minimum, mean, standard deviation).
Files: Combined_dataset.csv, Summary.xlsx
Country-wise Data: This folder contains separate country-based datasets in CSV files. Each file includes the eight water quality parameters for regional analysis. The Summary_country.xlsx file presents country-wise dataset descriptions with data distributions (e.g., maximum, minimum, mean, standard deviation).
Files: England_dataset.csv, Canada_dataset.csv, USA_dataset.csv, Ireland_dataset.csv, China_dataset.csv, Summary_country.xlsx
Category 2: Code
Data processing and harmonization code (e.g., Language Conversion, Date Conversion, Parameter Naming and Unit Conversion, Missing Value Handling, WQI Measurement and Classification): Data_Processing_Harmonnization.ipynb
Technical validation code (e.g., assessing the Data Distribution, Outlier Detection, Water Quality Trend Analysis, and Verifying the Application of the Dataset for the ML Models): Technical_Validation.ipynb
Category 3: Data Collection Sources
This category includes links to the data collection sources that were used to create the dataset, provided so that the dataset can be reconstructed or extended.
File: DataCollectionSources.xlsx
Original Paper Title: A Comprehensive Dataset of Surface Water Quality Spanning 1940-2023 for Empirical and ML Adopted Research
Abstract
Assessment and monitoring of surface water quality are essential for food security, public health, and ecosystem protection. Although water quality monitoring is a known phenomenon, little effort has been made to offer a comprehensive and harmonized dataset for surface water at the global scale. This study presents a comprehensive surface water quality dataset that preserves spatio-temporal variability, integrity, consistency, and depth of the data to facilitate empirical and data-driven evaluation, prediction, and forecasting. The dataset is assembled from a range of sources, including regional and global water quality databases, water management organizations, and individual research projects from five prominent countries in the world, e.g., the USA, Canada, Ireland, England, and China. The resulting dataset consists of 2.82 million measurements of eight water quality parameters that span 1940-2023. This dataset can support meta-analysis of water quality models and can facilitate Machine Learning (ML) based data and model-driven investigation of the spatial and temporal drivers and patterns of surface water quality at a cross-regional to global scale.
Note: Cite this repository and the original paper when using this dataset.
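The CCME_Values and CCME_WQI columns described above come from the CCME Water Quality Index. As a rough illustration, the standard CCME formulation combines scope (F1), frequency (F2) and amplitude (F3) into a 0-100 score. The sketch below is not the dataset authors' notebook code; the objective limits in `EXAMPLE_OBJECTIVES` are hypothetical placeholders, and the function names are invented for this example.

```python
import math

# Hypothetical objectives for illustration only; these are NOT regulatory guidelines.
# "max": the value must not exceed the limit. "min": the value must not fall below it.
EXAMPLE_OBJECTIVES = {
    "Dissolved Oxygen (DO) (mg/l)": ("min", 5.0),
    "Ammonia (mg/l)": ("max", 1.0),
}


def ccme_wqi(samples, objectives):
    """Return the CCME WQI score (0-100) for a list of {parameter: value} samples."""
    tested, failed = set(), set()
    n_tests = 0
    n_failed = 0
    excursion_sum = 0.0
    for sample in samples:
        for name, value in sample.items():
            if name not in objectives or value is None:
                continue  # skip parameters without an objective and missing values
            kind, limit = objectives[name]
            tested.add(name)
            n_tests += 1
            if kind == "max" and value > limit:
                excursion = value / limit - 1.0
            elif kind == "min" and value < limit:
                if value <= 0:
                    raise ValueError("'min' objectives need strictly positive values")
                excursion = limit / value - 1.0
            else:
                continue  # test passed
            failed.add(name)
            n_failed += 1
            excursion_sum += excursion
    if n_tests == 0:
        raise ValueError("no tests could be evaluated")
    f1 = 100.0 * len(failed) / len(tested)   # scope: share of failed parameters
    f2 = 100.0 * n_failed / n_tests          # frequency: share of failed tests
    nse = excursion_sum / n_tests            # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)           # amplitude, scaled to 0-100
    return 100.0 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732


def classify_wqi(score):
    """Map a score to the usual CCME categories."""
    if score >= 95:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 65:
        return "Fair"
    if score >= 45:
        return "Marginal"
    return "Poor"
```

A call such as `classify_wqi(ccme_wqi(samples, EXAMPLE_OBJECTIVES))` would then give a category for one site's samples. Range objectives such as pH would need a third objective kind, which this sketch leaves out.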
The Water Quality Portal (WQP) is a cooperative service sponsored by the United States Geological Survey (USGS), the Environmental Protection Agency (EPA), and the National Water Quality Monitoring Council (NWQMC). It serves data collected by over 400 state, federal, tribal, and local agencies. Water quality data can be downloaded in Excel, CSV, TSV, and KML formats. Fourteen site types are found in the WQP: aggregate groundwater use, aggregate surface water use, atmosphere, estuary, facility, glacier, lake, land, ocean, spring, stream, subsurface, well, and wetland. Water quality characteristic groups include physical conditions, chemical and bacteriological water analyses, chemical analyses of fish tissue, taxon abundance data, toxicity data, habitat assessment scores, and biological index scores, among others. Within these groups, thousands of water quality variables registered in the EPA Substance Registry Service (https://iaspub.epa.gov/sor_internet/registry/substreg/home/overview/home.do) and the Integrated Taxonomic Information System (https://www.itis.gov/) are represented. Across all site types, physical characteristics (e.g., temperature and water level) are the most common water quality result type in the system. The Water Quality Exchange data model (WQX; http://www.exchangenetwork.net/data-exchange/wqx/), initially developed by the Environmental Information Exchange Network, was adapted by EPA to support submission of water quality records to the EPA STORET Data Warehouse [USEPA, 2016], and has subsequently become the standard data model for the WQP.
Contributing organizations:
ACWI: The Advisory Committee on Water Information (ACWI) represents the interests of water information users and professionals in advising the federal government on federal water information programs and their effectiveness in meeting the nation's water information needs.
ARS: The Agricultural Research Service (ARS) is the U.S. Department of Agriculture's chief in-house scientific research agency, whose job is finding solutions to agricultural problems that affect Americans every day, from field to table. ARS conducts research to develop and transfer solutions to agricultural problems of high national priority and provide information access and dissemination to, among other topics, enhance the natural resource base and the environment. Water quality data from STEWARDS, the primary database for the USDA/ARS Conservation Effects Assessment Project (CEAP), are ingested into WQP via a web service.
EPA: The Environmental Protection Agency (EPA) gathers and distributes water quality monitoring data collected by states, tribes, watershed groups, other federal agencies, volunteer groups, and universities through the Water Quality Exchange framework in the STORET Warehouse.
NWQMC: The National Water Quality Monitoring Council (NWQMC) provides a national forum for coordination of comparable and scientifically defensible methods and strategies to improve water quality monitoring, assessment, and reporting. It also promotes partnerships to foster collaboration, advance the science, and improve management within all elements of the water quality monitoring community.
USGS: The United States Geological Survey (USGS) investigates the occurrence, quantity, quality, distribution, and movement of surface waters and ground waters and disseminates the data to the public, state, and local governments, public and private utilities, and other federal agencies involved with managing the United States' water resources.
Resources in this dataset:
Resource Title: Website Pointer for Water Quality Portal. File Name: Web Page, url: https://www.waterqualitydata.us/ Links to Download Data, User Guide, Contributing Organizations, National coverage by state.
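Because the portal serves downloads in CSV and other formats, a query can be expressed as a plain URL. The sketch below only builds such a URL and makes no network request. The endpoint path `/data/Result/search` and the parameter names `statecode`, `characteristicName`, `siteType` and `mimeType` are assumptions that should be checked against the portal's User Guide before use, and `build_wqp_result_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed result-search endpoint under the portal URL given above.
WQP_RESULT_ENDPOINT = "https://www.waterqualitydata.us/data/Result/search"


def build_wqp_result_url(statecode, characteristic_name, site_type=None, mime_type="csv"):
    """Build a download URL for water quality results (no request is sent)."""
    params = {
        "statecode": statecode,                     # e.g. "US:06" (assumed FIPS-style code)
        "characteristicName": characteristic_name,  # e.g. "pH"
        "mimeType": mime_type,                      # one of the download formats
    }
    if site_type is not None:
        params["siteType"] = site_type              # e.g. "Stream"
    return f"{WQP_RESULT_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_wqp_result_url("US:06", "pH", site_type="Stream"))
```

Keeping URL construction separate from the download step makes it easy to inspect or log the query before fetching what may be a large file.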
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Historical water quality data measured on a continuous basis at over 23 locations across Canada is included in this dataset. Most locations include hourly temperature, dissolved oxygen, pH, specific conductance and turbidity. Data are collected by Environment and Climate Change Canada (ECCC) and in partnership with other federal departments and provinces and territories to enable the detection of short-term water quality events, and to determine trends in water quality, especially at transboundary sites (or Federal waters) in support of Federal legislation and international agreements, or to report on the status of Government of Canada priority aquatic ecosystems.
https://data.gov.tw/license
The Environmental Department releases river water quality monitoring data, including River Pollution Index (RPI) and monitored values of major pollutants. Due to the need for monthly on-site sampling, laboratory testing and data quality control procedures, monitoring data is usually provided every other month.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Water is the most precious and essential of all natural resources. Some organisms, such as tardigrades, can survive without oxygen or food, but none can survive without water. The growth of industry and human activity over the previous century is having an overwhelming impact on our environment. Most cities in the world have started to implement aqua management systems. Developments in cloud computing, artificial intelligence, remote sensing, big data and the Internet of Things provide new openings for improving and applying aqua resource monitoring systems. A water quality parameter dataset was created for predicting the water quality of rivers, dams and lakes in India. The dataset is named Aquaattributes. It contains 1,360 samples in total and is 190 KB in size. Its attributes are the location name, the location's longitude and latitude, and the water quality parameters.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset explores the relationship between water pollution and the prevalence of waterborne diseases worldwide. It includes water quality indicators, pollution levels, disease rates, and socio-economic factors that influence health outcomes. The dataset provides information on different countries and regions, spanning the years 2000-2025.
It covers key factors such as contaminant levels, access to clean water, bacterial presence, water treatment methods, sanitation coverage, and the incidence of diseases like diarrhea, cholera, and typhoid. Additionally, it incorporates socio-economic variables such as GDP per capita, urbanization rate, and healthcare access, which help assess the broader impact of water pollution on communities.
This dataset can be used for:
Public health research on the impact of water pollution.
Environmental studies to analyze trends in water contamination.
Policy-making for clean water access and sanitation improvements.
Machine learning models to predict disease outbreaks based on water quality.
Prevalence: Covers 10 countries (e.g., USA, India, China, Brazil, Nigeria, Bangladesh, Mexico, Indonesia, Pakistan, Ethiopia).
Includes 5 regions per country (e.g., North, South, East, West, Central).
Spans 26 years (2000-2025).
Features 3,000 unique records representing various water sources and pollution conditions.
Data collected to fulfill the requirements of the SWTR (Surface Water Treatment Rule) and FAD (Filtration Avoidance Determination). Data is collected via grab sampling, analysis, LIMS data capture and reporting. Each record represents either a four-hour turbidity result, a 24-hour average turbidity result, or a daily fecal coliform result from DEL18DT (Delaware Shaft 18 downtake). Data is used to monitor compliance with the requirements above. There are no limitations for the data.
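The description does not say how the 24-hour average is derived from the four-hour results. One plausible reading, shown here purely as an assumption, is a simple arithmetic mean of the four-hour turbidity results that fall on the same calendar day. The function name and the input shape (ISO timestamp, NTU value) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_average_turbidity(readings):
    """Average four-hour turbidity results per calendar day.

    readings: iterable of (ISO 8601 timestamp string, turbidity in NTU).
    Returns a dict mapping each date to the mean of that day's results.
    """
    by_day = defaultdict(list)
    for timestamp, ntu in readings:
        by_day[datetime.fromisoformat(timestamp).date()].append(ntu)
    return {day: mean(values) for day, values in sorted(by_day.items())}


if __name__ == "__main__":
    sample = [
        ("2024-01-01T00:00:00", 1.0),
        ("2024-01-01T04:00:00", 1.0),
        ("2024-01-01T08:00:00", 2.0),
        ("2024-01-01T12:00:00", 2.0),
        ("2024-01-01T16:00:00", 3.0),
        ("2024-01-01T20:00:00", 3.0),
    ]
    print(daily_average_turbidity(sample))
```

The sample values are made up; comparing such a recomputed mean against the published 24-hour records would show whether the assumed averaging matches the source.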
https://catalog.dvrpc.org/dvrpc_data_license.html
The federal Clean Water Act was established to restore and maintain the chemical, physical, and biological integrity of the nation's waters. Water quality standards have been established by federal and state governments to ensure that waterbodies attain their designated uses. Designated uses include human uses and ecological conditions: general aquatic life, trout, recreation, drinking water supply, industrial water supply, agricultural water supply, shellfish harvesting, and fish consumption.
As mandated by the Clean Water Act, surface water quality in all states is monitored and assessed every two years. During this time, government-employed scientists take samples of water at various waterbody sites and test them to determine whether or not that waterbody has attained its designated use(s). The designated use of general aquatic life is the most indicative of overall surface water quality and is the most comprehensively monitored across the region. Therefore, aquatic life is used as the indicator of regional water quality.
Water quality in Pennsylvania is assessed based on stream segments. Attainment (or lack of attainment) is determined by analyzing the health of aquatic macroinvertebrates (i.e. insect larvae, crayfish, clams, snails, worms) present in the stream. Pennsylvania's Department of Environmental Protection's (PADEP) assessment plan covers the entire state in 10-year increments. Interim evaluations are performed using targeted sampling in each of the state's major subwatersheds every two years. New Jersey Department of Environmental Protection (NJDEP), on the other hand, assigns attainment or lack of attainment to entire subwatersheds (land area). Similar to PADEP, this determination is based on in-stream sampling of macroinvertebrates. New Jersey's most recent report for 2014 is based on data collected between 2008 and 2012.
Since the two states do not report water quality data using the same criteria (stream miles in Pennsylvania versus acres of subwatershed in New Jersey), the percentage of non-attaining water(s) in each state is taken according to its preferred unit, and then the two percentages are averaged together to obtain a regional value.
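The averaging described above can be sketched in a few lines. The function and argument names are hypothetical, and the numbers in the usage example are made up to show the arithmetic, not taken from either state's report.

```python
def percent_non_attaining(non_attaining, assessed):
    """Share of assessed water that does not attain its designated use, in percent."""
    if assessed <= 0:
        raise ValueError("assessed amount must be positive")
    return 100.0 * non_attaining / assessed


def regional_non_attainment(pa_miles_non_attaining, pa_miles_assessed,
                            nj_acres_non_attaining, nj_acres_assessed):
    """Average the two state percentages, each computed in its own unit."""
    pa_pct = percent_non_attaining(pa_miles_non_attaining, pa_miles_assessed)  # stream miles
    nj_pct = percent_non_attaining(nj_acres_non_attaining, nj_acres_assessed)  # subwatershed acres
    return (pa_pct + nj_pct) / 2.0


if __name__ == "__main__":
    # Illustrative figures only: 25% of stream miles and 75% of acres non-attaining.
    print(regional_non_attainment(25, 100, 300, 400))
```

Because this is an unweighted mean of two percentages, each state contributes equally to the regional value regardless of how much water it assessed.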