Classification of Mars Terrain Using Multiple Data Sources
Alan Kraut, David Wettergreen

ABSTRACT. Images of Mars are being collected faster than they can be analyzed by planetary scientists. Automatic analysis of images would enable more rapid and more consistent image interpretation and could draft geologic maps where none yet exist. In this work we develop a method for incorporating images from multiple instruments to classify Martian terrain into multiple types. Each image is segmented into contiguous groups of similar pixels, called superpixels, each with an associated vector of discriminative features. We have developed and tested several classification algorithms to assign a best class to each superpixel. These classifiers are trained using three different manual classifications with between 2 and 6 classes. Automatic classification accuracies of 50 to 80% are achieved in leave-one-out cross-validation across 20 scenes using a multi-class boosting classifier.
This dataset is a compilation of address point data for the City of Tempe. The dataset contains a point location and the official address (as defined by the Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. Several additional attributes may be populated for an address, but not every attribute is populated for every address.
Contact: Lynn Flaaen-Hanna, Development Services Specialist
Contact E-mail Link:
Map that Lets You Explore and Export Address Data
Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information created and maintained by the Building Safety Division of Community Development.
Data Source Type: ESRI ArcGIS Enterprise Geodatabase
Preparation Method: N/A
Publish Frequency: Weekly
Publish Method: Automatic
Data Dictionary
We introduce a method for scaling two data sets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives while recovering the words most associated with each senator's location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.
The development of high-throughput sequencing and genotyping methodologies has allowed the identification of thousands of genomic regions associated with complex traits. The integration of multiple sources of biological information is a crucial step toward better understanding the patterns regulating the development of these traits. Genomic Annotation in Livestock for positional candidate LOci (GALLO) is an R package developed for the accurate annotation of genes and quantitative trait loci (QTLs) located in regions identified in common genomic analyses performed in livestock, such as Genome-Wide Association Studies and transcriptomics using RNA-Sequencing. Moreover, GALLO allows the graphical visualization of gene and QTL annotation results, data comparison among different grouping factors (e.g., methods, breeds, tissues, statistical models, studies), and QTL enrichment in different livestock species, including cattle, pigs, sheep, and chickens. Consequently, GALLO is a useful package for annotation, for identifying hidden patterns across datasets, and for data mining of previously reported associations, as well as for efficient scrutiny of the genetic architecture of complex traits in livestock.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
I am a new developer and I would greatly appreciate your support. If you find this dataset helpful, please consider giving it an upvote!
Complete 1m Data: Raw 1m historical data from multiple exchanges, covering the entire trading history of BNBUSD available through the exchanges' API endpoints. This dataset is updated daily to ensure up-to-date coverage.
Combined Index Dataset: A unique feature of this dataset is the combined index, which is derived by averaging all other datasets into one (see the attached notebook). This creates the longest continuous, unbroken BNBUSD dataset available on Kaggle, with no gaps and no erroneous values. It also gives a more comprehensive view of the market, for example total volume across multiple exchanges.
Superior Performance: The combined index dataset has demonstrated superior mean absolute error (MAE) performance when training machine learning models, improving on single-source datasets by an order of magnitude.
Unbroken History: The combined dataset's continuous history is a valuable asset for researchers and traders who require accurate and uninterrupted time series data for modeling or back-testing.
BNBUSD Dataset Summary: https://i.imgur.com/aqtuPay.png
Combined Dataset Close Plot: https://i.imgur.com/mnzs2f4.png (this plot illustrates the continuity of the dataset over time, with no gaps in data, making it ideal for time series analysis)
Dataset Usage and Diagnostics: This notebook demonstrates how to use the dataset and includes a powerful data diagnostics function, which is useful for all time series analyses.
Aggregating Multiple Data Sources: This notebook walks you through the process of combining multiple exchange datasets into a single, clean dataset. (Currently unavailable, will be added shortly)
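The combined-index construction described above (average price across exchanges, total volume) can be sketched in pandas. The column names and toy frames below are assumptions for illustration, not the dataset's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical per-exchange OHLCV frames indexed by timestamp; in practice
# these would be read from the per-exchange files in the dataset.
idx = pd.date_range("2024-01-01", periods=4, freq="h")
ex_a = pd.DataFrame({"close": [100.0, 101.0, 102.0, 103.0],
                     "volume": [10.0, 12.0, 11.0, 9.0]}, index=idx)
ex_b = pd.DataFrame({"close": [100.2, 100.8, np.nan, 103.4],
                     "volume": [8.0, 7.0, np.nan, 6.0]}, index=idx)

def combine_index(frames):
    """Average close across exchanges and sum volume. NaNs are skipped,
    so a gap on one exchange does not create a gap in the combined index."""
    closes = pd.concat([f["close"] for f in frames], axis=1)
    volumes = pd.concat([f["volume"] for f in frames], axis=1)
    return pd.DataFrame({"close": closes.mean(axis=1),
                         "volume": volumes.sum(axis=1)})

combined = combine_index([ex_a, ex_b])
```

Averaging across sources is also what smooths out single-exchange outliers, which is one plausible reason a combined index trains better than any single feed.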
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In Brazil, studies that map electronic healthcare databases in order to assess their suitability for use in pharmacoepidemiologic research are lacking. We aimed to identify, catalogue, and characterize Brazilian data sources for Drug Utilization Research (DUR).

Methods: The present study is part of the project entitled "Publicly Available Data Sources for Drug Utilization Research in Latin American (LatAm) Countries." A network of Brazilian health experts was assembled to map secondary administrative data from healthcare organizations that might provide information related to medication use. A multi-phase approach including internet searches of institutional government websites, traditional bibliographic databases, and expert input was used for mapping the data sources. The reviewers searched, screened, and selected the data sources independently; disagreements were resolved by consensus. Data sources were grouped into the following categories: 1) automated databases; 2) Electronic Medical Records (EMR); 3) national surveys or datasets; 4) adverse event reporting systems; and 5) others. Each data source was characterized by accessibility, geographic granularity, setting, type of data (aggregate or individual-level), and years of coverage. We also searched for publications related to each data source.

Results: A total of 62 data sources were identified and screened; 38 met the eligibility criteria for inclusion and were fully characterized. We grouped 23 (60%) as automated databases, four (11%) as adverse event reporting systems, four (11%) as EMRs, three (8%) as national surveys or datasets, and four (11%) as other types. Eighteen (47%) were classified as publicly and conveniently accessible online, providing information at the national level. Most offered more than 5 years of comprehensive data coverage and presented data at both the individual and aggregated levels. No information about population coverage was found. Drug coding is not uniform; each data source has its own coding system, depending on the purpose of the data. At least one scientific publication was found for each publicly available data source.

Conclusions: There are several types of data sources for DUR in Brazil, but a uniform system for drug classification and data quality evaluation does not exist. The extent of population covered by year is unknown. Our comprehensive and structured inventory reveals a need for full characterization of these data sources.
The dataset includes Sentinel-2 spectral data for all bands spatiotemporally matched with available chlorophyll a concentration data from several data sources including the Water Quality Portal.
Background: Estimating multimorbidity (the presence of two or more chronic conditions) using administrative data is becoming increasingly common. We investigated (1) the concordance of identification of chronic conditions and multimorbidity using self-report survey and administrative datasets; (2) the characteristics of people with multimorbidity ascertained using different data sources; and (3) whether the same individuals are classified as multimorbid using different data sources.

Methods: Baseline survey data for 90,352 participants of the 45 and Up Study, a cohort study of residents of New South Wales, Australia, aged 45 years and over, were linked to prior two-year pharmaceutical claims and hospital admission records. Concordance of eight self-report chronic conditions (reference) with claims and hospital data was examined using sensitivity (Sn), positive predictive value (PPV), and kappa (κ). The characteristics of people classified as multimorbid were compared using logistic regression modelling.

Results: Agreement was highest for diabetes in both hospital and claims data (κ = 0.79, 0.78; Sn = 79%, 72%; PPV = 86%, 90%). The prevalence of multimorbidity was highest using self-report data (37.4%), followed by claims data (36.1%) and hospital data (19.3%). Combining all three datasets identified a total of 46,683 (52%) people with multimorbidity, with half of these identified using a single dataset only, and up to 20% identified in all three datasets. Characteristics of persons with and without multimorbidity were generally similar. However, the age gradient was more pronounced and people speaking a language other than English at home were more likely to be identified as multimorbid by administrative data.

Conclusions: Different individuals, with different combinations of conditions, are identified as multimorbid when different data sources are used. As such, caution should be applied when ascertaining morbidity from a single data source, as the agreement between self-report and administrative data is generally poor. Future multimorbidity research exploring specific disease combinations and clusters of diseases that commonly co-occur, rather than a simple disease count, is likely to provide more useful insights into the complex care needs of individuals with multiple chronic conditions.
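The concordance statistics reported above (sensitivity, PPV, and kappa for a binary condition flag against a self-report reference) can be computed directly from two indicator vectors. The flags below are hypothetical, not study data.

```python
import numpy as np

def concordance(reference, comparison):
    """Sensitivity, positive predictive value, and Cohen's kappa for a
    chronic-condition flag derived from two binary data sources."""
    ref = np.asarray(reference, dtype=bool)
    other = np.asarray(comparison, dtype=bool)
    tp = np.sum(ref & other)            # flagged in both sources
    sens = tp / ref.sum()               # fraction of reference cases found
    ppv = tp / other.sum()              # fraction of flagged cases confirmed
    po = np.mean(ref == other)          # observed agreement
    pe = (ref.mean() * other.mean()
          + (1 - ref.mean()) * (1 - other.mean()))  # chance agreement
    kappa = (po - pe) / (1 - pe)
    return sens, ppv, kappa

# Hypothetical condition flags for 10 people: self-report (reference)
# versus pharmaceutical-claims data.
sens, ppv, kappa = concordance([1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
                               [1, 1, 0, 0, 0, 0, 1, 0, 1, 0])
```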
Dataset Card for VISEM Dataset
Dataset Details
Dataset Description
The VISEM dataset is a multimodal video dataset designed for the analysis of human spermatozoa. It is one of the few open datasets that combine multiple data sources, including videos, biological analysis data, and participant-related information. The dataset consists of anonymized data from 85 different participants, with a focus on improving research in human reproduction, particularly male… See the full description on the dataset page: https://huggingface.co/datasets/sperm-net/VISEM.
This data set contains DOT construction project information. The data is refreshed nightly from multiple data sources because the data becomes stale rather quickly.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
I am a new developer and I would greatly appreciate your support. If you find this dataset helpful, please consider giving it an upvote!
Complete 1h Data: Raw 1h historical data from multiple exchanges, covering the entire trading history of BTCUSD available through the exchanges' API endpoints. This dataset is updated daily to ensure up-to-date coverage.
Combined Index Dataset: A unique feature of this dataset is the combined index, which is derived by averaging all other datasets into one (see the attached notebook). This creates the longest continuous, unbroken BTCUSD dataset available on Kaggle, with no gaps and no erroneous values. It also gives a more comprehensive view of the market, for example total volume across multiple exchanges.
Superior Performance: The combined index dataset has demonstrated superior mean absolute error (MAE) performance when training machine learning models, improving on single-source datasets by an order of magnitude.
Unbroken History: The combined dataset's continuous history is a valuable asset for researchers and traders who require accurate and uninterrupted time series data for modeling or back-testing.
BTCUSD Dataset Summary: https://i.imgur.com/OVOyF5A.png
Combined Dataset Close Plot: https://i.imgur.com/6hxG2G3.png (this plot illustrates the continuity of the dataset over time, with no gaps in data, making it ideal for time series analysis)
Dataset Usage and Diagnostics: This notebook demonstrates how to use the dataset and includes a powerful data diagnostics function, which is useful for all time series analyses.
Aggregating Multiple Data Sources: This notebook walks you through the process of combining multiple exchange datasets into a single, clean dataset. (Currently unavailable, will be added shortly)
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
I am a new developer and I would greatly appreciate your support. If you find this dataset helpful, please consider giving it an upvote!
Complete 1h Data: Raw 1h historical data from multiple exchanges, covering the entire trading history of ETHUSD available through the exchanges' API endpoints. This dataset is updated daily to ensure up-to-date coverage.
Combined Index Dataset: A unique feature of this dataset is the combined index, which is derived by averaging all other datasets into one (see the attached notebook). This creates the longest continuous, unbroken ETHUSD dataset available on Kaggle, with no gaps and no erroneous values. It also gives a more comprehensive view of the market, for example total volume across multiple exchanges.
Superior Performance: The combined index dataset has demonstrated superior mean absolute error (MAE) performance when training machine learning models, improving on single-source datasets by an order of magnitude.
Unbroken History: The combined dataset's continuous history is a valuable asset for researchers and traders who require accurate and uninterrupted time series data for modeling or back-testing.
ETHUSD Dataset Summary: https://i.imgur.com/1Qgdoqo.png
Combined Dataset Close Plot: https://i.imgur.com/RDKMDjo.png (this plot illustrates the continuity of the dataset over time, with no gaps in data, making it ideal for time series analysis)
Dataset Usage and Diagnostics: This notebook demonstrates how to use the dataset and includes a powerful data diagnostics function, which is useful for all time series analyses.
Aggregating Multiple Data Sources: This notebook walks you through the process of combining multiple exchange datasets into a single, clean dataset. (Currently unavailable, will be added shortly)
This publication provides behavioral health statistics at the national and state levels from multiple data sources, including the National Survey on Drug Use and Health, the National Health Interview Survey, the Medical Expenditure Panel Survey, and the National Association of State Mental Health Program Directors, as well as peer-reviewed journal articles.
This report consolidates information from multiple data sources, including PPS, PDE, and Pittsburgh charter schools. Data is obtained through downloads from the web or through data requests. Raw data used to generate the reports will be made available as the files are processed.
Gridded Population of the World (GPW) translates census population data to a latitude-longitude grid so that population data may be used in cross-disciplinary studies. There are three data files with this data set for the reference years 1990 and 1995. Over 127,000 administrative units and population counts were collected and integrated from various sources to create the gridded data. In brief, GPW was created using the following steps:
* Population data were estimated for the product reference years, 1990 and 1995, either by the data source or by interpolating or extrapolating the given estimates for other years.
* Additional population estimates were created by adjusting the source population data to match UN national population estimates for the reference years.
* Borders and coastlines of the spatial data were matched to the Digital Chart of the World where appropriate, and lakes from the Digital Chart of the World were added.
* The resulting data were then transformed into grids of UN-adjusted and unadjusted population counts for the reference years.
* Grids containing the area of administrative boundary data in each cell (net of lakes) were created and used with the count grids to produce population densities.
As with any global data set based on multiple data sources, the spatial and attribute precision of GPW is variable. The level of detail and accuracy, both in time and space, vary among the countries for which data were obtained.
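The UN-adjustment step in the list above amounts to a single proportional scaling of all units in a country, and densities follow by dividing counts by cell land area. A minimal sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical administrative-unit populations from a national source,
# and the UN national estimate for the same reference year.
source_counts = np.array([1.2e6, 3.4e6, 0.9e6])
un_national_total = 6.0e6

# UN-adjusted counts: scale every unit so the national sum matches the
# UN figure, preserving each unit's share of the total.
adjusted = source_counts * (un_national_total / source_counts.sum())

# Density grid: counts divided by cell land area (net of lakes), km^2.
cell_area_km2 = np.array([1500.0, 2200.0, 800.0])
density = adjusted / cell_area_km2
```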
Groundwater is an important source of drinking and irrigation water throughout Idaho, and groundwater quality is monitored by various Federal, State, and local agencies. The historical, multi-agency records of groundwater quality constitute a valuable dataset that has yet to be compiled or analyzed on a statewide level. The purpose of this study is to combine groundwater-quality data from multiple sources into a single database, to summarize this dataset, and to perform bulk analyses to reveal spatial and temporal patterns of water quality throughout Idaho. Data were retrieved from the Water Quality Portal (www.waterqualitydata.us), the Idaho Department of Environmental Quality, and the Idaho Department of Water Resources. Analyses included counting the number of times a sample location had concentrations above Maximum Contaminant Levels (MCLs), performing trend tests, and calculating correlations between water-quality analytes.
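The exceedance-counting analysis described above can be sketched in pandas. The analyte names, MCL values, and table schema here are illustrative assumptions, not the study's actual data.

```python
import pandas as pd

# Hypothetical water-quality samples; MCLs are in the same units as `value`.
samples = pd.DataFrame({
    "location": ["A", "A", "B", "B", "B"],
    "analyte":  ["nitrate", "nitrate", "nitrate", "arsenic", "arsenic"],
    "value":    [12.0, 8.0, 4.0, 0.02, 0.005],
})
mcl = {"nitrate": 10.0, "arsenic": 0.010}  # e.g. mg/L

# Flag each sample against its analyte's MCL, then count exceedances
# per location and analyte.
samples["exceeds"] = samples["value"] > samples["analyte"].map(mcl)
exceed_counts = (samples.groupby(["location", "analyte"])["exceeds"]
                        .sum().astype(int))
```

The same grouped structure extends naturally to per-location trend tests and analyte correlations.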
The merged source tables contain the mean positions, magnitudes, and uncertainties for sources detected multiple times in each of the 2MASS data sets. The merging was carried out using an autocorrelation of the respective databases to identify groups of extractions that are positionally associated with each other, all lying within a 1.5" radius circular region. A number of confirmation statistics are also provided in the tables that can be used to test for source motion and/or variability, and the general quality of the merge.
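The 1.5-arcsecond association test above reduces to computing angular separations between pairs of extractions. A minimal haversine sketch (the sky positions are hypothetical):

```python
import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds between two sky positions
    given in decimal degrees (haversine form, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(a))) * 3600.0

# Two hypothetical extractions offset by 1" in declination: close enough
# to fall inside the 1.5" association radius and be merged.
sep = angular_sep_arcsec(150.0, 2.0, 150.0, 2.0 + 1.0 / 3600.0)
```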
The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) is the world's most extensive surface marine meteorological data collection. Building on national and international partnerships, ICOADS provides a variety of user communities with easy access to many different data sources in a consistent format. Data sources range from early historical ship observations to more modern, automated measurement systems, including moored buoys and surface drifters. Past versions of ICOADS were published as monthly files, while a daily version of the product was held for internal use only. NCEI has since developed a reformatted daily product that aligns with the monthly files and is ready for public use. The objective of this initiative is to sustain the quality and usability of this high-profile ICOADS product for stakeholders who have requested an expanded product. ICOADS R3.0.2 Daily is now developed and released.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rdata and RMD files for the submission to One Earth by Aminian-Biquet et al. See the PDF file for a description of the data files. To get 1) the entire dataset containing regulations at activity levels, identifiers of other databases, etc., and 2) the detailed description of raw data sources and protocol, look up the publication (in prep. for Data in Brief): Regulations of activities and protection levels in Marine Protected Areas of the European Union gathered from multiple data sources. Aminian-Biquet et al. In prep.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This is a layer of water service boundaries for 44,919 community water systems that deliver tap water to 306.88 million people in the US. This amounts to 97.22% of the population reportedly served by active community water systems and 90.85% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called a Tiered, Explicit, Match, and Model approach, or TEMM for short. The name of the approach reflects exactly how the nationwide data layer was developed. The TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b; Tier 2b reflects overlapping boundaries for multiple systems. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a or Tier 2b), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around provided water system centroids and model a circular water system boundary (Tier 3).
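The tier hierarchy described above can be summarized as a simple decision rule. The field names below are illustrative, not the actual SimpleLab schema:

```python
def assign_tier(has_state_boundary, n_systems_matched_to_place):
    """Tier 1: explicit state-provided boundary; Tier 2a/2b: exactly one
    or multiple systems matched to a Census Place polygon; Tier 3: a
    modeled radius around the system centroid."""
    if has_state_boundary:
        return "Tier 1"
    if n_systems_matched_to_place == 1:
        return "Tier 2a"
    if n_systems_matched_to_place > 1:
        return "Tier 2b"
    return "Tier 3"

# One hypothetical system per tier, in order.
tiers = [assign_tier(True, 0), assign_tier(False, 1),
         assign_tier(False, 3), assign_tier(False, 0)]
```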
Several limitations to this data exist, and the layer should be used with these in mind. First, assigning a Census Place TIGER polygon to multiple systems results in an inaccurate assignment of the same exact area to multiple systems; we hope to resolve Tier 2b systems into Tier 2a or Tier 3 in a future iteration. Second, the matching algorithms used to assign Census Place boundaries require additional validation and iteration. Third, Tier 3 boundaries have modeled radii stemming from a lat/long centroid of a water system facility, but the underlying lat/long centroids for water system facilities are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when the geometry quality is a county or state centroid, but we did not exclude such data from the layer. Fourth, missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but rely on the accuracy of state data collection and maintenance.
All data, methods, documentation, and contributions are open-source and available here: https://github.com/SimpleLab-Inc/wsb.