We propose to measure the same single-crystal sample of each of three compounds that emphasize different aspects of structure analysis of neutron SCD data:
1. Natrolite (Na2Al2Si3O10 × 2H2O, space group Fdd2, unit cell a = 18.2859 Å, b = 18.6117 Å, c = 6.5870 Å, α = 90°, β = 90°, γ = 90°), a member of the zeolite family, is an aluminosilicate with a porous 3D framework that allows framework distortion, resulting in negative thermal expansion. Due to this property it is industrially exploitable. The framework is rigid enough to be stable over a wide range of temperatures and pressures. The cavities are filled with one sodium and two waters.
2. Lithium ammonium tartrate (LiNH4C4H4O6 × H2O, space group P21212, a = 7.884 Å, b = 14.565 Å, c = 6.409 Å, α = 90°, β = 90°, γ = 90°) is a non-centrosymmetric molecular material.
3. Olivine, Li(Fe/Mn)PO4 synthesized to a specific composition, is a material of interest for energy-materials and battery research.
Each of the chosen crystals will be measured on D19 and D9 to the highest reasonably attainable accuracy (residuals of 1 to 3% expected) and d-spacing resolution.
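Because the cells quoted above have all angles equal to 90°, the d-spacing of a reflection follows from the standard orthorhombic relation 1/d² = h²/a² + k²/b² + l²/c². The sketch below is only an illustration using the natrolite cell lengths from the text; the function names are ours, and systematic absences of the space group are not checked.

```python
import math

def orthorhombic_d_spacing(a, b, c, h, k, l):
    """Return d (in Å) for reflection (h, k, l) of an orthorhombic cell."""
    if (h, k, l) == (0, 0, 0):
        raise ValueError("(0, 0, 0) is not a reflection")
    # 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2 holds when all cell angles are 90 degrees.
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_squared)

def orthorhombic_volume(a, b, c):
    """Return the cell volume (in Å^3) of an orthorhombic cell."""
    return a * b * c

# Natrolite cell lengths in Å, as quoted in the text.
NATROLITE = (18.2859, 18.6117, 6.5870)

d_004 = orthorhombic_d_spacing(*NATROLITE, 0, 0, 4)
volume = orthorhombic_volume(*NATROLITE)
print(f"d(004) = {d_004:.4f} Å, V = {volume:.1f} Å^3")
```

For a reflection along a single axis the result reduces to the cell length divided by the index, which gives a quick sanity check.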
This data product contains soil chemistry data from 4 locations. Two of the locations are in the Neversink River watershed near Claryville, NY (01435000) in the Catskill Mountains of New York (Fall Brook and Winnisook Creek), one is in the Young Woman’s Creek watershed near Renovo, PA (01545600), and the last is in the Wild River watershed at Gilead, Maine (01054200). Soil chemistry samples were collected twice at each location: in 2001 and 2011 at Fall Brook, Young Woman’s Creek and Wild River, and in 1993 and 2012 at Winnisook. This data product also contains water-quality data from 5 water-quality stations: West Branch Neversink River at Winnisook Lake [01434021], East Branch Neversink River northeast of Denning [0143400680], Rondout Creek above Red Brook at Peekamoose [01364959], Biscuit Brook above Pigeon Brook at Frost Valley [01434025], and Neversink River at Claryville [01435000]. The data were collected from water year 1992 to 2014. Stream water discharge is included with each water-quality sample. For 3 of the stations (01434021, 0143400680, 01364959), discharge measurement was discontinued in 2012. For those stations, discharge from 2012 to 2014 was estimated using linear regression analyses of nearby or downstream stations. The results of those regression analyses are also included in this data product.
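The regression-based discharge estimate described above can be sketched as an ordinary least-squares fit between a discontinued station and a nearby index station. The numbers below are made up for illustration, and the sketch assumes untransformed discharge; the data product does not state here whether a transformation (for example, logarithmic) was applied.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired discharges from the period when both stations operated.
index_station_q = [1.0, 2.0, 3.0, 4.0]   # nearby or downstream station
target_station_q = [0.6, 1.1, 1.6, 2.1]  # station later discontinued

slope, intercept = fit_line(index_station_q, target_station_q)

# Estimate discharge at the discontinued station from a later index-station value.
estimated_q = slope * 5.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, estimate={estimated_q:.3f}")
```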
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The EuroSDR RPAS benchmark datasets were acquired in August 2021 as part of the EuroSDR benchmark initiative. The initiative aims to evaluate the true geometric quality of real-world survey data generated from Remotely Piloted Aircraft System (RPAS) photogrammetry and lidar under different control configurations, focussing primarily on the geometric quality of data generated in the absence of ground control and local GNSS base station information.
Guided by a task force of experts from National Mapping and Cadastral Agencies (NMCAs) and academics, in August 2021 the Newcastle Geospatial Engineering team established and surveyed a coordinated test field of independent checkpoints (CPs), test surfaces and profiles at the disused Wards Hill Quarry near Morpeth, Northumberland, UK. The 350 x 250 m study area was simultaneously surveyed using the following RPAS-mounted instruments, each limited to a single flight to represent “real-world” operation:
More information can be found here.
To facilitate and encourage wider use of the EuroSDR RPAS dataset, it has been made open access.
About us
We are the Geospatial Engineering research group in the School of Engineering at Newcastle University with a long history of research and teaching across geospatial disciplines. EuroSDR is a not-for-profit organisation linking National Mapping and Cadastral Agencies with Research Institutes and Universities in Europe for the purpose of applied research in spatial data provision, management and delivery.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Check out our blog post
Building an affordable and reliable benchmark for LLM chatbots has become a critical challenge. A high-quality benchmark should (1) robustly separate model capability, (2) reflect human preference in real-world use cases, and (3) be updated frequently to avoid over-fitting or test set leakage.
Traditional benchmarks are often static or close-ended (e.g., MMLU multi-choice QA), which do not satisfy the above requirements. On the other hand, models are evolving faster than ever, underscoring the need to build benchmarks with high separability.
We introduce Arena-Hard – a data pipeline to build high-quality benchmarks from live data in Chatbot Arena, which is a crowd-sourced platform for LLM evals.
We compare our new benchmark, Arena Hard v0.1, to a current leading chat LLM benchmark, MT Bench. We show that Arena Hard v0.1 offers significantly stronger separability than MT Bench, with tighter confidence intervals. It also has a higher agreement (89.1%, see blog post) with the human preference ranking by Chatbot Arena (English-only). We expect this benchmark to be useful for model developers who need to differentiate their model checkpoints.
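One way to make "separability" concrete, and the definition assumed in this sketch, is the fraction of model pairs whose score confidence intervals do not overlap. The model names and intervals below are invented for illustration and are not results from the benchmark.

```python
from itertools import combinations

def separability(intervals):
    """Fraction of model pairs whose confidence intervals do not overlap.

    intervals maps a model name to a (lower, upper) confidence interval.
    """
    pairs = list(combinations(intervals.values(), 2))
    if not pairs:
        raise ValueError("need at least two models")
    separated = sum(
        1 for (lo1, hi1), (lo2, hi2) in pairs if hi1 < lo2 or hi2 < lo1
    )
    return separated / len(pairs)

# Hypothetical score intervals for three model checkpoints.
example_intervals = {
    "model_a": (70.0, 74.0),
    "model_b": (60.0, 65.0),
    "model_c": (63.0, 68.0),
}

score = separability(example_intervals)
print(f"separability = {score:.2f}")
```

Tighter intervals raise this fraction, which is why separability and interval width are reported together.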
Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra, and the quality of this correction can be assessed independently using the repeatedly measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment.
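To illustrate the idea of correcting with QC samples and then assessing the correction on repeated biological samples, the sketch below applies a simple per-batch QC-median scaling to one peak. This is a deliberately simplified stand-in, not the batch-correction algorithm the dataset was designed to test, and all intensities are invented.

```python
from statistics import mean, median, stdev

def qc_median_scale(intensities, batches, is_qc):
    """Scale each batch so its QC median matches the overall QC median."""
    target = median([v for v, q in zip(intensities, is_qc) if q])
    factors = {}
    for batch in set(batches):
        batch_qc = [
            v for v, b, q in zip(intensities, batches, is_qc) if q and b == batch
        ]
        factors[batch] = target / median(batch_qc)
    return [v * factors[b] for v, b in zip(intensities, batches)]

def rsd_percent(values):
    """Relative standard deviation in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical intensities of one peak: two QC injections and one repeat of the
# same biological sample in each of two batches.
intensities = [100.0, 100.0, 50.0, 200.0, 200.0, 100.0]
batches = [1, 1, 1, 2, 2, 2]
is_qc = [True, True, False, True, True, False]

corrected = qc_median_scale(intensities, batches, is_qc)

# Assess the correction on the biological repeats only, independent of the QCs.
bio_before = [v for v, q in zip(intensities, is_qc) if not q]
bio_after = [v for v, q in zip(corrected, is_qc) if not q]
print(rsd_percent(bio_before), rsd_percent(bio_after))
```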
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.
Task Uniqueness: The task of controllable multimodal feedback synthesis is unique, distinct from LLMs and tasks like VisDial, and not addressed by multi-modal LLMs. LLMs often exhibit errors and hallucinations, as evidenced by their auto-regressive and black-box nature, which can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, as detailed in the supplementary material of the corresponding research publication, demonstrating how metadata and multimodal features shape responses and learn sentiments. This controllability and interpretability aim to inspire new methodologies in related fields.
Data Collection and Annotation
Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction because of the following factors:
• Facebook uniquely provides metadata such as news article link, post shares, post reaction, comment like, comment rank, comment reaction rank, and relevance scores, which are not available on other platforms.
• Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
• Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit. [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref]
• The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it's 66.72% to 23.28%; Reddit does not report this data. [Ref]
Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
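The two filtering levels above can be sketched as a single predicate. This is a minimal illustration, assuming each comment carries four binary model labels and one sentiment probability, and that the 0.49 to 0.51 bounds are inclusive; the function name and the way the probability is obtained are assumptions, not details given by the dataset authors.

```python
def keep_comment(labels, probability, min_agree=3, low=0.49, high=0.51):
    """Return True if a comment passes both filtering levels.

    labels: the four model predictions, 1 for positive and 0 for negative.
    probability: the sentiment probability attached to the comment (assumed).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # a) Model agreement: at least `min_agree` models share the same label.
    models_agree = max(positives, negatives) >= min_agree
    # b) Safety margin: drop comments whose probability is too close to 0.5.
    is_confident = not (low <= probability <= high)
    return models_agree and is_confident

print(keep_comment([1, 1, 1, 0], 0.93))  # three models agree, confident
print(keep_comment([1, 1, 0, 0], 0.93))  # models split two against two
print(keep_comment([1, 1, 1, 1], 0.50))  # inside the low-confidence band
```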
Dataset Description
• Total Samples: 61,734
• Total Samples Annotated: 57,222 after filtering.
• Total Posts: 3,646
• Average Likes per Post: 65.1
• Average Likes per Comment: 10.5
• Average Length of News Text: 655 words
• Average Number of Images per Post: 3.7
Components of the Dataset
The dataset comprises two main components:
• CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
• Images Folder: Contains folders with images corresponding to each post.
Data Format and Fields of the CSV File
The dataset is structured as the CMFeed.csv file together with corresponding images in related folders. The CSV file includes the following fields:
• Id: Unique identifier
• Post: The heading of the news article.
• News_text: The text of the news article.
• News_link: URL link to the original news article.
• News_Images: A path to the folder containing images related to the post.
• Post_shares: Number of times the post has been shared.
• Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
• Comment: Text of the user comment.
• Comment_like: Number of likes on the comment.
• Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
• Comment_link: URL link to the original comment on Facebook.
• Comment_rank: Rank of the comment based on engagement and relevance.
• Score: Sentiment score computed based on the consensus of sentiment analysis models.
• Agreement: Indicates the consensus level among the sentiment models, ranging from -4 (all negative) to 4 (all positive). Three negative votes and one positive vote result in -2; three positive votes and one negative vote result in +2.
• Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).
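The Agreement values described above are consistent with summing one vote per model, +1 for positive and -1 for negative. The sketch below shows that arithmetic and how a JSON reaction field might be totalled; the helper names and the example JSON string are hypothetical, and the exact layout of Post_reaction in CMFeed.csv should be checked against the file.

```python
import json

def agreement_score(labels):
    """Sum of votes: +1 for each positive (1) label, -1 for each negative (0)."""
    return sum(1 if label == 1 else -1 for label in labels)

def total_reactions(reaction_json):
    """Total count across all reaction types in a JSON object of counts."""
    reactions = json.loads(reaction_json)
    return sum(reactions.values())

print(agreement_score([0, 0, 0, 1]))   # three negative, one positive
print(agreement_score([1, 1, 1, 0]))   # three positive, one negative

# Hypothetical Post_reaction value.
print(total_reactions('{"like": 120, "love": 8}'))
```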
More Considerations During Dataset Construction
We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:
• Why not merge data from different social media platforms? We chose not to merge data from platforms such as Reddit and Twitter with Facebook because these platforms lack comprehensive metadata, clear ethical guidelines, and control mechanisms (such as who can comment and whether users' anonymity is maintained). These factors are critical for our analysis. Our focus on Facebook alone was crucial to ensure consistency in data quality and format.
• Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These news outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view, Sky News offers geographically targeted and politically varied content leaning center/right in the UK/EU/US, Fox News is recognized for its right-leaning content in the US, and NY Daily News provides left-leaning coverage in New York. Many other news handles such as NDTV, The Hindu, Xinhua, and SCMP are also large-scale but may contain information in regional languages such as Indian languages and Chinese; hence, they were not selected. This selection ensures a broad spectrum of political discourse and audience engagement.
• Dataset Generalizability and Bias: With 3.07 billion of the total 5 billion social media users, the extensive user base of Facebook, reflective of broader social media engagement patterns, ensures that the insights gained are applicable across various platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of these news sources, ranging from local (NY Daily News) to international (BBC News), and spanning political spectra from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further solidifying the robustness and applicability of our research.
• Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, requiring around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4000 posts were collected; after preprocessing (detailed in Section 3.1), 3646 posts remained. We then processed all associated comments, resulting in a total of 61734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.
Ethical considerations, data privacy and misuse prevention
The data collection adheres to Facebook’s ethical guidelines (https://developers.facebook.com/terms/).
The Old Erie Canal has undergone sedimentation and aquatic growth that have restricted flow and diminished the aesthetic quality of the canal during the nearly 200 years since its construction. During 2018–2019, the U.S. Geological Survey (USGS) in cooperation with the Madison County Planning Department and the New York State Canal Corporation conducted a study of the Old Erie Canal between the Town of DeWitt, New York, and its junction with the current Erie Canal of the New York State Canal System near Rome, N.Y. The study comprised bathymetric, velocity, and water-quality surveys and documentation of the canal infrastructure. The USGS established benchmarks and staff gages along the 30.8 miles of the canal study area to reference the water-surface level in the canal to the North American Vertical Datum of 1988 (NAVD 88). Water-quality data (dissolved oxygen, water temperature, specific conductance, pH, and turbidity) were collected concurrently with the bathymetric survey (spring 2018) to characterize changes in water quality along the length of the canal. The canal infrastructure was documented to provide a baseline assessment. This dataset contains a shapefile representing locations where USGS benchmarks were installed along Old Erie Canal.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ChemLotUS contains water quality information in rivers and streams carefully curated and linked to a high resolution stream network. The dataset covers the contiguous United States (CONUS) with nearly 35 million records in about 290,000 sites. 11 water constituents are included so far: Calcium (Ca), Electrical Conductivity (EC), pH, Total Suspended Solids (TSS), Turbidity (Tu), Total Organic Carbon (TOC), Dissolved Oxygen (DO), Chlorophyll a (Chl-a), Nitrate (NO3), Soluble Reactive P (SRP), and Total P (TP). Raw water quality data comes from the Water Quality Portal (WQP - https://www.waterqualitydata.us/), and network information comes from the National Hydrography Dataset High Resolution (NHD-HR - https://www.usgs.gov/national-hydrography/nhdplus-high-resolution).
This repository contains the data, as well as the code developed to obtain it, according to the readme file.
More details about the dataset are given in the research paper of the same name, "ChemLotUS: A Benchmark Dataset of Lotic Chemistry across US River Networks", currently under peer review.
The Wine Quality data combines two benchmark data sets from UCI related to red and white wines.
Groundwater samples were collected from 60 public supply wells in the Colorado Plateaus principal aquifer. Water quality evaluations of groundwater for drinking water at public supply depths were made with the purpose of summarizing the current quality of source water (that is, untreated water) from public supply wells using two types of assessments: (1) status: an assessment that describes the current quality of the groundwater resource, and (2) understanding: an evaluation of the natural and human factors affecting the quality of groundwater, including an explanation of statistically significant associations between water quality and selected explanatory factors. To provide context for water-quality data, constituent concentrations of untreated groundwater are compared with available water-quality benchmarks. Federal regulatory benchmarks for protecting human health (maximum contaminant levels [MCLs]; U.S. Environmental Protection Agency [USEPA] primary drinking water regulations; U.S. Environmental Protection Agency, 2018a) are used for this evaluation. Additionally, non-regulatory human-health benchmarks (health-based screening levels [HBSLs]; Norman and others, 2018; U.S. Geological Survey, 2018) and federal non-regulatory benchmarks for nuisance chemicals (USEPA secondary maximum contaminant levels [SMCLs]; U.S. Environmental Protection Agency, 2018b) are used. This report considers benchmarks in the context of health-based (MCLs and HBSLs) and non-health based (SMCLs) benchmarks. This sampling approach uses an equal-area grid design (Belitz and others, 2010), which allows for the estimation of the proportion of high, moderate, or low concentrations relative to federal water-quality benchmarks of selected constituents over the entire area of the aquifer.
Tables included in this data release:
Table 1. Identification, location, and construction information for wells sampled for the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017.
Table 2. Constituent primary uses and sources; analytical schedules and sampling period; USGS parameter codes; comparison thresholds and reporting levels for wells sampled for the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017.
Table 3. Water-quality indicators in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; <, less than]
Table 4. Nutrients and dissolved organic carbon in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level]
Table 5. Major and minor ions in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; E, estimated]
Table 6. Trace elements in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 7. Radionuclides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level]
Table 8. Volatile organic compounds (VOCs) in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 9. Pesticides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level; E, estimated]
Table 10. Quality control results for constituents analyzed for nutrients and dissolved organic carbon in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 11. Quality control results for constituents analyzed for major ions in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 12. Quality control results for constituents analyzed for trace elements in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level]
Table 13. Quality control results of a replicate analyzed for radionuclides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; NC, not collected]
Table 14. Quality control results for constituents analyzed for VOCs in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; NC, not collected; E, estimated]
Table 15. Quality control results for constituents analyzed for pesticides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; NC, not collected]
This data release contains groundwater-quality data and well information for the glacial aquifer system in the northern USA. Water-quality data and well information were derived from a dataset compiled from three sources: the U.S. Geological Survey (USGS) National Water Information System (NWIS; USGS, 1998, 2002), the U.S. Environmental Protection Agency (USEPA) Safe Drinking Water Information System (SDWIS; USEPA, 2013), and numerous agencies and organizations at the state, regional, and local level. The data compilation of the National Water Quality Program’s groundwater assessment team is an internal dataset informally referred to as the National Groundwater Aggregation (NGA). The current study of groundwater quality in the glaciated U.S. (Erickson and others, 2019) considers only parameters with benchmarks from wells in the national groundwater aggregation; data from springs were not used. Data were screened for sample dates of 2005 or later, and the most recent sample at each site was used. This data release includes a table of benchmarks and thresholds. “Benchmark” is a generic term for any standard, regulation, guideline, or criterion against which constituent concentrations are compared. The threshold is the value against which measured concentrations of constituents in water samples can be compared to help assess the potential effects of contaminants on water quality. The table of water-quality results includes the concentration of constituents relative to their health-based or non-health benchmark, and a flag to indicate if the concentration is low, medium, or high relative to the benchmark. A table of site information includes attributes for each well such as the source of the water-quality data and well information, the state, water use code, depth (if available), and one of the 17 hydrogeologic terranes from Yager and others (2018).
Each hydrogeologic terrane contains Quaternary sediment that is derived from a common depositional history and can be characterized by similar texture and thickness. Each of the 17 hydrogeologic terranes was divided into 30 equal-area cells based on the method of Scott (1990). The cell number for each well is included in the table of site information. An equal-area assessment was used to show the proportion of the aquifer affected by high, medium, and low concentrations of selected constituents at the aquifer scale and terrane scale (Belitz and others, 2010). The equal-area cells were also used with population data (Erickson and others, 2019, supplemental information) to determine aquifer- and terrane-scale proportions of the population affected by high, medium, and low concentrations of selected constituents. A shape file of the hydrogeologic terranes and equal-area cells is included in this data release. A table of well construction information includes attributes for each well such as the source of the well information, the state, well depth, screen length (if available), and the hydrogeologic terrane from Yager and others (2018). Information in this table is from a well construction database compiled from several sources to obtain information on well depths and screened intervals of domestic and public supply wells producing groundwater from Quaternary sediments in the U.S. within the glacial extent. Domestic-supply well data were compiled from a lithologic database (Bayless and others, 2017) as modified by Yager and others (2018), the USGS NWIS (USGS, 2016), and several state well log databases (Erickson and others, 2019, supplemental information). The state databases were accessed to add well records in areas where information from the lithologic and NWIS databases was sparse. Public-supply well data were compiled from the list of public water-supply wells in the water-use database of Yager and others (2018).
This data release contains four tables and one shape file:
Drinking_Water_QW_Glacial_Aquifer_System_Results.txt
Drinking_Water_QW_Glacial_Aquifer_System_Sites.txt
Well_Construction.txt
Benchmarks.txt
TerraneEqualAreas shape file
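The low/medium/high flag and the equal-area proportion described above can be sketched as follows. In this illustration the proportion is the average, over cells that contain wells, of the within-cell fraction of wells in a category. The cut-off separating "low" from "medium" (half of the benchmark) is a placeholder assumption, the benchmark and concentrations are invented, and the actual thresholds are those in the Benchmarks.txt table.

```python
from collections import defaultdict

def classify(concentration, benchmark, medium_fraction=0.5):
    """Flag a concentration relative to its benchmark.

    medium_fraction is a hypothetical cut-off, not taken from the data release.
    """
    ratio = concentration / benchmark
    if ratio > 1.0:
        return "high"
    if ratio > medium_fraction:
        return "medium"
    return "low"

def equal_area_proportion(samples, category):
    """Average over cells of the fraction of wells in `category`.

    samples: iterable of (cell_id, flag) pairs, one per well.
    Cells without any wells do not contribute.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [wells in category, wells]
    for cell, flag in samples:
        counts[cell][1] += 1
        if flag == category:
            counts[cell][0] += 1
    if not counts:
        raise ValueError("no samples")
    return sum(hit / total for hit, total in counts.values()) / len(counts)

# Hypothetical wells: (cell number, concentration), benchmark of 10 units.
wells = [(1, 12.0), (1, 3.0), (2, 6.0), (3, 15.0), (3, 11.0), (3, 2.0)]
flags = [(cell, classify(conc, 10.0)) for cell, conc in wells]
high_proportion = equal_area_proportion(flags, "high")
print(f"high: {high_proportion:.3f}")
```

Weighting every cell equally keeps densely sampled areas from dominating the aquifer-scale estimate.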
There are 3 data sets. Two zip files contain paired biological data (benthic macroinvertebrate genera; Data Biological.zip) and water-quality data (Data Environmental.zip). These were used to estimate background specific conductivity from these state data and to estimate the HC05 using the field-based extirpation concentration method (USEPA 2011). The third zip file (Griffith ion MG20150729) contains two CSV files with ion summaries and ion and specific conductivity data from the combined EPA survey data. This dataset is associated with the following publication: Cormier, S., L. Zheng, R. Novak, and C. Flaherty. A flow-chart for developing water quality criteria from two field-based methods. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 633: 1647-1656, (2018).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The OpenBioLink2020 Dataset is a highly challenging biomedical benchmark dataset containing over 5 million positive and negative edges. The test set does not contain trivially predictable inverse edges from the training set and does contain all edge types, to provide a more realistic edge prediction scenario. For further information, please check out the GitHub repository.
OpenBioLink2020: directed, high quality is the default dataset that should be used for benchmarking purposes. To allow analyzing the effect of data quality as well as the directionality of the evaluation graph, four variants of OpenBioLink2020 are provided: in directed and undirected settings, with and without a quality cutoff.
Additionally, each graph is available in RDF N3 format (without train-validation-test splits) and BEL.
OpenBioLink is a resource and evaluation framework for evaluating link prediction models on heterogeneous biomedical graph data. It contains benchmark datasets as well as tools for creating custom benchmarks and training and evaluating models.
The OpenBioLink benchmark aims to meet the following criteria:
Please note that the OpenBioLink benchmark files contain data derived from external resources. Licensing terms of these external resources are detailed here.
The tabular data tables in this product include detection frequency and benchmark exceedances of agricultural pesticides in surface water samples collected in rivers and streams. The data are a product of a national-scale assessment of the toxicity of 221 pesticides at 74 sites during water years 2013 - 2017 conducted by the U.S. Geological Survey National Water Quality Pesticide Monitoring Program. Data were collected to 1) determine where and how often pesticides are detected in surface waters, and 2) identify regions in the continental United States that had streams with a higher risk of potential pesticide toxicity by comparing concentrations to both aquatic-life and human-health benchmarks.
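The two summary quantities named above, detection frequency and benchmark exceedances, can be sketched for a single pesticide at a single site. The concentrations and the benchmark below are invented, non-detects are represented as None, and the function name is ours.

```python
def summarize_pesticide(concentrations, benchmark):
    """Summarize one pesticide at one site.

    concentrations: one entry per sample; None marks a non-detect.
    benchmark: aquatic-life or human-health benchmark in the same units.
    """
    if not concentrations:
        raise ValueError("no samples")
    detections = [c for c in concentrations if c is not None]
    exceedances = [c for c in detections if c > benchmark]
    return {
        "samples": len(concentrations),
        "detection_frequency": len(detections) / len(concentrations),
        "exceedances": len(exceedances),
    }

# Hypothetical sample results for one pesticide, benchmark of 1.0 units.
summary = summarize_pesticide([None, 0.2, 1.5, None, 0.8], benchmark=1.0)
print(summary)
```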
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Accurate data sets including noncovalent interactions have become essential for benchmarking computational methods. However, while there is much focus on obtaining an accurate description of relative energies, reliable prediction of accurate equilibrium geometries is also important. To facilitate the benchmarking of computed geometries, the current work includes an accurate data set of semiexperimental equilibrium geometries of noncovalent complexes that can be directly compared to ab initio data. The structures are based on high-accuracy spectroscopic data, combined with vibrational corrections at the double-hybrid density functional level. The current work is designed to complement available data sets of semiexperimental geometries of small rigid molecules and ab initio geometries of complexes. The benchmark-quality data comprises 16 complexes and includes dispersion interactions, hydrogen bonding, CH/π···π interactions, and trimers. In addition to the reference data, accurate counterpoise-corrected geometries have been obtained up to the CCSD level, along with interaction energies. A short overview of the performance of computational methods, including dispersion-corrected B3LYP and B2PLYP functionals, is also included.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset for mapping muddy waters based on Sentinel-2 (L2A products) satellite imagery. The image data are saved as GeoTIFF files, and metadata files are provided in JSON format. There are 19 images in total, based on 16 distinct European Areas of Interest (AOIs) and covering a total of 9 countries: Greece, Italy, France, Spain, Belgium, UK, Sweden, Finland, and Serbia. Ten spectral bands were extracted from the Sentinel-2 L2A products and resampled to a 10 m spatial resolution. All spectral bands used are listed in the Metadata/Source files. The annotated images comprise 3 classes: "Non-muddy", "Muddy", and "Ambiguous". More details about the annotation methodology can be found in the accepted abstract (file: Accepted_Abstract_03_15_2024.pdf) or in the published paper (DOI: 10.1109/IGARSS53475.2024.10642051).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is part of the publication "Benchmarking Elasticity of FaaS Platforms as a Foundation for Objective-driven Design of Serverless Applications". It contains all plots and data used to assess FaaS platform quality under volatile workloads from a client-side perspective. The paper is part of SAC'20, Brno, Czech Republic.
This dataset comprises over 4118 French trivia question-answer pairs, each accompanied by relevant Wikipedia context. We have manually verified a subset of the dataset for accuracy and data quality.
Usage
import datasets
queries = datasets.load_dataset("embedding-benchmark/DS1000", "queries")
documents = datasets.load_dataset("embedding-benchmark/DS1000", "corpus")
pair_labels = datasets.load_dataset("embedding-benchmark/DS1000", "default")
Since the late 1950s, the USGS has maintained a long-term glacier mass-balance program at three North American glaciers. Measurements began on South Cascade Glacier, WA in 1958, expanding to Gulkana and Wolverine glaciers, AK in 1966, and later Sperry Glacier, MT in 2005. Additional measurements have been made on Lemon Creek Glacier, AK to complement data collected by the Juneau Icefield Research Program (JIRP; Pelto and others, 2013). Direct field measurements are combined with weather data and imagery analyses to estimate the seasonal and annual mass balance at each glacier in both a conventional and a reference-surface format (Cogley and others, 2011). High-altitude meteorological data have been collected adjacent to the glaciers since the beginning of the USGS Benchmark Glacier Program in order to support related science. This portion of the data release includes select weather data that have received basic quality control and assurance. Data are released at three levels of processing: level 0, 1, and 2. Level 0 data contain compiled raw data, before QC procedures are applied, at the original timestep recorded by the instrument. Level 1 data have received a plausible-value check and minimal manual error identification (e.g. errors noted on field visits). Level 2 data have been through more extensive quality control procedures and are provided both at the original instrument timestep and as aggregated hourly and daily values. However, beyond the procedures detailed in this document, no additional steps have been taken to manually assure the quality of the data. Data outside the main record of temperature and precipitation at each site should be considered preliminary and used with increased scrutiny.
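As a rough illustration of the processing levels described above, the sketch below applies a plausible-value check and then aggregates to hourly means. It is an assumption-laden example, not the USGS procedure: the function names, the temperature limits, the 15-minute timestep, and the choice of a simple mean over non-missing values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def plausible_value_check(records, low, high):
    """Replace values outside [low, high] with None (treated as missing)."""
    return [
        (t, v if v is not None and low <= v <= high else None)
        for t, v in records
    ]


def aggregate_hourly(records):
    """Average the non-missing values that fall within each clock hour."""
    buckets = defaultdict(list)
    for t, v in records:
        if v is not None:
            hour = t.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(v)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical air temperatures in degrees C (not from the data release).
raw = [
    (datetime(2020, 1, 1, 0, 0), -5.0),
    (datetime(2020, 1, 1, 0, 15), -4.0),
    (datetime(2020, 1, 1, 0, 30), 99.0),  # implausible spike
    (datetime(2020, 1, 1, 0, 45), -3.0),
    (datetime(2020, 1, 1, 1, 0), -2.0),
    (datetime(2020, 1, 1, 1, 30), -4.0),
]
checked = plausible_value_check(raw, low=-60.0, high=40.0)  # assumed limits
hourly = aggregate_hourly(checked)
```

Daily values could be produced the same way by truncating each timestamp to midnight instead of to the hour.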
https://www.ontario.ca/page/open-government-licence-ontario
Standards, guidelines and screening levels for assessing point of impingement concentrations of air contaminants
Get information on the ministry’s standards, guideline values and screening levels that are used to assess point of impingement concentrations of contaminants released into the air. These are used primarily to prepare an Emission Summary and Dispersion Modelling (ESDM) report.
This dataset is draft and in development.
We propose to measure the same single-crystal sample of three compounds that emphasize different aspects of structure analysis of neutron SCD data: 1. Natrolite (Na2Al2Si3O10 × 2H2O, space group Fdd2, unit cell a = 18.2859 Å, b = 18.6117 Å, c = 6.5870 Å, α = 90°, β = 90°, γ = 90°), a member of the zeolite family, is an aluminosilicate with a porous 3D framework that allows framework distortion and results in negative thermal expansion. Due to this property it is industrially exploitable. The framework is rigid enough to be stable over a wide range of temperatures and pressures. The cavities are filled with one sodium and two waters. 2. Lithium ammonium tartrate (LiNH4C4H4O6 × H2O, space group P21212, a = 7.884 Å, b = 14.565 Å, c = 6.409 Å, α = 90°, β = 90°, γ = 90°) is a non-centrosymmetric molecular material. 3. Olivine, Li(Fe/Mn)PO4 synthesized to a specific composition, is a material of interest for energy-materials research and battery materials. Each of the chosen crystals will be measured on D19 and D9 to the highest reasonably attainable accuracy (residuals of 1 to 3% expected) and d-spacing resolution.