100+ datasets found
  1. Dataset for Testing Contamination Source Identification Methods for Water...

    • catalog.data.gov
    • gimi9.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Dataset for Testing Contamination Source Identification Methods for Water Distribution Networks [Dataset]. https://catalog.data.gov/dataset/dataset-for-testing-contamination-source-identification-methods-for-water-distribution-net
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This dataset includes the results of a simulation study using the source inversion techniques available in the Water Security Toolkit. The data were created to test the different techniques for accuracy, specificity, false positive rate, and false negative rate. The tests examined different parameters, including measurement error, modeling error, injection characteristics, time horizon, network size, and sensor placement. The water distribution system network models used in the study are also included in the dataset. This dataset is associated with the following publication: Seth, A., K. Klise, J. Siirola, T. Haxton, and C. Laird. Testing Contamination Source Identification Methods for Water Distribution Networks. Journal of Environmental Division, Proceedings of the American Society of Civil Engineers. ASCE, Reston, VA, USA (2016).
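    The four evaluation metrics named above reduce to simple confusion-matrix arithmetic. A minimal sketch (the counts are hypothetical, not taken from the dataset):

    ```python
    # Illustrative only: the four metrics named above, computed from
    # hypothetical confusion-matrix counts of source-identification calls.
    def summarize(tp: int, fp: int, tn: int, fn: int) -> dict:
        return {
            "accuracy": (tp + tn) / (tp + fp + tn + fn),
            "specificity": tn / (tn + fp),           # true-negative rate
            "false_positive_rate": fp / (fp + tn),   # 1 - specificity
            "false_negative_rate": fn / (fn + tp),   # 1 - sensitivity
        }

    print(summarize(tp=40, fp=5, tn=50, fn=5))
    ```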

  2. Public Data(source Set, Omit Val And Test Set) Dataset

    • universe.roboflow.com
    zip
    Updated Sep 25, 2025
    Cite
    TL Target Set Focus (2025). Public Data(source Set, Omit Val And Test Set) Dataset [Dataset]. https://universe.roboflow.com/tl-target-set-focus/no-close-up-public-data-source-set-omit-val-and-test-set-ozrdg/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 25, 2025
    Dataset authored and provided by
    TL Target Set Focus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    No Closeup Sangguo1 Bounding Boxes
    Description

    Public Data(source Set, Omit Val And Test Set)

    ## Overview

    Public Data(source Set, Omit Val And Test Set) is a dataset for object detection tasks - it contains No Closeup Sangguo1 annotations for 12,135 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Supplemental Data and Source Code for Min-Max Test Research

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated May 9, 2023
    Cite
    National Institute of Standards and Technology (2023). Supplemental Data and Source Code for Min-Max Test Research [Dataset]. https://catalog.data.gov/dataset/supplemental-data-and-source-code-for-min-max-test-research
    Explore at:
    Dataset updated
    May 9, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The data, source code and scripts included in this dataset are used to generate the results presented in the manuscript "The min-max test: an objective method for discriminating mass spectra" by Moorthy and Sisco. The manuscript explores a new method for objectively discriminating electron ionization mass spectra, a task that is commonplace when compounds are closely eluting in gas chromatography mass spectrometry. The C++ source codes and R analysis scripts can be extended for other application areas.

  4. Data from: An empirical study of automatically-generated tests from the...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 26, 2020
    Cite
    Tassio Virginio (2020). An empirical study of automatically-generated tests from the perspective of test smells [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3953937
    Explore at:
    Dataset updated
    Jul 26, 2020
    Dataset provided by
    IFTO
    Authors
    Tassio Virginio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Developing software test code can be as expensive as, or more expensive than, developing software production code. Commonly, developers use automated unit test generators to speed up software testing; the purpose of such tools is to shorten production time without decreasing code quality. Nonetheless, unit tests usually do not have a quality-check layer above the testing code, which makes it hard to guarantee the quality of the generated tests. An emerging strategy for verifying test quality is to analyze the presence of test smells in software test code. Test smells are characteristics in the test code that possibly indicate weaknesses in test design and implementation, so their presence in unit test code can be used as an indicator of unit test quality. In this paper, we present an empirical study aimed at analyzing the quality of unit test code generated by automated test tools. We compared the tests generated by two tools (Randoop and EvoSuite) with the existing unit test suites of open-source software projects. We analyzed the unit test code of twenty-one open-source Java projects and detected the presence of nineteen types of test smells. The results indicated significant differences in unit test quality when comparing data from the automated unit test generators and the existing unit test suites.

  5. Search-Based Test Data Generation for SQL Queries: Appendix

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite
    Jeroen Castelein; Maurício Aniche; Mozhan Soltani; Annibale Panichella; Arie van Deursen (2020). Search-Based Test Data Generation for SQL Queries: Appendix [Dataset]. http://doi.org/10.5281/zenodo.1166023
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeroen Castelein; Maurício Aniche; Mozhan Soltani; Annibale Panichella; Arie van Deursen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The appendix of our ICSE 2018 paper "Search-Based Test Data Generation for SQL Queries".

    The appendix contains:

    • The queries from the three open source systems we used in the evaluation of our tool (the industry software system is not part of this appendix, due to privacy reasons).
    • The results of our evaluation.
    • The source code of the tool. The most recent version can be found at https://github.com/SERG-Delft/evosql.
    • The results of the tuning procedure we conducted before running the final evaluation.

  6. Part 2 of real-time testing data for: "Identifying data sources and physical...

    • zenodo.org
    application/gzip
    Updated Aug 8, 2024
    Cite
    Zenodo (2024). Part 2 of real-time testing data for: "Identifying data sources and physical strategies used by neural networks to predict TC rapid intensification" [Dataset]. http://doi.org/10.5281/zenodo.13272877
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each file in the dataset contains machine-learning-ready data for one unique tropical cyclone (TC) from the real-time testing dataset. "Machine-learning-ready" means that all data-processing methods described in the journal paper have already been applied. This includes cropping satellite images to make them TC-centered; rotating satellite images to align them with TC motion (TC motion is always towards the +x-direction, or in the direction of increasing column number); flipping satellite images in the southern hemisphere upside-down; and normalizing data via the two-step procedure.

    The file name gives you the unique identifier of the TC -- e.g., "learning_examples_2010AL01.nc.gz" contains data for storm 2010AL01, or the first North Atlantic storm of the 2010 season. Each file can be read with the method `example_io.read_file` in the ml4tc Python library (https://zenodo.org/doi/10.5281/zenodo.10268620). However, since `example_io.read_file` is a lightweight wrapper for `xarray.open_dataset`, you can equivalently just use `xarray.open_dataset`. Variables in the table are listed below (the same printout produced by `print(xarray_table)`):

    Dimensions: (
    satellite_valid_time_unix_sec: 289,
    satellite_grid_row: 380,
    satellite_grid_column: 540,
    satellite_predictor_name_gridded: 1,
    satellite_predictor_name_ungridded: 16,
    ships_valid_time_unix_sec: 19,
    ships_storm_object_index: 19,
    ships_forecast_hour: 23,
    ships_intensity_threshold_m_s01: 21,
    ships_lag_time_hours: 5,
    ships_predictor_name_lagged: 17,
    ships_predictor_name_forecast: 129)
    Coordinates:
    * satellite_grid_row (satellite_grid_row) int32 2kB ...
    * satellite_grid_column (satellite_grid_column) int32 2kB ...
    * satellite_valid_time_unix_sec (satellite_valid_time_unix_sec) int32 1kB ...
    * ships_lag_time_hours (ships_lag_time_hours) float64 40B ...
    * ships_intensity_threshold_m_s01 (ships_intensity_threshold_m_s01) float64 168B ...
    * ships_forecast_hour (ships_forecast_hour) int32 92B ...
    * satellite_predictor_name_gridded (satellite_predictor_name_gridded) object 8B ...
    * satellite_predictor_name_ungridded (satellite_predictor_name_ungridded) object 128B ...
    * ships_valid_time_unix_sec (ships_valid_time_unix_sec) int32 76B ...
    * ships_predictor_name_lagged (ships_predictor_name_lagged) object 136B ...
    * ships_predictor_name_forecast (ships_predictor_name_forecast) object 1kB ...
    Dimensions without coordinates: ships_storm_object_index
    Data variables:
    satellite_number (satellite_valid_time_unix_sec) int32 1kB ...
    satellite_band_number (satellite_valid_time_unix_sec) int32 1kB ...
    satellite_band_wavelength_micrometres (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_cyclone_id_string (satellite_valid_time_unix_sec) |S8 2kB ...
    satellite_storm_type_string (satellite_valid_time_unix_sec) |S2 578B ...
    satellite_storm_name (satellite_valid_time_unix_sec) |S10 3kB ...
    satellite_storm_latitude_deg_n (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_intensity_number (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_u_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_v_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_predictors_gridded (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column, satellite_predictor_name_gridded) float64 474MB ...
    satellite_grid_latitude_deg_n (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
    satellite_grid_longitude_deg_e (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
    satellite_predictors_ungridded (satellite_valid_time_unix_sec, satellite_predictor_name_ungridded) float64 37kB ...
    ships_storm_intensity_m_s01 (ships_valid_time_unix_sec) float64 152B ...
    ships_storm_type_enum (ships_storm_object_index, ships_forecast_hour) int32 2kB ...
    ships_forecast_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_forecast_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_v_wind_200mb_0to500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vorticity_850mb_0to1000km_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vortex_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vortex_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_850mb_0to600km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_max_tangential_wind_850mb_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_1000mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_850mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_500mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_300mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_srh_1000to700mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_srh_1000to500mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_threshold_exceedance_num_6hour_periods (ships_storm_object_index, ships_intensity_threshold_m_s01) int32 2kB ...
    ships_v_motion_observed_m_s01 (ships_storm_object_index) float64 152B ...
    ships_v_motion_1000to100mb_flow_m_s01 (ships_storm_object_index) float64 152B ...
    ships_v_motion_optimal_flow_m_s01 (ships_storm_object_index) float64 152B ...
    ships_cyclone_id_string (ships_storm_object_index) object 152B ...
    ships_storm_latitude_deg_n (ships_storm_object_index) float64 152B ...
    ships_storm_longitude_deg_e (ships_storm_object_index) float64 152B ...
    ships_predictors_lagged (ships_valid_time_unix_sec, ships_lag_time_hours, ships_predictor_name_lagged) float64 13kB ...
    ships_predictors_forecast (ships_valid_time_unix_sec, ships_forecast_hour, ships_predictor_name_forecast) float64 451kB ...

    Variable names are meant to be as self-explanatory as possible. Potentially confusing ones are listed below.

    • The dimension ships_storm_object_index is redundant with the dimension ships_valid_time_unix_sec and can be ignored.
    • ships_forecast_hour ranges up to values that we do not actually use in the paper. Keep in mind that our max forecast hour used in machine learning is 24.
    • The dimension ships_intensity_threshold_m_s01 (and any variable including this dimension) can be ignored.
    • ships_lag_time_hours corresponds to lag times for the SHIPS satellite-based predictors. The only lag time we use in machine learning is "NaN", which is a stand-in for the best available of all lag times. See the discussion of the "priority list" in the paper for more details.
    • Most of the data variables can be ignored, unless you're doing a deep dive into storm properties. The important variables are satellite_predictors_gridded (full satellite images), ships_predictors_lagged (satellite-based SHIPS predictors), and ships_predictors_forecast (environmental and storm-history-based SHIPS predictors). These variables are all discussed in the paper.
    • Every variable name (including elements of the coordinate lists ships_predictor_name_lagged and ships_predictor_name_forecast) includes units at the end. For example, "m_s01" = metres per second; "deg_n" = degrees north; "deg_e" = degrees east; "j_kg01" = Joules per kilogram; etc.
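    As a concrete illustration of the loading step described above, the sketch below uses plain `xarray.open_dataset` (the `example_io.read_file` wrapper is equivalent). It assumes one file has been downloaded locally and decompresses it first, since the files ship gzipped:

    ```python
    # Minimal sketch: read one TC file with plain xarray.
    # Assumes "learning_examples_2010AL01.nc.gz" (file name from the
    # description above) has been downloaded to the working directory.
    import gzip
    import shutil

    import xarray as xr

    gz_path = "learning_examples_2010AL01.nc.gz"
    nc_path = gz_path[:-3]  # strip ".gz"

    # Decompress to a plain NetCDF file before opening.
    with gzip.open(gz_path, "rb") as f_in, open(nc_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    ds = xr.open_dataset(nc_path)
    print(ds)  # reproduces the dimension/variable listing shown above

    # The three variables highlighted in the notes above:
    gridded = ds["satellite_predictors_gridded"]   # full satellite images
    lagged = ds["ships_predictors_lagged"]         # satellite-based SHIPS predictors
    forecast = ds["ships_predictors_forecast"]     # environmental/storm-history SHIPS predictors
    ```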
  7. COVID-19 Daily Testing - By Person - Historical

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Jan 12, 2024
    Cite
    data.cityofchicago.org (2024). COVID-19 Daily Testing - By Person - Historical [Dataset]. https://catalog.data.gov/dataset/covid-19-daily-testing-by-person
    Explore at:
    Dataset updated
    Jan 12, 2024
    Dataset provided by
    data.cityofchicago.org
    Description

    This dataset is historical only and ends at 5/7/2021. For more information, please see http://dev.cityofchicago.org/open%20data/data%20portal/2021/05/04/covid-19-testing-by-person.html. The recommended alternative dataset for similar data beyond that date is https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test/gkdw-2tgv.

    This is the source data for some of the metrics available at https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html. For all datasets related to COVID-19, see https://data.cityofchicago.org/browse?limitTo=datasets&sortBy=alpha&tags=covid-19.

    This dataset contains counts of people tested for COVID-19 and their results. It differs from https://data.cityofchicago.org/d/gkdw-2tgv in that each person appears in this dataset only once, even if tested multiple times. In the other dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear there more than once on the same day unless he/she had both a positive and a not-positive test.

    Only Chicago residents are included, based on the home address as provided by the medical provider. Molecular (PCR) and antigen tests are included, and only one test is counted for each individual. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.

    Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. The Chicago Department of Public Health (CDPH) does not receive all not-positive results. Demographic data are more complete for those who test positive; care should be taken when calculating percentage positivity among demographic groups.

    All data are provisional and subject to change. Information is updated as additional details are received.

    Data Source: Illinois National Electronic Disease Surveillance System

  8. A/B Testing Data

    • kaggle.com
    Updated Jun 4, 2025
    Cite
    Sanchi (2025). A/B Testing Data [Dataset]. https://www.kaggle.com/datasets/sanxhi/ab-testing-data-simulated-web-user-engagement
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    Kaggle
    Authors
    Sanchi
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Simulated A/B Testing Data for Web User Engagement

    This dataset contains synthetically generated A/B testing data that mimics user behavior on a website with two versions: Control (con) and Experimental (exp). The dataset is designed for practicing data cleaning, statistical testing (e.g., Z-test, T-test), and pipeline development.

    Each row represents an individual user session, with attributes capturing click behavior, session duration, access device, referral source, and timestamp.

    Features:

    click — Binary (1 if clicked, 0 if not)

    group — A/B group assignment (con or exp, with injected label inconsistencies)

    session_time — Time spent in the session (in minutes), including outliers

    click_time — Timestamp of user interaction (nullable)

    device_type — Device used (mobile or desktop, mixed casing)

    referral_source — Where the user came from (e.g., social, email, with some typos/whitespace)

    Use Cases:

    A/B testing analysis (CTR, CVR)

    Hypothesis testing (Z-test, T-test)

    ETL pipeline design

    Data cleaning and standardization practice

    Dashboard creation and segmentation analysis

    Notes:

    The dataset includes intentional inconsistencies (nulls, duplicates, casing issues, typos) to reflect real-world challenges.

    Fully synthetic — safe for public use.
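    Since the dataset is aimed at practicing exactly this kind of analysis, here is a minimal sketch of a two-proportion Z-test on click-through rate. The `group` and `click` columns come from the feature list above; the CSV file name is an assumption:

    ```python
    # Minimal sketch: two-proportion Z-test on click-through rate.
    # "ab_testing.csv" is a hypothetical local export of this dataset.
    import pandas as pd
    from statsmodels.stats.proportion import proportions_ztest

    df = pd.read_csv("ab_testing.csv")

    # Normalize the injected label inconsistencies mentioned in the notes.
    df["group"] = df["group"].str.strip().str.lower()
    df = df[df["group"].isin(["con", "exp"])]

    clicks = df.groupby("group")["click"].sum()
    sessions = df.groupby("group")["click"].count()

    stat, p_value = proportions_ztest(
        count=[clicks["exp"], clicks["con"]],
        nobs=[sessions["exp"], sessions["con"]],
    )
    print(f"z = {stat:.3f}, p = {p_value:.4f}")
    ```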

  9. Insider Threat Test Dataset

    • impactcybertrust.org
    Updated Sep 18, 2019
    + more versions
    Cite
    External Data Source (2019). Insider Threat Test Dataset [Dataset]. http://doi.org/10.23721/100/1504339
    Explore at:
    Dataset updated
    Sep 18, 2019
    Authors
    External Data Source
    Description

    The CERT Division, in partnership with ExactData, LLC, and under sponsorship from DARPA I2O, generated a collection of synthetic insider threat test datasets. These datasets provide both synthetic background data and data from synthetic malicious actors. Datasets are organized according to the data generator release that created them. Most releases include multiple datasets (e.g., r3.1 and r3.2). Generally, later releases include a superset of the data generation functionality of earlier releases. Each dataset file contains a readme file that provides detailed notes about the features of that release. The answer key file answers.tar.bz2 contains the details of the malicious activity included in each dataset, including descriptions of the scenarios enacted and the identifiers of the synthetic users involved.

  10. COVID-19 testing

    • kaggle.com
    zip
    Updated Mar 21, 2021
    Cite
    Habib Gültekin (2021). COVID-19 testing [Dataset]. https://www.kaggle.com/hgultekin/covid19-testing-rate-and-test-positivity
    Explore at:
    Available download formats: zip (159369 bytes)
    Dataset updated
    Mar 21, 2021
    Authors
    Habib Gültekin
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description


    These data files contain information about COVID-19 testing rate and test positivity, by country and by region. They are updated weekly.

    The figures are based on multiple data sources. The main source is data submitted by Member States to the European Surveillance System (TESSy). When not available, ECDC compiles data from public online sources. EU/EEA Member States report in TESSy all tests performed (i.e. both PCR and antigen tests).

    Disclaimer: The data compiled from public online sources have been automatically or manually retrieved (‘web-scraped’) on a daily basis. It should be noted that there are limitations to this type of data, including varying definitions, and that the data collection process requires constant adaptation to avoid interrupted time series (e.g. due to modification of website pages or types of data).

    Publisher

    European Centre for Disease Prevention and Control


  11. Data sources

    • figshare.com
    docx
    Updated Jul 15, 2024
    Cite
    Yajie Zhang; zyj z (2024). Data sources [Dataset]. http://doi.org/10.6084/m9.figshare.26301130.v2
    Explore at:
    Available download formats: docx
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yajie Zhang; zyj z
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    N/A; just a test upload.

  12. Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated May 30, 2023
    Cite
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini (2023). Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project [Dataset]. http://doi.org/10.1371/journal.pone.0160648
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three record linkage of administrative and/or registry data sources (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful to identify T2DM cases were generated through a top-down/bottom-up iterative approach. Each component was based on a single data domain among diagnoses, drugs, diagnostic test utilization and laboratory results. Diagnoses-based components were subclassified by healthcare setting (primary, secondary, inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted and the proportion of the population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, and AND NOT were tested, and local experts recommended their preferred data source-tailored combination. The population identified per data source by the resulting algorithms varied from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93–100%), while drug-based components were the main contributors in RLDs (81–100%). The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided bases for interpretation of possible inter-data source inconsistency of findings in future studies.
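    If each component yields the set of subject identifiers it flags, the AND/OR/AND NOT combinations described above are plain set algebra. A sketch with hypothetical component names and IDs (not EMIF data):

    ```python
    # Illustrative set algebra for combining case-finding components.
    # Component names and subject IDs are hypothetical.
    diagnosis_component = {101, 102, 103, 200}  # flagged by diagnosis records
    drug_component = {102, 103, 201}            # flagged by drug records
    exclusion_component = {103}                 # e.g. a competing diagnosis

    cases_and = diagnosis_component & drug_component  # AND
    cases_or = diagnosis_component | drug_component   # OR
    cases_and_not = cases_and - exclusion_component   # AND NOT

    print(sorted(cases_and), sorted(cases_or), sorted(cases_and_not))
    ```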

  13. Data from: Source code for the executable semantics presented in Master...

    • phys-techsciences.datastations.nl
    text/markdown, zip
    Updated Feb 16, 2021
    Cite
    W.A.M. van Oort; G.J. Tretmans (2021). Source code for the executable semantics presented in Master Thesis "Introducing Keyword-Driven Testing to System Level Testing Paradigms" [Dataset]. http://doi.org/10.17026/DANS-ZR6-U5Q2
    Explore at:
    Available download formats: zip (55941), zip (16373), text/markdown (2432)
    Dataset updated
    Feb 16, 2021
    Dataset provided by
    DANS Data Station Physical and Technical Sciences
    Authors
    W.A.M. van Oort; G.J. Tretmans
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DESCRIPTION
    The code presented in this repository is a formal representation of the semantics of a system level test run in a keyword-driven testing framework, Robot Framework. The semantics are expressed in Haskell and can be executed using GHC(i).

    CONTENTS
    - RobotSemantics: the semantic definitions for a test run in Robot Framework
    - TypedArgumentsExtension: the semantics of an extension written for Robot Framework
    - Examples: three examples demonstrating the relation between the semantic definitions and actual Robot Framework test cases and keyword libraries

    SHORT SUMMARY
    System level acceptance tests should cover a large number of traces in a system under test, and many testing paradigms exist for this purpose. However, system level acceptance tests must also be understood by many stakeholders, which is not always taken into consideration when a system level testing paradigm is designed. Combining two or more paradigms might yield a system level testing approach that gets the best of both worlds. To see whether this is the case for keyword-driven testing, we consider a triad of system level testing paradigms: Behavior-Driven Testing, Model-Based Testing and Test Data Generation. In the thesis we introduce a formal semantic definition of a keyword-driven testing framework, Robot Framework, to be able to reason about what the considered paradigms would entail for a case study at Canon Production Printing. For each of the three considered paradigms, a conclusion is drawn as to whether the paradigm could benefit from a combination with keyword-driven testing and what the relation between the paradigm and keyword-driven testing would be in such a combination.

    The executable semantics presented in this repository ought to provide an unambiguous starting point for reasoning about the keyword-driven testing paradigm. Moreover, the executable semantics are used to express semantic implications of additional concepts and extensions for keyword-driven testing with Robot Framework.

  14. RE-AIM domains by measure and data source.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 23, 2025
    Cite
    Langa, José C.; Sidat, Mohsin; Sacarlal, Jahit; Moon, Troy D. (2025). RE-AIM domains by measure and data source. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002043500
    Explore at:
    Dataset updated
    May 23, 2025
    Authors
    Langa, José C.; Sidat, Mohsin; Sacarlal, Jahit; Moon, Troy D.
    Description

    Laboratory diagnosis of cryptococcal disease among HIV-infected patients remains a challenge in most low- and middle-income countries (LMIC). Difficulty with sustained access to cryptococcal rapid tests is cited as a major barrier to routine screening for cryptococcus in many LMIC. Thus, clinicians in these countries often resort to empirical treatment based solely on clinical suspicion of cryptococcosis. To address this challenge, we aim to evaluate the re-introduction of India ink testing for diagnosis of cryptococcosis among HIV-infected patients in southern Mozambique. India ink testing was historically a common first-choice, low-cost laboratory diagnostic tool for cryptococcal infection. This study uses implementation science methods framed by the Dynamic Adaptation Process (DAP) and the Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) conceptual frameworks to develop a multi-phase, stepped-wedge trial using mixed-methods approaches. The study will be conducted in six hospitals in southern Mozambique over a period of 15 months and will include the following phases: pre-implementation (baseline assessment), adaptation-implementation (gradual introduction of the intervention), and post-implementation (post-intervention assessment). This study aims to promote the use of India ink staining as a cheap and readily available tool for cryptococcosis diagnosis in southern Mozambique. Lessons learned in this study may be important to inform approaches to overcoming the existing challenges in diagnosis of cryptococcosis in many LMICs due to the unavailability of ready diagnostic tools. Trial registration: ISRCTN11882960, registered 06 August 2024.

  15. Replication Data for "Mapping the Structure and Evolution of Software...

    • zenodo.org
    bin, zip
    Updated Sep 20, 2022
    Cite
    Alireza Salahirad; Gregory Gay; Ehsan Mohammadi (2022). Replication Data for "Mapping the Structure and Evolution of Software Testing Research Over the Past Three Decades" [Dataset]. http://doi.org/10.5281/zenodo.7091926
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alireza Salahirad; Gregory Gay; Ehsan Mohammadi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this research (publication included in the package), we have used author-assigned keywords as a quantitative data source for understanding the connections between keywords and research topics in software testing research, based on a large sample of studies from Scopus.

    We apply co-word analysis to map the topology of testing research as a network where author-assigned keywords are connected by edges indicating co-occurrence in publications. Keywords are clustered based on edge density and frequency of connection. We examine the most popular keywords, summarize clusters into high-level research topics, examine how topics connect, and examine how the field is changing. This package contains the map and network files used to perform our analyses, as well as the publication sample.
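    For readers who want to reproduce the co-word idea on their own sample, a minimal sketch of the network construction (the keyword lists are hypothetical; the package files store the authors' actual map and network):

    ```python
    # Minimal co-word sketch: author keywords become nodes, and co-occurrence
    # in the same publication increments the connecting edge's weight.
    from itertools import combinations

    import networkx as nx

    publications = [  # hypothetical author-assigned keyword lists
        ["mutation testing", "test oracle", "java"],
        ["mutation testing", "search-based testing"],
        ["search-based testing", "test oracle"],
    ]

    G = nx.Graph()
    for keywords in publications:
        for a, b in combinations(sorted(set(keywords)), 2):
            weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
            G.add_edge(a, b, weight=weight + 1)

    # Clustering by edge density/frequency would follow, e.g. community detection.
    print(list(G.edges(data=True)))
    ```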

  16. COVID-19 Daily Testing - By Test - Historical

    • data.cityofchicago.org
    • healthdata.gov
    • +1more
    csv, xlsx, xml
    Updated May 22, 2024
    + more versions
    Cite
    City of Chicago (2024). COVID-19 Daily Testing - By Test - Historical [Dataset]. https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test-Historical/gkdw-2tgv
    Explore at:
    Available download formats: xml, xlsx, csv
    Dataset updated
    May 22, 2024
    Dataset authored and provided by
    City of Chicago
    Description

    NOTE: This dataset has been retired and marked as historical-only.

    This dataset contains counts of unique tests and results for COVID-19. This dataset differs from https://data.cityofchicago.org/d/t4hh-4ku9 in that each person is in that dataset only once, even if tested multiple times. In this dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear in this dataset more than once on the same day unless he/she had both a positive and not-positive test.

    The positivity rate displayed in this dataset uses the method most commonly used by other jurisdictions in the United States.

    Only Chicago residents are included based on the home address as provided by the medical provider.

    Molecular (PCR) and antigen tests received through electronic lab reporting are included. Individuals may be tested multiple times. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.

    Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. Chicago Department of Public Health (CDPH) does not receive all not-positive results.

    All data are provisional and subject to change. Information is updated as additional details are received.

    Data Source: Illinois Department of Public Health Electronic Lab Reports

  17. Data from Laboratory Tests of a Prototype Carbon Dioxide Ground-Source Air...

    • data.nist.gov
    • datasets.ai
    • +1more
    Updated Oct 11, 2019
    + more versions
    Cite
    Harrison M. Skye (2019). Data from Laboratory Tests of a Prototype Carbon Dioxide Ground-Source Air Conditioner [Dataset]. http://doi.org/10.18434/M32142
    Explore at:
    Dataset updated
    Oct 11, 2019
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Harrison M. Skye
    License

    https://www.nist.gov/open/license

    Description

    These data are from laboratory tests of a prototype residential liquid-to-air ground-source air conditioner (GSAC) using CO2 as the refrigerant. The data collection and processing methods are described in detail in the following report:

    Report title: "Laboratory Tests of a Prototype Carbon Dioxide Ground-Source Air Conditioner", NIST Technical Note 2068
    Publication date: October 2019
    DOI: https://doi.org/10.6028/NIST.TN.2068
    Authors: Harrison Skye, Wei Wu

    The tests were performed in an environmental chamber and followed the ISO 13256-1 standard for rating GSHPs. The CO2 GSAC operated in either a subcritical or a transcritical cycle, depending on the entering liquid temperature (ELT). The test results included the coefficient of performance (COP), capacity, sensible heat ratio (SHR), and pressures. The system incorporated a liquid-line/suction-line heat exchanger (LLSL-HX), which was estimated to cause a COP penalty of (0 to 2) % for ELTs ranging (10 to 25) °C, and a benefit of (0 to 5) % for ELTs ranging (30 to 39) °C. With ELTs ranging (10 to 39) °C, the CO2 system cooling COP ranged (7.3 to 2.4). At the standard rating condition (ELT 25 °C), the CO2 GSAC cooling COP was 4.14, and at part-load conditions (ELT 20 °C) the system had a COP of 4.92.
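    For readers unfamiliar with the reported figures, COP and SHR are simple ratios. A sketch with hypothetical numbers (not measurements from this dataset):

    ```python
    # Hypothetical numbers, only to illustrate the reported metrics.
    cooling_capacity_kw = 10.0   # total cooling capacity delivered
    power_input_kw = 2.4         # electrical power drawn by the system
    sensible_capacity_kw = 7.5   # sensible (temperature-change) portion

    cop = cooling_capacity_kw / power_input_kw        # coefficient of performance
    shr = sensible_capacity_kw / cooling_capacity_kw  # sensible heat ratio

    print(f"COP = {cop:.2f}, SHR = {shr:.2f}")
    ```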

  18. EGS Collab Experiment 1: Tracer data tests

    • gdr.openei.org
    • data.openei.org
    • +2more
    website
    Updated Apr 16, 2019
    + more versions
    Cite
    Ghanashyam Neupane; Earl Mattson; Adam Hawkins; Mitchell Plummer; Yuran Zhang (2019). EGS Collab Experiment 1: Tracer data tests [Dataset]. http://doi.org/10.15121/1512084
    Explore at:
    Available download formats: website
    Dataset updated
    Apr 16, 2019
    Dataset provided by
    Geothermal Data Repository
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Geothermal Technologies Program (EE-4G)
    Idaho National Laboratory
    Authors
    Ghanashyam Neupane; Earl Mattson; Adam Hawkins; Mitchell Plummer; Yuran Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the first set of tracer data for the EGS Collab testbed. The first set of tracer tests was conducted during October-November 2018. We have included tracer data for C-dots, chloride, fluorescein, and rhodamine-B. Details about the tracer tests can be found in Background and Methods of Tracer Tests (Mattson et al., 2019), also included in this package.

    Reference: Mattson, E.D., Neupane, G., Plummer, M.A., Hawkins, A., Zhang, Y., and the EGS Collab Team (2019). Preliminary Collab fracture characterization results from flow and tracer testing efforts. In Proceedings, 44th Workshop on Geothermal Reservoir Engineering, Stanford University, Stanford, California.

  19. NYS State Test Dashboard and Source Data

    • kaggle.com
    zip
    Updated Dec 21, 2022
    Cite
    Khurshed Fazal (2022). NYS State Test Dashboard and Source Data [Dataset]. https://www.kaggle.com/datasets/khurshedfazal/2019nysdashboard
    Explore at:
    Available download formats: zip (81889768 bytes)
    Dataset updated
    Dec 21, 2022
    Authors
    Khurshed Fazal
    Area covered
    New York
    Description

    To download and view the dashboard and data source, please click the download button in the upper right corner.

    A dataset with over half a million rows in the Excel file was cleaned and visualized into a fully functional Excel dashboard with slicers. Comparisons can be made between charter schools and their NY State counterparts. Initial findings based on the visualizations show a common trend for females to outperform males in both ELA and Math across public and charter schools. Charter school performance beats statewide performance in all student subgroups except homeless students: the statewide versus charter school percentage breakdowns show that homeless students in charter schools are underperforming against their statewide counterparts. This raises the question of how charter schools should give these students more attention in terms of state test preparation.

  20. Data from: RTPTorrent: An Open-source Dataset for Evaluating Regression Test...

    • zenodo.org
    zip
    Updated Sep 23, 2020
    + more versions
    Cite
    Toni Mattis; Patrick Rein; Falco Dürsch; Robert Hirschfeld (2020). RTPTorrent: An Open-source Dataset for Evaluating Regression Test Prioritization [Dataset]. http://doi.org/10.5281/zenodo.3610999
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 23, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Toni Mattis; Patrick Rein; Falco Dürsch; Robert Hirschfeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is designed to be used in evaluation studies of regression test prioritization techniques. It includes 20 open-source Java projects from GitHub and over 100,000 real-world build logs from TravisCI. The projects span a wide range of size, number of contributors, and maturity among the open-source Java projects available on GitHub.

    Further, the dataset includes the results of baseline approaches to ease the comparison of new techniques applied to the dataset.
