100+ datasets found
  1. Dataset for Testing Contamination Source Identification Methods for Water...

    • catalog.data.gov
    • gimi9.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Dataset for Testing Contamination Source Identification Methods for Water Distribution Networks [Dataset]. https://catalog.data.gov/dataset/dataset-for-testing-contamination-source-identification-methods-for-water-distribution-net
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This dataset includes the results of a simulation study using the source inversion techniques available in the Water Security Toolkit. The data were created to test the different techniques for accuracy, specificity, false positive rate, and false negative rate. The tests examined different parameters, including measurement error, modeling error, injection characteristics, time horizon, network size, and sensor placement. The water distribution system network models used in the study are also included in the dataset. This dataset is associated with the following publication: Seth, A., K. Klise, J. Siirola, T. Haxton, and C. Laird. Testing Contamination Source Identification Methods for Water Distribution Networks. Journal of Environmental Division, Proceedings of the American Society of Civil Engineers. ASCE, Reston, VA, USA (2016).
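    The four evaluation metrics named above reduce to simple confusion-matrix arithmetic. A minimal sketch (the counts are hypothetical, not taken from the dataset):

    ```python
    # Illustrative only: the four metrics named above, computed from
    # hypothetical confusion-matrix counts of source-identification calls.
    def summarize(tp: int, fp: int, tn: int, fn: int) -> dict:
        return {
            "accuracy": (tp + tn) / (tp + fp + tn + fn),
            "specificity": tn / (tn + fp),           # true-negative rate
            "false_positive_rate": fp / (fp + tn),   # 1 - specificity
            "false_negative_rate": fn / (fn + tp),   # 1 - sensitivity
        }

    print(summarize(tp=40, fp=5, tn=50, fn=5))
    ```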

  2. Public Data(source Set, Omit Val And Test Set) Dataset

    • universe.roboflow.com
    zip
    Updated Sep 25, 2025
    Cite
    TL Target Set Focus (2025). Public Data(source Set, Omit Val And Test Set) Dataset [Dataset]. https://universe.roboflow.com/tl-target-set-focus/no-close-up-public-data-source-set-omit-val-and-test-set-ozrdg/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 25, 2025
    Dataset authored and provided by
    TL Target Set Focus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    No Closeup Sangguo1 Bounding Boxes
    Description

    Public Data(source Set, Omit Val And Test Set)

    ## Overview

    Public Data(source Set, Omit Val And Test Set) is a dataset for object detection tasks - it contains No Closeup Sangguo1 annotations for 12,135 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Supplemental Data and Source Code for Min-Max Test Research

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated May 9, 2023
    Cite
    National Institute of Standards and Technology (2023). Supplemental Data and Source Code for Min-Max Test Research [Dataset]. https://catalog.data.gov/dataset/supplemental-data-and-source-code-for-min-max-test-research
    Explore at:
    Dataset updated
    May 9, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The data, source code and scripts included in this dataset are used to generate the results presented in the manuscript "The min-max test: an objective method for discriminating mass spectra" by Moorthy and Sisco. The manuscript explores a new method for objectively discriminating electron ionization mass spectra, a task that is commonplace when compounds are closely eluting in gas chromatography mass spectrometry. The C++ source codes and R analysis scripts can be extended for other application areas.

  4. Data from: An empirical study of automatically-generated tests from the...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 26, 2020
    Cite
    Tassio Virginio (2020). An empirical study of automatically-generated tests from the perspective of test smells [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3953937
    Explore at:
    Dataset updated
    Jul 26, 2020
    Dataset provided by
    IFTO
    Authors
    Tassio Virginio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Developing software test code can be as expensive as, or more expensive than, developing software production code. Commonly, developers use automated unit test generators to speed up software testing; the purpose of such tools is to shorten production time without decreasing code quality. Nonetheless, unit tests usually do not have a quality-check layer above the testing code, which makes it hard to guarantee the quality of the generated tests. An emerging strategy for verifying test quality is to analyze the presence of test smells in software test code. Test smells are characteristics in the test code that possibly indicate weaknesses in test design and implementation, so their presence in unit test code can be used as an indicator of unit test quality. In this paper, we present an empirical study aimed at analyzing the quality of unit test code generated by automated test tools. We compared the tests generated by two tools (Randoop and EvoSuite) with the existing unit test suites of open-source software projects. We analyzed the unit test code of twenty-one open-source Java projects and detected the presence of nineteen types of test smells. The results indicated significant differences in unit test quality when comparing data from the automated unit test generators and the existing unit test suites.

  5. Search-Based Test Data Generation for SQL Queries: Appendix

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite
    Jeroen Castelein; Maurício Aniche; Mozhan Soltani; Annibale Panichella; Arie van Deursen (2020). Search-Based Test Data Generation for SQL Queries: Appendix [Dataset]. http://doi.org/10.5281/zenodo.1166023
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeroen Castelein; Maurício Aniche; Mozhan Soltani; Annibale Panichella; Arie van Deursen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The appendix of our ICSE 2018 paper "Search-Based Test Data Generation for SQL Queries".

    The appendix contains:

    • The queries from the three open source systems we used in the evaluation of our tool (the industry software system is not part of this appendix, due to privacy reasons).
    • The results of our evaluation.
    • The source code of the tool. The most recent version can be found at https://github.com/SERG-Delft/evosql.
    • The results of the tuning procedure we conducted before running the final evaluation.

  6. Part 2 of real-time testing data for: "Identifying data sources and physical...

    • zenodo.org
    application/gzip
    Updated Aug 8, 2024
    Cite
    Zenodo (2024). Part 2 of real-time testing data for: "Identifying data sources and physical strategies used by neural networks to predict TC rapid intensification" [Dataset]. http://doi.org/10.5281/zenodo.13272877
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each file in the dataset contains machine-learning-ready data for one unique tropical cyclone (TC) from the real-time testing dataset. "Machine-learning-ready" means that all data-processing methods described in the journal paper have already been applied. This includes cropping satellite images to make them TC-centered; rotating satellite images to align them with TC motion (TC motion is always towards the +x-direction, or in the direction of increasing column number); flipping satellite images in the southern hemisphere upside-down; and normalizing data via the two-step procedure.

    The file name gives you the unique identifier of the TC -- e.g., "learning_examples_2010AL01.nc.gz" contains data for storm 2010AL01, or the first North Atlantic storm of the 2010 season. Each file can be read with the method `example_io.read_file` in the ml4tc Python library (https://zenodo.org/doi/10.5281/zenodo.10268620). However, since `example_io.read_file` is a lightweight wrapper for `xarray.open_dataset`, you can equivalently just use `xarray.open_dataset`. Variables in the table are listed below (the same printout produced by `print(xarray_table)`):

    Dimensions: (
    satellite_valid_time_unix_sec: 289,
    satellite_grid_row: 380,
    satellite_grid_column: 540,
    satellite_predictor_name_gridded: 1,
    satellite_predictor_name_ungridded: 16,
    ships_valid_time_unix_sec: 19,
    ships_storm_object_index: 19,
    ships_forecast_hour: 23,
    ships_intensity_threshold_m_s01: 21,
    ships_lag_time_hours: 5,
    ships_predictor_name_lagged: 17,
    ships_predictor_name_forecast: 129)
    Coordinates:
    * satellite_grid_row (satellite_grid_row) int32 2kB ...
    * satellite_grid_column (satellite_grid_column) int32 2kB ...
    * satellite_valid_time_unix_sec (satellite_valid_time_unix_sec) int32 1kB ...
    * ships_lag_time_hours (ships_lag_time_hours) float64 40B ...
    * ships_intensity_threshold_m_s01 (ships_intensity_threshold_m_s01) float64 168B ...
    * ships_forecast_hour (ships_forecast_hour) int32 92B ...
    * satellite_predictor_name_gridded (satellite_predictor_name_gridded) object 8B ...
    * satellite_predictor_name_ungridded (satellite_predictor_name_ungridded) object 128B ...
    * ships_valid_time_unix_sec (ships_valid_time_unix_sec) int32 76B ...
    * ships_predictor_name_lagged (ships_predictor_name_lagged) object 136B ...
    * ships_predictor_name_forecast (ships_predictor_name_forecast) object 1kB ...
    Dimensions without coordinates: ships_storm_object_index
    Data variables:
    satellite_number (satellite_valid_time_unix_sec) int32 1kB ...
    satellite_band_number (satellite_valid_time_unix_sec) int32 1kB ...
    satellite_band_wavelength_micrometres (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_cyclone_id_string (satellite_valid_time_unix_sec) |S8 2kB ...
    satellite_storm_type_string (satellite_valid_time_unix_sec) |S2 578B ...
    satellite_storm_name (satellite_valid_time_unix_sec) |S10 3kB ...
    satellite_storm_latitude_deg_n (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_intensity_number (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_u_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_v_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_predictors_gridded (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column, satellite_predictor_name_gridded) float64 474MB ...
    satellite_grid_latitude_deg_n (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
    satellite_grid_longitude_deg_e (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
    satellite_predictors_ungridded (satellite_valid_time_unix_sec, satellite_predictor_name_ungridded) float64 37kB ...
    ships_storm_intensity_m_s01 (ships_valid_time_unix_sec) float64 152B ...
    ships_storm_type_enum (ships_storm_object_index, ships_forecast_hour) int32 2kB ...
    ships_forecast_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_forecast_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_v_wind_200mb_0to500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vorticity_850mb_0to1000km_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vortex_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vortex_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_850mb_0to600km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_max_tangential_wind_850mb_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_1000mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_850mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_500mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_300mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_srh_1000to700mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_srh_1000to500mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_threshold_exceedance_num_6hour_periods (ships_storm_object_index, ships_intensity_threshold_m_s01) int32 2kB ...
    ships_v_motion_observed_m_s01 (ships_storm_object_index) float64 152B ...
    ships_v_motion_1000to100mb_flow_m_s01 (ships_storm_object_index) float64 152B ...
    ships_v_motion_optimal_flow_m_s01 (ships_storm_object_index) float64 152B ...
    ships_cyclone_id_string (ships_storm_object_index) object 152B ...
    ships_storm_latitude_deg_n (ships_storm_object_index) float64 152B ...
    ships_storm_longitude_deg_e (ships_storm_object_index) float64 152B ...
    ships_predictors_lagged (ships_valid_time_unix_sec, ships_lag_time_hours, ships_predictor_name_lagged) float64 13kB ...
    ships_predictors_forecast (ships_valid_time_unix_sec, ships_forecast_hour, ships_predictor_name_forecast) float64 451kB ...

    Variable names are meant to be as self-explanatory as possible. Potentially confusing ones are listed below.

    • The dimension ships_storm_object_index is redundant with the dimension ships_valid_time_unix_sec and can be ignored.
    • ships_forecast_hour ranges up to values that we do not actually use in the paper. Keep in mind that our max forecast hour used in machine learning is 24.
    • The dimension ships_intensity_threshold_m_s01 (and any variable including this dimension) can be ignored.
    • ships_lag_time_hours corresponds to lag times for the SHIPS satellite-based predictors. The only lag time we use in machine learning is "NaN", which is a stand-in for the best available of all lag times. See the discussion of the "priority list" in the paper for more details.
    • Most of the data variables can be ignored, unless you're doing a deep dive into storm properties. The important variables are satellite_predictors_gridded (full satellite images), ships_predictors_lagged (satellite-based SHIPS predictors), and ships_predictors_forecast (environmental and storm-history-based SHIPS predictors). These variables are all discussed in the paper.
    • Every variable name (including elements of the coordinate lists ships_predictor_name_lagged and ships_predictor_name_forecast) includes units at the end. For example, "m_s01" = metres per second; "deg_n" = degrees north; "deg_e" = degrees east; "j_kg01" = Joules per kilogram; etc.
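    As a concrete illustration of the loading step described above, the sketch below uses plain `xarray.open_dataset` (the `example_io.read_file` wrapper is equivalent). It assumes one file has been downloaded locally and decompresses it first, since the files ship gzipped:

    ```python
    # Minimal sketch: read one TC file with plain xarray.
    # Assumes "learning_examples_2010AL01.nc.gz" (file name from the
    # description above) has been downloaded to the working directory.
    import gzip
    import shutil

    import xarray as xr

    gz_path = "learning_examples_2010AL01.nc.gz"
    nc_path = gz_path[:-3]  # strip ".gz"

    # Decompress to a plain NetCDF file before opening.
    with gzip.open(gz_path, "rb") as f_in, open(nc_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    ds = xr.open_dataset(nc_path)
    print(ds)  # reproduces the dimension/variable listing shown above

    # The three variables highlighted in the notes above:
    gridded = ds["satellite_predictors_gridded"]   # full satellite images
    lagged = ds["ships_predictors_lagged"]         # satellite-based SHIPS predictors
    forecast = ds["ships_predictors_forecast"]     # environmental/storm-history SHIPS predictors
    ```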
  7. COVID-19 Daily Testing - By Person - Historical

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Jan 12, 2024
    Cite
    data.cityofchicago.org (2024). COVID-19 Daily Testing - By Person - Historical [Dataset]. https://catalog.data.gov/dataset/covid-19-daily-testing-by-person
    Explore at:
    Dataset updated
    Jan 12, 2024
    Dataset provided by
    data.cityofchicago.org
    Description

    This dataset is historical only and ends at 5/7/2021. For more information, please see http://dev.cityofchicago.org/open%20data/data%20portal/2021/05/04/covid-19-testing-by-person.html. The recommended alternative dataset for similar data beyond that date is https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test/gkdw-2tgv.

    This is the source data for some of the metrics available at https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html. For all datasets related to COVID-19, see https://data.cityofchicago.org/browse?limitTo=datasets&sortBy=alpha&tags=covid-19.

    This dataset contains counts of people tested for COVID-19 and their results. It differs from https://data.cityofchicago.org/d/gkdw-2tgv in that each person appears in this dataset only once, even if tested multiple times. In the other dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear there more than once on the same day unless he/she had both a positive and a not-positive test.

    Only Chicago residents are included, based on the home address as provided by the medical provider. Molecular (PCR) and antigen tests are included, and only one test is counted for each individual. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.

    Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. The Chicago Department of Public Health (CDPH) does not receive all not-positive results. Demographic data are more complete for those who test positive; care should be taken when calculating percentage positivity among demographic groups.

    All data are provisional and subject to change. Information is updated as additional details are received.

    Data Source: Illinois National Electronic Disease Surveillance System

  8. A/B Testing Data

    • kaggle.com
    Updated Jun 4, 2025
    Cite
    Sanchi (2025). A/B Testing Data [Dataset]. https://www.kaggle.com/datasets/sanxhi/ab-testing-data-simulated-web-user-engagement
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    Kaggle
    Authors
    Sanchi
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Simulated A/B Testing Data for Web User Engagement

    This dataset contains synthetically generated A/B testing data that mimics user behavior on a website with two versions: Control (con) and Experimental (exp). The dataset is designed for practicing data cleaning, statistical testing (e.g., Z-test, T-test), and pipeline development.

    Each row represents an individual user session, with attributes capturing click behavior, session duration, access device, referral source, and timestamp.

    Features:

    click — Binary (1 if clicked, 0 if not)

    group — A/B group assignment (con or exp, with injected label inconsistencies)

    session_time — Time spent in the session (in minutes), including outliers

    click_time — Timestamp of user interaction (nullable)

    device_type — Device used (mobile or desktop, mixed casing)

    referral_source — Where the user came from (e.g., social, email, with some typos/whitespace)

    Use Cases:

    A/B testing analysis (CTR, CVR)

    Hypothesis testing (Z-test, T-test)

    ETL pipeline design

    Data cleaning and standardization practice

    Dashboard creation and segmentation analysis

    Notes:

    The dataset includes intentional inconsistencies (nulls, duplicates, casing issues, typos) to reflect real-world challenges.

    Fully synthetic — safe for public use.
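    Since the dataset is aimed at practicing exactly this kind of analysis, here is a minimal sketch of a two-proportion Z-test on click-through rate. The `group` and `click` columns come from the feature list above; the CSV file name is an assumption:

    ```python
    # Minimal sketch: two-proportion Z-test on click-through rate.
    # "ab_testing.csv" is a hypothetical local export of this dataset.
    import pandas as pd
    from statsmodels.stats.proportion import proportions_ztest

    df = pd.read_csv("ab_testing.csv")

    # Normalize the injected label inconsistencies mentioned in the notes.
    df["group"] = df["group"].str.strip().str.lower()
    df = df[df["group"].isin(["con", "exp"])]

    clicks = df.groupby("group")["click"].sum()
    sessions = df.groupby("group")["click"].count()

    stat, p_value = proportions_ztest(
        count=[clicks["exp"], clicks["con"]],
        nobs=[sessions["exp"], sessions["con"]],
    )
    print(f"z = {stat:.3f}, p = {p_value:.4f}")
    ```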

  9. Insider Threat Test Dataset

    • impactcybertrust.org
    Updated Sep 18, 2019
    + more versions
    Cite
    External Data Source (2019). Insider Threat Test Dataset [Dataset]. http://doi.org/10.23721/100/1504339
    Explore at:
    Dataset updated
    Sep 18, 2019
    Authors
    External Data Source
    Description

    The CERT Division, in partnership with ExactData, LLC, and under sponsorship from DARPA I2O, generated a collection of synthetic insider threat test datasets. These datasets provide both synthetic background data and data from synthetic malicious actors. Datasets are organized according to the data generator release that created them. Most releases include multiple datasets (e.g., r3.1 and r3.2). Generally, later releases include a superset of the data generation functionality of earlier releases. Each dataset file contains a readme file that provides detailed notes about the features of that release. The answer key file answers.tar.bz2 contains the details of the malicious activity included in each dataset, including descriptions of the scenarios enacted and the identifiers of the synthetic users involved.

  10. COVID-19 testing

    • kaggle.com
    zip
    Updated Mar 21, 2021
    Cite
    Habib Gültekin (2021). COVID-19 testing [Dataset]. https://www.kaggle.com/hgultekin/covid19-testing-rate-and-test-positivity
    Explore at:
    Available download formats: zip (159369 bytes)
    Dataset updated
    Mar 21, 2021
    Authors
    Habib Gültekin
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description


    These data files contain information about COVID-19 testing rate and test positivity, by country and by region. They are updated weekly.

    The figures are based on multiple data sources. The main source is data submitted by Member States to the European Surveillance System (TESSy). When not available, ECDC compiles data from public online sources. EU/EEA Member States report in TESSy all tests performed (i.e. both PCR and antigen tests).

    Disclaimer: The data compiled from public online sources have been automatically or manually retrieved (‘web-scraped’) on a daily basis. It should be noted that there are limitations to this type of data, including varying definitions, and that the data collection process requires constant adaptation to avoid interrupted time series (e.g. due to modification of website pages or types of data).

    Publisher

    European Centre for Disease Prevention and Control


  11. Data sources

    • figshare.com
    docx
    Updated Jul 15, 2024
    Cite
    Yajie Zhang; zyj z (2024). Data sources [Dataset]. http://doi.org/10.6084/m9.figshare.26301130.v2
    Explore at:
    Available download formats: docx
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yajie Zhang; zyj z
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    N/A; just a test upload.

  12. Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated May 30, 2023
    Cite
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini (2023). Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project [Dataset]. http://doi.org/10.1371/journal.pone.0160648
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three record linkage of administrative and/or registry data sources (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful to identify T2DM cases were generated through a top-down/bottom-up iterative approach. Each component was based on a single data domain among diagnoses, drugs, diagnostic test utilization and laboratory results. Diagnoses-based components were subclassified by healthcare setting (primary, secondary, inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted and the proportion of the population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, and AND NOT were tested, and local experts recommended their preferred data source-tailored combination. The population identified per data source by the resulting algorithms varied from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93–100%), while drug-based components were the main contributors in RLDs (81–100%). The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided bases for interpretation of possible inter-data source inconsistency of findings in future studies.
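    If each component yields the set of subject identifiers it flags, the AND/OR/AND NOT combinations described above are plain set algebra. A sketch with hypothetical component names and IDs (not EMIF data):

    ```python
    # Illustrative set algebra for combining case-finding components.
    # Component names and subject IDs are hypothetical.
    diagnosis_component = {101, 102, 103, 200}  # flagged by diagnosis records
    drug_component = {102, 103, 201}            # flagged by drug records
    exclusion_component = {103}                 # e.g. a competing diagnosis

    cases_and = diagnosis_component & drug_component  # AND
    cases_or = diagnosis_component | drug_component   # OR
    cases_and_not = cases_and - exclusion_component   # AND NOT

    print(sorted(cases_and), sorted(cases_or), sorted(cases_and_not))
    ```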

  13. Data from: Source code for the executable semantics presented in Master...

    • phys-techsciences.datastations.nl
    text/markdown, zip
    Updated Feb 16, 2021
    Cite
    W.A.M. van Oort; G.J. Tretmans (2021). Source code for the executable semantics presented in Master Thesis "Introducing Keyword-Driven Testing to System Level Testing Paradigms" [Dataset]. http://doi.org/10.17026/DANS-ZR6-U5Q2
    Explore at:
    Available download formats: zip (55941), zip (16373), text/markdown (2432)
    Dataset updated
    Feb 16, 2021
    Dataset provided by
    DANS Data Station Physical and Technical Sciences
    Authors
    W.A.M. van Oort; G.J. Tretmans
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DESCRIPTION
    The code presented in this repository is a formal representation of the semantics of a system level test run in a keyword-driven testing framework, Robot Framework. The semantics are expressed in Haskell and can be executed using GHC(i).

    CONTENTS
    - RobotSemantics: the semantic definitions for a test run in Robot Framework
    - TypedArgumentsExtension: the semantics of an extension written for Robot Framework
    - Examples: three examples demonstrating the relation between the semantic definitions and actual Robot Framework test cases and keyword libraries

    SHORT SUMMARY
    System level acceptance tests should cover a large number of traces in a system under test, and many testing paradigms exist for this purpose. However, system level acceptance tests must also be understood by many stakeholders, which is not always taken into consideration when a system level testing paradigm is designed. Combining two or more paradigms might yield a system level testing approach that gets the best of both worlds. To see whether this is the case for keyword-driven testing, we consider a triad of system level testing paradigms: Behavior-Driven Testing, Model-Based Testing and Test Data Generation. In the thesis we introduce a formal semantic definition of a keyword-driven testing framework, Robot Framework, to be able to reason about what the considered paradigms would entail for a case study at Canon Production Printing. For each of the three considered paradigms, a conclusion is drawn as to whether the paradigm could benefit from a combination with keyword-driven testing and what the relation between the paradigm and keyword-driven testing would be in such a combination.

    The executable semantics presented in this repository ought to provide an unambiguous starting point for reasoning about the keyword-driven testing paradigm. Moreover, the executable semantics are used to express semantic implications of additional concepts and extensions for keyword-driven testing with Robot Framework.

  14. RE-AIM domains by measure and data source.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 23, 2025
    Cite
    Langa, José C.; Sidat, Mohsin; Sacarlal, Jahit; Moon, Troy D. (2025). RE-AIM domains by measure and data source. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002043500
    Explore at:
    Dataset updated
    May 23, 2025
    Authors
    Langa, José C.; Sidat, Mohsin; Sacarlal, Jahit; Moon, Troy D.
    Description

    Laboratory diagnosis of cryptococcal disease among HIV-infected patients remains a challenge in most low- and middle-income countries (LMIC). Difficulty with sustained access to cryptococcal rapid tests is cited as a major barrier to routine screening for cryptococcus in many LMIC. Thus, clinicians in these countries often resort to empirical treatment based solely on clinical suspicion of cryptococcosis. To address this challenge, we aim to evaluate the re-introduction of India ink testing for diagnosis of cryptococcosis among HIV-infected patients in southern Mozambique. India ink testing was historically a common first-choice, low-cost laboratory diagnostic tool for cryptococcal infection. This study uses implementation science methods framed by the Dynamic Adaptation Process (DAP) and the Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) conceptual frameworks to develop a multi-phase, stepped-wedge trial using mixed-methods approaches. The study will be conducted in six hospitals in southern Mozambique over a period of 15 months and will include the following phases: pre-implementation (baseline assessment), adaptation-implementation (gradual introduction of the intervention), and post-implementation (post-intervention assessment). This study aims to promote the use of India ink staining as a cheap and readily available tool for cryptococcosis diagnosis in southern Mozambique. Lessons learned in this study may be important to inform approaches to overcoming the existing challenges in diagnosis of cryptococcosis in many LMICs due to the unavailability of ready diagnostic tools. Trial registration: ISRCTN11882960, registered 06 August 2024.

  15. Replication Data for "Mapping the Structure and Evolution of Software...

    • zenodo.org
    bin, zip
    Updated Sep 20, 2022
    Cite
    Alireza Salahirad; Gregory Gay; Ehsan Mohammadi (2022). Replication Data for "Mapping the Structure and Evolution of Software Testing Research Over the Past Three Decades" [Dataset]. http://doi.org/10.5281/zenodo.7091926
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alireza Salahirad; Gregory Gay; Ehsan Mohammadi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this research (publication included in the package), we have used author-assigned keywords as a quantitative data source for understanding the connections between keywords and research topics in software testing research, based on a large sample of studies from Scopus.

    We apply co-word analysis to map the topology of testing research as a network where author-assigned keywords are connected by edges indicating co-occurrence in publications. Keywords are clustered based on edge density and frequency of connection. We examine the most popular keywords, summarize clusters into high-level research topics, examine how topics connect, and examine how the field is changing. This package contains the map and network files used to perform our analyses, as well as the publication sample.
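    For readers who want to reproduce the co-word idea on their own sample, a minimal sketch of the network construction (the keyword lists are hypothetical; the package files store the authors' actual map and network):

    ```python
    # Minimal co-word sketch: author keywords become nodes, and co-occurrence
    # in the same publication increments the connecting edge's weight.
    from itertools import combinations

    import networkx as nx

    publications = [  # hypothetical author-assigned keyword lists
        ["mutation testing", "test oracle", "java"],
        ["mutation testing", "search-based testing"],
        ["search-based testing", "test oracle"],
    ]

    G = nx.Graph()
    for keywords in publications:
        for a, b in combinations(sorted(set(keywords)), 2):
            weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
            G.add_edge(a, b, weight=weight + 1)

    # Clustering by edge density/frequency would follow, e.g. community detection.
    print(list(G.edges(data=True)))
    ```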

  16. COVID-19 Daily Testing - By Test - Historical

    • data.cityofchicago.org
    • healthdata.gov
    • +1more
    csv, xlsx, xml
    Updated May 22, 2024
    + more versions
    Cite
    City of Chicago (2024). COVID-19 Daily Testing - By Test - Historical [Dataset]. https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test-Historical/gkdw-2tgv
    Explore at:
    Available download formats: xml, xlsx, csv
    Dataset updated
    May 22, 2024
    Dataset authored and provided by
    City of Chicago
    Description

    NOTE: This dataset has been retired and marked as historical-only.

    This dataset contains counts of unique tests and results for COVID-19. This dataset differs from https://data.cityofchicago.org/d/t4hh-4ku9 in that each person is in that dataset only once, even if tested multiple times. In this dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear in this dataset more than once on the same day unless he/she had both a positive and not-positive test.

    The positivity rate displayed in this dataset uses the method most commonly used by other jurisdictions in the United States.

    Only Chicago residents are included based on the home address as provided by the medical provider.

    Molecular (PCR) and antigen tests received through electronic lab reporting are included. Individuals may be tested multiple times. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.

    Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. Chicago Department of Public Health (CDPH) does not receive all not-positive results.

    All data are provisional and subject to change. Information is updated as additional details are received.

    Data Source: Illinois Department of Public Health Electronic Lab Reports

  17. Data from Laboratory Tests of a Prototype Carbon Dioxide Ground-Source Air...

    • data.nist.gov
    • datasets.ai
    • +1more
    Updated Oct 11, 2019
    + more versions
    Cite
    Harrison M. Skye (2019). Data from Laboratory Tests of a Prototype Carbon Dioxide Ground-Source Air Conditioner [Dataset]. http://doi.org/10.18434/M32142
    Explore at:
    Dataset updated
    Oct 11, 2019
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Harrison M. Skye
    License

    https://www.nist.gov/open/license

    Description

    These data are from laboratory tests of a prototype residential liquid-to-air ground-source air conditioner (GSAC) using CO2 as the refrigerant. The data collection and processing methods are described in detail in the following report:

    Report title: "Laboratory Tests of a Prototype Carbon Dioxide Ground-Source Air Conditioner", NIST Technical Note 2068
    Publication date: October 2019
    DOI: https://doi.org/10.6028/NIST.TN.2068
    Authors: Harrison Skye, Wei Wu

    The tests were performed in an environmental chamber and followed the ISO 13256-1 standard for rating GSHPs. The CO2 GSAC operated in either a subcritical or a transcritical cycle, depending on the entering liquid temperature (ELT). The test results included the coefficient of performance (COP), capacity, sensible heat ratio (SHR), and pressures. The system incorporated a liquid-line/suction-line heat exchanger (LLSL-HX), which was estimated to cause a COP penalty of (0 to 2) % for ELTs ranging (10 to 25) °C, and a benefit of (0 to 5) % for ELTs ranging (30 to 39) °C. With ELTs ranging (10 to 39) °C, the CO2 system cooling COP ranged (7.3 to 2.4). At the standard rating condition (ELT 25 °C), the CO2 GSAC cooling COP was 4.14, and at part-load conditions (ELT 20 °C) the system had a COP of 4.92.
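    For readers unfamiliar with the reported figures, COP and SHR are simple ratios. A sketch with hypothetical numbers (not measurements from this dataset):

    ```python
    # Hypothetical numbers, only to illustrate the reported metrics.
    cooling_capacity_kw = 10.0   # total cooling capacity delivered
    power_input_kw = 2.4         # electrical power drawn by the system
    sensible_capacity_kw = 7.5   # sensible (temperature-change) portion

    cop = cooling_capacity_kw / power_input_kw        # coefficient of performance
    shr = sensible_capacity_kw / cooling_capacity_kw  # sensible heat ratio

    print(f"COP = {cop:.2f}, SHR = {shr:.2f}")
    ```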

  18. EGS Collab Experiment 1: Tracer data tests

    • gdr.openei.org
    • data.openei.org
    • +2more
    website
    Updated Apr 16, 2019
    + more versions
    Cite
    Ghanashyam Neupane; Earl Mattson; Adam Hawkins; Mitchell Plummer; Yuran Zhang (2019). EGS Collab Experiment 1: Tracer data tests [Dataset]. http://doi.org/10.15121/1512084
    Explore at:
    Available download formats: website
    Dataset updated
    Apr 16, 2019
    Dataset provided by
    Geothermal Data Repository
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Geothermal Technologies Program (EE-4G)
    Idaho National Laboratory
    Authors
    Ghanashyam Neupane; Earl Mattson; Adam Hawkins; Mitchell Plummer; Yuran Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the first set of tracer data for the EGS Collab testbed. The first set of tracer tests was conducted during October-November 2018. We have included tracer data for C-dots, chloride, fluorescein, and rhodamine-B. Details about the tracer tests can be found in Background and Methods of Tracer Tests (Mattson et al., 2019), also included in this package.

    Reference: Mattson, E.D., Neupane, G., Plummer, M.A., Hawkins, A., Zhang, Y., and the EGS Collab Team (2019). Preliminary Collab fracture characterization results from flow and tracer testing efforts. In Proceedings, 44th Workshop on Geothermal Reservoir Engineering, Stanford University, Stanford, California.

  19. NYS State Test Dashboard and Source Data

    • kaggle.com
    zip
    Updated Dec 21, 2022
    Cite
    Khurshed Fazal (2022). NYS State Test Dashboard and Source Data [Dataset]. https://www.kaggle.com/datasets/khurshedfazal/2019nysdashboard
    Explore at:
    Available download formats: zip (81889768 bytes)
    Dataset updated
    Dec 21, 2022
    Authors
    Khurshed Fazal
    Area covered
    New York
    Description

    To download and view the dashboard and data source, please click the download button in the upper right corner.

    A dataset with over half a million rows in the Excel file was cleaned and visualized into a fully functional Excel dashboard with slicers. Comparisons can be made between charter schools and their NY State counterparts. Initial findings based on the visualizations show a common trend for females to outperform males in both ELA and Math across public and charter schools. Charter school performance beats statewide performance in all student subgroups except homeless students: the statewide versus charter school percentage breakdowns show that homeless students in charter schools are underperforming against their statewide counterparts. This raises the question of how charter schools should give these students more attention in terms of state test preparation.

  20. Data from: RTPTorrent: An Open-source Dataset for Evaluating Regression Test...

    • zenodo.org
    zip
    Updated Sep 23, 2020
    + more versions
    Cite
    Toni Mattis; Patrick Rein; Falco Dürsch; Robert Hirschfeld (2020). RTPTorrent: An Open-source Dataset for Evaluating Regression Test Prioritization [Dataset]. http://doi.org/10.5281/zenodo.3610999
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 23, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Toni Mattis; Patrick Rein; Falco Dürsch; Robert Hirschfeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is designed to be used in evaluation studies of regression test prioritization techniques. It includes 20 open-source Java projects from GitHub and over 100,000 real-world build logs from TravisCI. The projects span a wide range of size, number of contributors, and maturity among the open-source Java projects available on GitHub.

    Further, the dataset includes the results of baseline approaches to ease the comparison of new techniques applied to the dataset.
