100+ datasets found

f
Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of...
wiley.figshare.com
xlsx
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Leonidas Bantis; Ziding Feng (2023). Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of True Classification Rates [Dataset]. http://doi.org/10.6084/m9.figshare.6527219.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.6527219.v1
Dataset updated
May 31, 2023
Dataset provided by
Wiley
Authors
Leonidas Bantis; Ziding Feng
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The receiver operating characteristics (ROC) curve is typically employed when one wants to evaluate the discriminatory capability of a continuous or ordinal biomarker in the case where two groups are to be distinguished, commonly the ’healthy’ and the ’diseased’. There are cases for which the disease status has three categories. Such cases employ the (ROC) surface, which is a natural generalization of the ROC curve for three classes. In this paper, we explore new methodologies for comparing two continuous biomarkers that refer to a trichotomous disease status, when both markers are applied to the same patients. Comparisons based on the volume under the surface have been proposed, but that measure is often not clinically relevant. Here, we focus on comparing two correlated ROC surfaces at given pairs of true classification rates, which are more relevant to patients and physicians. We propose delta-based parametric techniques, power transformations to normality, and bootstrap-based smooth nonparametric techniques to investigate the performance of an appropriate test. We evaluate our approaches through an extensive simulation study and apply them to a real data set from prostate cancer screening.
Z
Interpolated data on bioavailable strontium in the southern Trans-Urals,...
data.niaid.nih.gov
Updated Dec 1, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ankushev, Maksim (2024). Interpolated data on bioavailable strontium in the southern Trans-Urals, 2020-2022 version 3.1 (current) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7370065
Explore at:
Dataset updated
Dec 1, 2024
Dataset provided by
Epimakhov, Andrey
Chechushkov, Igor
Kiseleva, Daria
Ankusheva, Polina
Ankushev, Maksim
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ural Mountains
Description
Description

The Interpolated Strontium Values dataset Ver. 3.1 presents the interpolated data of strontium isotopes for the southern Trans-Urals, based on the data gathered in 2020-2022. The current dataset consists of five sets of files for five various interpolations: based on grass, mollusks, soil, and water samples, as well as the average of three (excluding the mollusk dataset). Each of the five sets consists of a CSV file and a KML file where the interpolated values are presented to use with a GIS software (ordinary kriging, 5000 m x 5000 m grid). In addition, two GeoTIFF files are provided for each set for a visual reference.

Average 5000 m interpolated points.kml / csv: these files contain averaged values of all three sample types.

Grass 5000 m interpolated points.kml / csv: these files contain data interpolated from the grass sample dataset.

Mollusks 5000 m interpolated points.kml / csv: these files contain data interpolated from the mollusk sample dataset.

Soil 5000 m interpolated points.kml / csv: these files contain data interpolated from the soil sample dataset.

Water 5000 m interpolated points.kml / csv: these files contain data interpolated from the water sample dataset.

The current version is also supplemented with GeoTiff raster files where the same interpolated values are color-coded. These files can be added to Google Earth or any GIS software together with KML files for better interpretation and comparison.

Averaged 5000 m interpolation raster.tif: this file contains a raster representing the averaged values of all three sample types.

Grass 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the grass sample dataset.

Mollusks 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the mollusk sample dataset.

Soil 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the soil sample dataset.

Water 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the water sample dataset

In addition, the cross-validation rasters created during the interpolation process are also provided. They can be used as a visual reference of the interpolation reliability. The grey areas on the raster represent the areas where expected values do not differ from interpolated values for more than 0.001. The red areas represent the areas where the error exceeded 0.001 and, thus, the interpolation is not reliable.

How to use it?

The data provided can be used to access interpolated background values of bioavailable strontium in the area of interest. Note that a single value is not a good enough predictor and should never be used as a proxy. Always calculate a mean of 4-6 (or more) nearby values to achieve the best guess possible. Never calculate averages from a single dataset, always rely on cross-validation by comparing data from all five datasets. Check the cross-validation rasters to make sure that the interpolation is reliable for the area of interest.

References

The interpolated datasets are based upon the actual measured values published as follows:

Epimakhov, Andrey; Kisileva, Daria; Chechushkov, Igor; Ankushev, Maksim; Ankusheva, Polina (2022): Strontium isotope ratios (87Sr/86Sr) analysis from various sources the southern Trans-Urals. PANGAEA, https://doi.pangaea.de/10.1594/PANGAEA.950380

Description of the original dataset of measured strontium isotopic values

The present dataset contains measurements of bioavailable strontium isotopes (87Sr/86Sr) gathered in the southern Trans-Urals. There are four sample types, such as wormwood (n = 103), leached soil (n = 103), water (n = 101), and freshwater mollusks (n = 80), collected to measure bioavailable strontium isotopes. The analysis of Sr isotopic composition was carried out in the cleanrooms (6 and 7 ISO classes) of the Geoanalitik shared research facilities of the Institute of Geology and Geochemistry, the Ural Branch of the Russian Academy of Sciences (Ekaterinburg). Mollusk shell samples preliminarily cleaned with acetic acid, as well as vegetation samples rinsed with deionized water and ashed, were dissolved by open digestion in concentrated HNO 3 with the addition of H 2 O 2 on a hotplate at 150°C. Water samples were acidified with concentrated nitric acid and filtered. To obtain aqueous leachates, pre-ground soil samples weighing 1 g were taken into polypropylene containers, 10 ml of ultrapure water was added and shaken in for 1 hour, after which they were filtered through membrane cellulose acetate filters with a pore diameter of 0.2 μm. In all samples, the strontium content was determined by ICP-MS (NexION 300S). Then the sample volume corresponding to the Sr content of 600 ng was evaporated on a hotplate at 120°C, and the precipitate was dissolved in 7M HNO 3. Sample solutions were centrifuged at 6000 rpm, and strontium was chromatographically isolated using SR resin (Triskem). The strontium isotopic composition was measured on a Neptune Plus multicollector mass spectrometer with inductively coupled plasma (MC-ICP-MS). To correct mass bias, a combination of bracketing and internal normalization according to the exponential law 88 Sr/ 86 Sr = 8.375209 was used. The results were additionally bracketed using the NIST SRM 987 strontium carbonate reference material using an average deviation from the reference value of 0.710245 for every two samples bracketed between NIST SRM 987 measurements. The long-term reproducibility of the strontium isotopic analysis was evaluated using repeated measurements of NIST SRM 987 during 2020-2022 and yielded 87 Sr/ 86 Sr = 0.71025, 2SD = 0.00012 (104 measurements in two replicates). The within-laboratory standard uncertainty (2σ) obtained for SRM-987 was ± 0.003 %.
f
A benchmark driven guide to binding site comparison: An exhaustive...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christiane Ehrt; Tobias Brinkjost; Oliver Koch (2023). A benchmark driven guide to binding site comparison: An exhaustive evaluation using tailor-made data sets (ProSPECCTs) [Dataset]. http://doi.org/10.1371/journal.pcbi.1006483
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1006483
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS Computational Biology
Authors
Christiane Ehrt; Tobias Brinkjost; Oliver Koch
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The automated comparison of protein-ligand binding sites provides useful insights into yet unexplored site similarities. Various stages of computational and chemical biology research can benefit from this knowledge. The search for putative off-targets and the establishment of polypharmacological effects by comparing binding sites led to promising results for numerous projects. Although many cavity comparison methods are available, a comprehensive analysis to guide the choice of a tool for a specific application is wanting. Moreover, the broad variety of binding site modeling approaches, comparison algorithms, and scoring metrics impedes this choice. Herein, we aim to elucidate strengths and weaknesses of binding site comparison methodologies. A detailed benchmark study is the only possibility to rationalize the selection of appropriate tools for different scenarios. Specific evaluation data sets were developed to shed light on multiple aspects of binding site comparison. An assembly of all applied benchmark sets (ProSPECCTs–Protein Site Pairs for the Evaluation of Cavity Comparison Tools) is made available for the evaluation and optimization of further and still emerging methods. The results indicate the importance of such analyses to facilitate the choice of a methodology that complies with the requirements of a specific scientific challenge.
Linked Open Data Management Services: A Comparison
zenodo.org
data.niaid.nih.gov
Updated Sep 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Robert Nasarek; Robert Nasarek; Lozana Rossenova; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7738424
Dataset updated
Sep 18, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Robert Nasarek; Robert Nasarek; Lozana Rossenova; Lozana Rossenova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

ConedaKOR

LinkedDataHub

Metaphacts

Omeka S

ResearchSpace

Vitro

Wikibase

WissKI

The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

[1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.

[2] Full paper will be made available open access in the conference proceedings.
d
Data from: Comparing three collection methods for pollinating insects within...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Data from: Comparing three collection methods for pollinating insects within electric transmission rights-of-ways [Dataset]. https://catalog.data.gov/dataset/data-from-comparing-three-collection-methods-for-pollinating-insects-within-electric-trans-7ef08
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
Insect pollinator community data collected from three types of insect traps/collecting methods (colored pan traps, blue vane traps, targeted sweep netting) from four power line right of ways in Alabama. Data are from one growing season (May-October 2018), and collection methods were employed once per month. Data include: 1) insect pollinator community composition data; 2) relative diversity calculations by insect Order; 3) overall insect pollinator community diversity summary by trap type/collecting method and month. These data reflect the community as sampled through different means in the same time period. Resources in this dataset: Resource title: Insect Pollinator Community Composition Matrix File name: Pollinator communty matrix.csv Resource description: Pollinator community composition (taxon, abundance) by site, insect trap type, and season. See Supplemental Table 1 in Campbell et al. 2023 for detailed taxa information. Resource title: Insect Pollinator Community Diversity by Order File name: Pollinator community diversity by Order.csv Resource description: Insect Pollinator community diversity metrics separated by Order for each site, for each insect trap type and season. Resource title: Summary of Overall Insect Pollinator Community Diversity File name: Overall Pollinator community diversity.csv Resource description: Overall Insect Pollinator community diversity summarized by trap type and season. Resource title: Dataset key File name: Dataset key table.pdf Resource description: Column titles and variable descriptions for three datasets, of: 1) Pollinator Community Composition; 2) Pollinator Community Diversity by Order; and 3) Overall Pollinator Community Diversity summarized by Trap Type and Season
Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories
catalog.data.gov
data.virginia.gov
+2more
Updated Jan 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Federal Highway Administration (2025). Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories [Dataset]. https://catalog.data.gov/dataset/third-generation-simulation-data-tgsim-i-90-i-94-moving-trajectories
Explore at:
Dataset updated
Jan 24, 2025
Dataset provided by
Federal Highway Administrationhttps://highways.dot.gov/
Area covered
Interstate 90
Description
The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter would return to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle to ensure continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day. As part of this dataset, the following files were provided: I90_94_moving_final.csv contains the numerical data to be used for analysis that includes vehicle level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the following conversion factor: 1 pixel = 0.3-meter conversion. I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X. I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments). In total, the centerline files define six northbound lanes. Annotation on Regions.zip, which includes images that visually map lanes (I90_9
r
Dataset for The effects of a number line intervention on calculation skills
researchdata.edu.au
figshare.mq.edu.au
Updated May 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.V1
Explore at:
Unique identifier
https://doi.org/10.25949/22799717.V1
Dataset updated
May 18, 2023
Dataset provided by
Macquarie University
Authors
Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas
Description

Study information

The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.

All participants were currently attending Year 1 of primary school at an independent school in New South Wales, Australia. For children to be able to eligible to participate they had to present with low mathematics achievement by performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests from the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Participants were excluded from participating if, as reported by their parents, they have any other diagnosed disorders such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.

The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point.

The number of measurement points were distributed across participants as follows:

Participant 1 – 3 baseline, 6 treatment, 1 post-treatment

Participant 3 – 2 baseline, 7 treatment, 1 post-treatment

Participant 5 – 2 baseline, 5 treatment, 1 post-treatment

Participant 6 – 3 baseline, 4 treatment, 1 post-treatment

Participant 7 – 2 baseline, 5 treatment, 1 post-treatment

In each session across all three phases children were assessed in their performance on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.

Measures

Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed on all phases of the study. Trained items were assessed independent of the intervention during baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE) [(number estimated - target number)/ scale of number line] x100.

Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.

Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represent the correct result. Participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double and one single digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculations remained on the screen until participants responded by clicking on one of the options on the screen. Participants did not receive performance-based feedback. Performance on this task is measured by item-based accuracy.

Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the largest one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison) the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots is kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.

The Number Line Intervention

During the intervention sessions, participants estimated the position of 30 Arabic numbers in a 0-100 bounded number line. As a form of feedback, within each item, the participants’ estimate remained visible, and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”, when PAE was between 2.5 and 5 the message read “Well done, so close! and when PAE was higher than 5 the message read “Good try!” Numbers were presented in random order.

Variables in the dataset

Age = age in ‘years, months’ at the start of the study

Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)

Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed by three sections. The first one refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point on the treatment phase, and post1 to the first measurement point on the post-treatment phase.

The second part of the variable name refers to the task, as follows:

DC = dot comparison

SDC = single-digit computation

NLE_UT = number line estimation (untrained set)

NLE_T= number line estimation (trained set)

CE = multidigit computational estimation

NC = number comparison

The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).

Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
COVID-19 Case Surveillance Public Use Data
data.cdc.gov
paperswithcode.com
+5more
application/rdfxml +5
Updated Jul 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
Explore at:
application/rdfxml, tsv, csv, json, xml, application/rssxmlAvailable download formats
Dataset updated
Jul 9, 2024
Dataset provided by
Centers for Disease Control and Preventionhttp://www.cdc.gov/
Authors
CDC Data, Analytics and Visualization Task Force
License
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Description
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (33 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the current datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html

Overview

The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

For more information: NNDSS Supports the COVID-19 Response | CDC.

The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

COVID-19 Case Reports

COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

Data are Considered Provisional

The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.
Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

Data Limitations

To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

Data Quality Assurance Procedures

CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question “Was the individual hospitalized?” where the possible answer choices include “Yes,” “No,” or “Unknown,” the blank value is recoded to Missing because the case report form did not include a response to the question.
Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
Additional data quality processing to recode free text data is ongoing. Data on symptoms, race and ethnicity, and healthcare worker status have been prioritized.

Data Suppression

To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.

For questions, please contact Ask SRRG (eocevent394@cdc.gov).

Additional COVID-19 Data

COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
A multi-modal human neuroimaging dataset for data integration: simultaneous...
openneuro.org
Updated Jun 7, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Giulia Lioi; Claire Cury; Lorraine Perronnet; Marsel Mano; Elise Bannier; Anatole Lecuyer; Christian Barillot (2021). A multi-modal human neuroimaging dataset for data integration: simultaneous EEG and fMRI acquisition during a motor imagery neurofeedback task: XP1 [Dataset]. http://doi.org/10.18112/openneuro.ds002336.v2.0.2
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds002336.v2.0.2
Dataset updated
Jun 7, 2021
Dataset provided by
OpenNeurohttps://openneuro.org/
Authors
Giulia Lioi; Claire Cury; Lorraine Perronnet; Marsel Mano; Elise Bannier; Anatole Lecuyer; Christian Barillot
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
———————————————————————————————— ORIGINAL PAPERS ————————————————————————————————

Lioi, G., Cury, C., Perronnet, L., Mano, M., Bannier, E., Lécuyer, A., & Barillot, C. (2019). Simultaneous MRI-EEG during a motor imagery neurofeedback task: an open access brain imaging dataset for multi-modal data integration Authors. BioRxiv. https://doi.org/https://doi.org/10.1101/862375

Mano, Marsel, Anatole Lécuyer, Elise Bannier, Lorraine Perronnet, Saman Noorzadeh, and Christian Barillot. 2017. “How to Build a Hybrid Neurofeedback Platform Combining EEG and FMRI.” Frontiers in Neuroscience 11 (140). https://doi.org/10.3389/fnins.2017.00140 Perronnet, Lorraine, L Anatole, Marsel Mano, Elise Bannier, Maureen Clerc, Christian Barillot, Lorraine Perronnet, et al. 2017. “Unimodal Versus Bimodal EEG-FMRI Neurofeedback of a Motor Imagery Task.” Frontiers in Human Neuroscience 11 (193). https://doi.org/10.3389/fnhum.2017.00193.

This dataset named XP1 can be pull together with the dataset XP2, available here : https://openneuro.org/datasets/ds002338. Data acquisition methods have been described in Perronnet et al. (2017, Frontiers in Human Neuroscience). Simultaneous 64 channels EEG and fMRI during right-hand motor imagery and neurofeedback (NF) were acquired in this study (as well as in XP2). For this study, 10 subjects performed three types of NF runs (bimodal EEG-fMRI NF, unimodal EEG-NF and fMRI-NF).

———————————————————————————————— EXPERIMENTAL PARADIGM ————————————————————————————————
Subjects were instructed to perform a kinaesthetic motor imagery of the right hand and to find their own strategy to control and bring the ball to the target. The experimental protocol consisted of 6 EEG-fMRI runs with a 20s block design alternating rest and task motor localizer run (task-motorloc) - 8 blocks X (20s rest+20 s task) motor imagery run without NF (task-MIpre) -5 blocks X (20s rest+20 s task) three NF runs with different NF conditions (task-eegNF, task-fmriNF, task-eegfmriNF) occurring in random order- 10 blocks X (20s rest+20 s task) motor imagery run without NF (task-MIpost) - 5 blocks X (20s rest+20 s task)

———————————————————————————————— EEG DATA ———————————————————————————————— EEG data was recorded using a 64-channel MR compatible solution from Brain Products (Brain Products GmbH, Gilching, Germany).

RAW EEG DATA

EEG was sampled at 5kHz with FCz as the reference electrode and AFz as the ground electrode, and a resolution of 0.5 microV. Following the BIDs arborescence, raw eeg data for each task can be found for each subject in

XP1/sub-xp1*/eeg

in Brain Vision Recorder format (File Version 1.0). Each raw EEG recording includes three files: the data file (.eeg), the header file (.vhdr) and the marker file (*.vmrk). The header file contains information about acquisition parameters and amplifier setup. For each electrode, the impedance at the beginning of the recording is also specified. For all subjects, channel 32 is the ECG channel. The 63 other channels are EEG channels.

The marker file contains the list of markers assigned to the EEG recordings and their properties (marker type, marker ID and position in data points). Three type of markers are relevant for the EEG processing: R128 (Response): is the fMRI volume marker to correct for the gradient artifact S 99 (Stimulus): is the protocol marker indicating the start of the Rest block S 2 (Stimulus): is the protocol marker indicating the start of the Task (Motor Execution Motor Imagery or Neurofeedback)
Warning : in few EEG data, the first S99 marker might be missing, but can be easily “added” 20 s before the first S 2.

PREPROCESSED EEG DATA

Following the BIDs arborescence, processed eeg data for each task and subject in the pre-processed data folder :

XP1/derivatives/sub-xp1*/eeg_pp/*eeg_pp.*

and following the Brain Analyzer format. Each processed EEG recording includes three files: the data file (.dat), the header file (.vhdr) and the marker file (*.vmrk), containing information similar to those described for raw data. In the header file of preprocessed data channels location are also specified. In the marker file the location in data points of the identified heart pulse (R marker) are specified as well.

EEG data were pre-processed using BrainVision Analyzer II Software, with the following steps: Automatic gradient artifact correction using the artifact template subtraction method (Sliding average calculation with 21 intervals for sliding average and all channels enabled for correction. Downsampling with factor: 25 (200 Hz) Low Pass FIR Filter:Cut-off Frequency: 50 Hz. Ballistocardiogram (pulse) artifact correction using a semiautomatic procedure (Pulse Template searched between 40 s and 240 s in the ECG channel with the following parameters:Coherence Trigger = 0.5, Minimal Amplitude = 0.5, Maximal Amplitude = 1.3. The identified pulses were marked with R. Segmentation relative to the first block marker (S 99) for all the length of the training protocol (las S 2 + 20 s).

EEG NF SCORES

Neurofeedback scores can be found in the .mat structures in

XP1/derivatives/sub-xp1*/NF_eeg/d_sub*NFeeg_scores.mat

Structures names NF_eeg are composed of the following subfields:

NF_eeg → .nf_laterality (NF score computed as for real-time calculation - equation (1))
→ .filteegpow_left (Bandpower of the filtered eeg signal in C1) → .filteegpow_right (Bandpower of the filtered eeg signal in C2) → .nf (vector of NF scores -4 per s- computed as in eq 3) for comparison with XP2 → .smoothed → .eegdata (64 X 200 X 400 matrix, with the pre-processed EEG signals according to the steps described above) → .method

Where the subfield method contains information about the laplacian filtered used and the frequency band of interest.

———————————————————————————————— BOLD fMRI DATA ———————————————————————————————— All DICOM files were converted to Nifti-1 and then in BIDs format (version 2.1.4) using the software dcm2niix (version v1.0.20190720 GVV7.4.0)

fMRI acquisitions were performed using echo- planar imaging (EPI) and covering the entire brain with the following parameters

3T Siemens Verio EPI sequence TR=2 s TE=23 ms Resolution 2x2x4 mm3 FOV = 210×210mm2 N of slices: 32 No slice gap

As specified in the relative task event files in XP1\ *events.tsv files onset, the scanner began the EPI pulse sequence two seconds prior to the start of the protocol (first rest block), so the the first two TRs should be discarded. The useful TRs for the runs are therefore

task-motorloc: 320 s (2 to 322) task-MIpre and task-MIpost: 200 s (2 to 202) task-eegNF, task-fmriNF, task-eegfmriNF: 400 s (2 to 402)

In task events files for the different tasks, each column represents:

'onset': onset time (sec) of an event

'duration': duration (sec) of the event

'trial_type': trial (block) type: rest or task (Rest, Task-ME, Task-MI, Task-NF)

''stim_file’: image presented in a stimulus block: during Rest, Motor Imagery (Task-MI) or Motor execution (Task-ME) instructions were presented. On the other hand, during Neurofeedback blocks (Task-NF) the image presented was a ball moving in a square that the subject could control self-regulating his EEG and/or fMRI brain activity.

Following the BIDs arborescence, the functional data and relative metadata are found for each subject in the following directory

XP1/sub-xp1*/func

BOLD-NF SCORES

For each subject and NF session, a matlab structure with BOLD-NF features can be found in

XP1/derivatives/sub-xp1*/NF_bold/

For each subject and NF session, a Matlab structure with BOLD-NF features can be found in

XP1/derivatives/sub-xp1*/NF_bold/

In view of BOLD-NF scores computation, fMRI data were preprocessed using SPM8 and with the following steps: slice-time correction, spatial realignment and coregistration with the anatomical scan, spatial smoothing with a 6 mm Gaussian kernel and normalization to the Montreal Neurological Institute (MNI) template. For each session, a first level general linear model analysis was then performed. The resulting activation maps (voxel-wise Family-Wise error corrected at p < 0.05) were used to define two ROIs (9x9x3 voxels) around the maximum of activation in the left and right motor cortex. The BOLD-NF scores (fMRI laterality index) were calculated as the difference between percentage signal change in the left and right motor ROIs as for the online NF calculation. A smoothed and normalized version of the NF scores over the precedent three volumes was also computed. To allow for comparison and aggregation of the two datasets XP1 and XP2 we also computed NF scores considering the left motor cortex and a background as for online NF calculation in XP2.

In the NF_bold folder, the Matlab files sub-xp1*_task-*_NFbold_scores.mat have therefore the following structure :

NF_bold → .nf_laterality (calculated as for online NF calculation) → .smoothnf_laterality → .normnf_laterality → .nf (calculated as for online NF calculation in XP2) → .roimean_left (averaged BOLD signal in the left motor ROI) → .roimean_right (averaged BOLD signal in the right motor ROI) → .bgmean (averaged BOLD signal in the background slice) → .method

Where the subfield ".method" contains information about the ROI size (.roisize), the background mask (.bgmask) and ROI masks (.roimask_left,.roimask_right ). More details about signal processing and NF calculation can be
d
Geodatabase of the datasets that represent the three subareas of the...
catalog.data.gov
data.usgs.gov
+4more
Updated Jul 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Geodatabase of the datasets that represent the three subareas of the Silurian-Devonian aquifer, Illinois, Indiana, Iowa, Kentucky, Michigan, Missouri, Ohio, Tennessee, and Wisconsin [Dataset]. https://catalog.data.gov/dataset/geodatabase-of-the-datasets-that-represent-the-three-subareas-of-the-silurian-devonian-aqu
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Ohio, Michigan, Missouri, Tennessee, Iowa, Wisconsin, Illinois, Kentucky
Description
This geodatabase includes spatial datasets that represent the Silurian-Devonian aquifers in the States of Illinois, Indiana, Iowa, Kentucky, Michigan, Missouri, Ohio, Tennessee, and Wisconsin. Included are: (1) polygon extents; datasets that represent the aquifer system extent, and the entire extent subdivided into subareas, (2) raster datasets for the altitude of the top and bottom surfaces of the entire aquifer (where data are available), and (3) altitude contours used to generate the surface rasters. The digitized contours are supplied for reference. The extent of the Silurian-Devonian aquifers is from the linework of the Silurian-Devonian aquifer extent maps in U.S. Geological Survey U.S. Geological Survey Hydrologic Atlas 730, Chapters J and K, (USGS HA 730-J, -K) and a digital version of the aquifer extent presented in the National Aquifer Code Reference List, available at http://water.usgs.gov/ogw/NatlAqCode-reflist.html , "silurian.zip". The extent was then modified for each subarea: Subarea 1 (sa1): Primarily in Ohio and Indiana, subject of U.S. Geological Survey Professional Paper 1423 B (USGS PP 1423B). Subarea 2 (sa2): In Iowa. Digital data were available from the Iowa Geologic Survey. Subarea 3 (sa3): Remaining area in Illinois, Wisconsin, Michigan, and Kentucky. Extent is that part of the National Aquifer Code Reference List polygon that remained when the areas of sa1 and sa2 were removed. The altitude and thickness contours that were available for each subarea were compiled or generated from georeferenced figures of altitude contours in USGS PP 1423B for sa1, digital data from IAGS for sa2. There were no vertical data for sa3. The resultant top and bottom altitude values were interpolated into surface rasters within a GIS using tools that create hydrologically correct surfaces from contour data, derive the altitude from the thickness (depth from the land surface), and merge the subareas into a single surface. The primary tool was an enhanced version of "Topo to Raster" used in ArcGIS, ArcMap, Esri 2014. The raster surfaces were corrected in the areas where the altitude of an underlying layer of the aquifer exceeded the altitude of an overlying layer.
Z
Datasets for Evaluation of Multimodal Image Registration
data.niaid.nih.gov
Updated Oct 10, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lindblad, Joakim (2021). Datasets for Evaluation of Multimodal Image Registration [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4587902
Explore at:
Dataset updated
Oct 10, 2021
Dataset provided by
Sladoje, Nataša
Lindblad, Joakim
Lu, Jiahao
Öfverstedt, Johan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description

Aerial data

The Aerial dataset is divided into 3 sub-groups by IDs: {7, 9, 20, 3, 15, 18}, {10, 1, 13, 4, 11, 6, 16}, {14, 8, 17, 5, 19, 12, 2}. Since the images vary in size, each image is subdivided into the maximal number of equal-sized non-overlapping regions such that each region can contain exactly one 300x300 px image patch. Then one 300x300 px image patch is extracted from the centre of each region. The particular 3-folded grouping followed by splitting leads to that each evaluation fold contains 72 test samples.

Modality A: Near-Infrared (NIR)

Modality B: three colour channels (in B-G-R order)

Cytological data

The Cytological data contains images from 3 different cell lines; all images from one cell line is treated as one fold in 3-folded cross-validation. Each image in the dataset is subdivided from 600x600 px into 2x2 patches of size 300x300 px, so that there are 420 test samples in each evaluation fold.

Modality A: Fluorescence Images

Modality B: Quantitative Phase Images (QPI)

Histological dataset

For the Histological data, to avoid too easy registration relying on the circular border of the TMA cores, the evaluation images are created by cutting 834x834 px patches from the centres of the original 134 TMA image pairs.

Modality A: Second Harmonic Generation (SHG)

Modality B: Bright-Field (BF)

The evaluation set created from the above three publicly available datasets consists of images undergone 4 levels of (rigid) transformations of increasing size of displacement. The level of transformations is determined by the size of the rotation angle θ and the displacement tx & ty, detailed in this table. Each image sample is transformed exactly once at each transformation level so that all levels have the same number of samples.

In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, and 536 image pairs created from the histological dataset. Each image pair consists of a reference patch (I^{\text{Ref}}) and its corresponding initial transformed patch (I^{\text{Init}}) in both modalities, along with the ground-truth transformation parameters to recover it.

Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.

Metadata

In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information of an image pair as follow:

Filename: identifier(ID) of the image pair

X1_Ref: x-coordinate of the upper-left corner of reference patch IRef

Y1_Ref: y-coordinate of the upper-left corner of reference patch IRef

X2_Ref: x-coordinate of the lower-left corner of reference patch IRef

Y2_Ref: y-coordinate of the lower-left corner of reference patch IRef

X3_Ref: x-coordinate of the lower-right corner of reference patch IRef

Y3_Ref: y-coordinate of the lower-right corner of reference patch IRef

X4_Ref: x-coordinate of the upper-right corner of reference patch IRef

Y4_Ref: y-coordinate of the upper-right corner of reference patch IRef

X1_Trans: x-coordinate of the upper-left corner of transformed patch IInit

Y1_Trans: y-coordinate of the upper-left corner of transformed patch IInit

X2_Trans: x-coordinate of the lower-left corner of transformed patch IInit

Y2_Trans: y-coordinate of the lower-left corner of transformed patch IInit

X3_Trans: x-coordinate of the lower-right corner of transformed patch IInit

Y3_Trans: y-coordinate of the lower-right corner of transformed patch IInit

X4_Trans: x-coordinate of the upper-right corner of transformed patch IInit

Y4_Trans: y-coordinate of the upper-right corner of transformed patch IInit

Displacement: mean Euclidean distance between reference corner points and transformed corner points

RelativeDisplacement: the ratio of displacement to the width/height of image patch

Tx: randomly generated translation in the x-direction to synthesise the transformed patch IInit

Ty: randomly generated translation in the y-direction to synthesise the transformed patch IInit

AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch IInit

AngleRad: randomly generated rotation in radian to synthesise the transformed patch IInit

Naming convention

Aerial Data

zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png

Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.

Cytological data

{{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png

Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed patch of the 2nd row and 1st column cut from the image with ID PNT1A_do_1_f15.

Histological data

{ID}_{ReferenceOrTransformed}.tif

Example: 1B_A4_T.tif indicates the Transformed patch cut from the image with ID 1B_A4.

This dataset was originally produced by the authors of Is Image-to-Image Translation the Panacea for Multimodal Image Registration? A Comparative Study.
Third Generation Simulation Data (TGSIM) I-395 Trajectories
catalog.data.gov
data.virginia.gov
Updated May 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Federal Highway Administration (2025). Third Generation Simulation Data (TGSIM) I-395 Trajectories [Dataset]. https://catalog.data.gov/dataset/third-generation-simulation-data-tgsim-i-395-trajectories
Explore at:
Dataset updated
May 29, 2025
Dataset provided by
Federal Highway Administrationhttps://highways.dot.gov/
Area covered
Interstate 395
Description
The main dataset is a 232 MB file of trajectory data (I395-final.csv) that contains position, speed, and acceleration data for non-automated passenger cars, trucks, buses, and automated vehicles on an expressway within an urban environment. Supporting files include an aerial reference image (I395_ref_image.png) and a list of polygon boundaries (I395_boundaries.csv) and associated images (I395_lane-1, I395_lane-2, …, I395_lane-6) stored in a folder titled “Annotation on Regions.zip” to map physical roadway segments to the numerical lane IDs referenced in the trajectory dataset. In the boundary file, columns “x1” to “x5” represent the horizontal pixel values in the reference image, with “x1” being the leftmost boundary line and “x5” being the rightmost boundary line, while the column "y" represents corresponding vertical pixel values. The origin point of the reference image is located at the top left corner. The dataset defines five lanes with five boundaries. Lane -6 corresponds to the area to the left of “x1”. Lane -5 corresponds to the area between “x1” and “x2”, and so forth to the rightmost lane, which is defined by the area to the right of “x5” (Lane -2). Lane -1 refers to vehicles that go onto the shoulder of the merging lane (Lane -2), which are manually separated by watching the videos. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which was one of the six collected as part of the TGSIM project, contains data collected from six 4K cameras mounted on tripods, positioned on three overpasses along I-395 in Washington, D.C. The cameras captured distinct segments of the highway, and their combined overlapping and non-overlapping footage resulted in a continuous trajectory for the entire section covering 0.5 km. This section covers a major weaving/mandatory lane-changing between L'Enfant Plaza and 4th Street SW, with three lanes in the eastbound direction and a major on-ramp on the left side. In addition to the on-ramp, the section covers an off-ramp on the right side. The expressway includes one diverging lane at the beginning of the section on the right side and one merging lane in the middle of the section on the left side. For the purposes of data extraction, the shoulder of the merging lane is also considered a travel lane since some vehicles illegally use it as an extended on-ramp to pass other drivers (see I395_ref_image.png for details). The cameras captured continuous footage during the morning rush hour (8:30 AM-10:30 AM ET) on a sunny day. During this period, vehicles equipped with SAE Level 2 automation were deployed to travel through the designated section to capture the impact of SAE Level 2-equipped vehicles on adjacent vehicles and their behavior in congested areas, particularly in complex merging sections. These vehicles are indicated in the dataset. As part of this dataset, the following files were provided: I395-final.csv contains the numerical data to be used for analysis that includes vehicle level trajectory data at every 0.1 second. Vehicle type, width, and length are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the following conversion factor: 1 pixel = 0.3-meter conversion. I395_ref_image.png is the aerial reference image that defines the geographic region and the associated roadway segments. I395_boundaries.csv contains the coordinates that define the roadway segments (n=X). The columns "x1" to "x5" represent the horizontal pi
a
Cat Annotation Dataset Merged
academictorrents.com
bittorrent
Updated Jul 2, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Weiwei Zhang and Jian Sun and Xiaoou Tang (2014). Cat Annotation Dataset Merged [Dataset]. https://academictorrents.com/details/c501571c29d16d7f41d159d699d0e7fb37092cbd
Explore at:
bittorrent(1980831996)Available download formats
Dataset updated
Jul 2, 2014
Dataset authored and provided by
Weiwei Zhang and Jian Sun and Xiaoou Tang
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
Cat Annotation Dataset The CAT dataset includes 10,000 cat images. For each image, we annotate the head of cat with nine points, two for eyes, one for mouth, and six for ears. The detail configuration of the annotation was shown in Figure 6 of the original paper: Weiwei Zhang, Jian Sun, and Xiaoou Tang, "Cat Head Detection - How to Effectively Exploit Shape and Texture Features", Proc. of European Conf. Computer Vision, vol. 4, pp.802-816, 2008. ### Format The annotation data are stored in a file with the name of the corresponding cat image plus ".cat", one annotation file for each cat image. For each annotation file, the annotation data are stored in the following sequence: 1. Number of points (always 9) 2. Left Eye 3. Right Eye 4. Mouth 5. Left Ear-1 6. Left Ear-2 7. Left Ear-3 8. Right Ear-1 9. Right Ear-2 10. Right Ear-3 ### Training, Validation, and Testing We randomly divide the data into three sets: 5,000 images for training, 2,000 images for valid
Human Right to Water Data Tool (CalHRTW 1.0)
data.ca.gov
calepa-dtsc.opendata.arcgis.com
Updated Sep 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California Office of Environmental Health Hazard Assessment (2024). Human Right to Water Data Tool (CalHRTW 1.0) [Dataset]. https://data.ca.gov/dataset/human-right-to-water-data-tool-calhrtw-1-0
Explore at:
arcgis geoservices rest api, htmlAvailable download formats
Dataset updated
Sep 20, 2024
Dataset authored and provided by
California Office of Environmental Health Hazard Assessmenthttp://www.oehha.ca.gov/
Description
OEHHA’s Human Right to Water Framework and Data Tool (CalHRTW 1.0) is comprised of this web-based data tool and an assessment report, Achieving the Human Right to Water in California: An Assessment of the State’s Community Water Systems. The purpose of CalHRTW 1.0 is to provide a comprehensive, stand-alone, quantitative assessment of the human right to water for three core components: water quality, water accessibility and water affordability. This data tool allows users to access and explore information in these three core areas interactively.

CalHRTW 1.0 measures and scores nine indicators across the three core components for each of the state’s 2,839 active community water systems (as of January 2019). Indicator scores within each of the three components are combined to create three individual composite component scores to illustrate a system’s overall status in providing clean, accessible and affordable water to its customers. Scores range from 0 to 4, with higher scores indicating worse outcomes. The data used is for 2011 to 2019.
Contact hr2.water@oehha.ca.gov for more information.
Sample Student Data
figshare.com
xls
Updated Aug 2, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carrie Ellis (2022). Sample Student Data [Dataset]. http://doi.org/10.6084/m9.figshare.20419434.v1
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20419434.v1
Dataset updated
Aug 2, 2022
Dataset provided by
Figsharehttp://figshare.com/
Authors
Carrie Ellis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In "Sample Student Data", there are 6 sheets. There are three sheets with sample datasets, one for each of the three different exercise protocols described (CrP Sample Dataset, Glycolytic Dataset, Oxidative Dataset). Additionally, there are three sheets with sample graphs created using one of the three datasets (CrP Sample Graph, Glycolytic Graph, Oxidative Graph). Each dataset and graph pairs are from different subjects. · CrP Sample Dataset and CrP Sample Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the creatine phosphate system. Here, the subject was a track and field athlete who threw the shot put for the DeSales University track team. The NIRS monitor was placed on the right triceps muscle, and the student threw the shot put six times with a minute rest in between throws. Data was collected telemetrically by the NIRS device and then downloaded after the student had completed the protocol. · Glycolytic Dataset and Glycolytic Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the glycolytic energy system. In this example, the subject performed continuous squat jumps for 30 seconds, followed by a 90 second rest period, for a total of three exercise bouts. The NIRS monitor was place on the left gastrocnemius muscle. Here again, data was collected telemetrically by the NIRS device and then downloaded after he had completed the protocol. · Oxidative Dataset and Oxidative Graph: In this example, the dataset and graph are from an exercise protocol designed to stress the oxidative system. Here, the student held a sustained, light-intensity, isometric biceps contraction (pushing against a table). The NIRS monitor was attached to the left biceps muscle belly. Here, data was collected by a student observing the SmO2 values displayed on a secondary device; specifically, a smartphone with the IPSensorMan APP displaying data. The recorder student observed and recorded the data on an Excel Spreadsheet, and marked the times that exercise began and ended on the Spreadsheet.
P
Three-view Synthetic data Dataset
paperswithcode.com
Updated May 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kristina P. Sinaga (2024). Three-view Synthetic data Dataset [Dataset]. https://paperswithcode.com/dataset/three-view-synthetic-data
Explore at:
Dataset updated
May 8, 2024
Authors
Kristina P. Sinaga
Description
10000 instances of three-view numerical data set with 4 clusters and 2 feature components are considered. The data points in each view are generated from a 2-component 2-variate Gaussian mixture model (GMM) where their mixing proportions $\alpha_1^{(1)}=\alpha_1^{(2)}=\alpha_1^{(3)}=\alpha_1^{(4)}=0.3$; $\alpha_2^{(1)}=\alpha_2^{(2)}=\alpha_2^{(3)}=\alpha_2^{(4)}=0.15$; $\alpha_3^{(1)}=\alpha_3^{(2)}=\alpha_3^{(3)}=\alpha_3^{(4)}=0.15$ and $\alpha_4^{(1)}=\alpha_4^{(2)}=\alpha_4^{(3)}=\alpha_4^{(4)}=0.4$. The means $\mu_{ik}^{(1)}$ for the first view are $[-10 ~-5)]$,$[-9 ~ 11]$, $[0~ 6]$ and $[4~0]$; The means $\mu_{ik}^{(2)}$ for the view 2 are $[-8 ~-12]$,$[-6 ~ -3]$, $[-2~ 7]$ and $[2~1]$; And the means $\mu_{ik}^{(3)}$ for the third view are $[-5 ~-10]$,$[-8 ~ -1]$, $[0~ 5]$ and $[5~-4]$. The covariance matrices for the three views are $\Sigma_1^{(1)}=\Sigma_1^{(2)}=\Sigma_1^{(3)}=\Sigma_1^{(4)}=\left[ \begin{array}{cc} 1 & 0\0&1\end{array}\right]$; $\Sigma_2^{(1)}=\Sigma_2^{(2)}=\Sigma_2^{(3)}=\Sigma_2^{(4)}=3 \left[ \begin{array}{cc} 1 & 0\0&1\end{array}\right]$; $\Sigma_3^{(1)}=\Sigma_3^{(2)}=\Sigma_3^{(3)}=\Sigma_3^{(4)}=2 \left[ \begin{array}{cc} 1 & 0\0&1\end{array}\right]$; and $\Sigma_4^{(1)}=\Sigma_4^{(2)}=\Sigma_4^{(3)}=\Sigma_4^{(4)}=0.5 \left[ \begin{array}{cc} 1 & 0\0&1\end{array}\right]$. These $x_1^{(1)}$ and $x_2^{(1)}$ are the coordinates for the view 1, $x_1^{(2)}$ and $x_2^{(2)}$ are the coordinates for the view 2, $x_1^{(3)}$ and $x_2^{(3)}$ are the coordinates for the view 3. While the original distribution of data points for cluster 1, cluster 2, cluster 3, and cluster 4 are 1514, 3046, 3903, and 1537, respectively.
n
EMG and data glove dataset for dexterous myoelectric control
data.ncl.ac.uk
txt
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agamemnon Krasoulis; Sethu Vijayakumar; K. Nazarpour (2023). EMG and data glove dataset for dexterous myoelectric control [Dataset]. http://doi.org/10.25405/data.ncl.9577598.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.25405/data.ncl.9577598.v1
Dataset updated
May 30, 2023
Dataset provided by
Newcastle University
Authors
Agamemnon Krasoulis; Sethu Vijayakumar; K. Nazarpour
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
InstructionsAcquisition ProtocolThe 8th Ninapro database is described in the paper: "Agamemnon Krasoulis, Sethu Vijayakumar & Kianoush Nazarpour. Effect of user adaptation on prosthetic finger control with an intuitive myoelectric decoder, Frontiers in Neuroscience. Please cite this paper for any work related to this database.More information about the protocol can be found in the original paper: "Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto & Henning Müller. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Scientific Data, 2014" (http://www.nature.com/articles/sdata201453)The experiment comprised nine movements including single-finger as well as functional movements. The subjects had to repeat the instructed movements following visual cues (i.e. movies) shown on the screen of a computer monitor.The muscular activity was recorded using 16 active double-differential wireless sensors from a Delsys Trigno IM Wireless EMG system. The sensors comprise EMG electrodes and 9-axis inertial measurement units (IMUs). The sensors were positioned in two rows of eight units around the participants’ right forearm in correspondence to the radiohumeral joint (see pictures below). No specific muscles were targeted. The sensors were fixed on the forearm using the standard manufacturer-provided adhesive bands. Moreover, a hypoallergenic elastic latex-free band was placed around the sensors to keep them fixed during the acquisition. The sEMG signals were sampled at a rate of 1111 Hz, accelerometer and gyroscope data were sampled at 148 Hz, and magnetometer data were sampled at 74 Hz. All signals were upsampled to 2 kHz and post-synchronized.Hand kinematic data were recorded with a dataglove (Cyberglove 2, 18-DOF model). For all participants (i.e. both able-bodied and amputee), the data glove was worn on the left hand (i.e. contralateral to the arm where the EMG sensors were located). The Cyberglove signals correspond to data from the associated Cyberglove sensors located as shown in the picture below ("n/a" corresponds to sensors that were not available, since an 18-DOF model was used). Prior to each experimental session, the data glove was calibrated for the specific participant using the "quick calibration" procedure provided by the manufacturer. The Cyberglove signals were sampled at 100 Hz and subsequently upsampled to 2 kHz and synchronized to EMG and IMU data.Ten able-bodied (Subjects 1-10) and two right-hand transradial amputee participants (Subjects 11-12) are included in the dataset. During the acquisition, the subjects were asked to repeat 9 movements using both hands (bilateral mirrored movements). The duration of each of the nine movements varied between 6 and 9 seconds and consecutive trials were interleaved with 3 seconds of rest. Each repetition started with the participant holding their fingers at the rest state and involved slowly reaching the target posture as shown on the screen and returning to the rest state before the end of the trial. The following movements were included:0. rest1. thumb flexion/extension2. thumb abduction/adduction3. index finger flexion/extension4. middle finger flexion/extension5. combined ring and little fingers flexion/extension6. index pointer7. cylindrical grip8. lateral grip9. tripod grip DatasetsFor each participant, three datasets were collected: the first two datasets (acquisitions 1 & 2) comprised 10 repetitions of each movement and the third dataset (acquisition 3) comprised only two repetitions. For each subject, the associated .zip file contains three MATLAB files in .mat format, that is, one for each dataset, with synchronized variables.The variables included in the .mat files are the following:· subject: subject number· exercise: exercise number (value set to 1 in all data files)· emg (16 columns): sEMG signals from the 16 sensors· acc (48 columns): three-axis accelerometer data from the 16 sensors· gyro (48 columns): three-axis gyroscope data from the 16 sensors· mag (48 columns): three-axis magnetometer data from the 16 sensors· glove (18 columns): calibrated signals from the 18 sensors of the Cyberglove· stimulus (1 column): the movement repeated by the subject· restimulus (1 column): again the movement repeated by the subject. In this case, the duration of the movement label is refined a-posteriori in order to correspond to the real movement.· repetition (1 column): repetition number of the stimulus· rerepetition (1 column): repetition number of restimulus Important notesGiven the nature of the data collection procedure (slow finger movement and lack of extended hold period), this database is intended to be used for estimation/reconstruction of finger movement rather than motion/grip classification. In other words, the purpose of this database is to provide a benchmark for decoding finger position from (contralateral) EMG measurements using regression algorithms as opposed to classification. Therefore, the use of stimulus/restimulus vectors as target variables should be avoided; these are only provided for the user to have access to the exact timings of each movement repetition.Three datasets/acquisitions are provided for each subject. It is recommended that dataset 3, which comprises only two repetitions for each movement, is only used to report performance results and no training or hyper-parameter tuning is performed using this data (i.e. test dataset). The three datasets, which were recorded sequentially, can offer an out-of-the-box three-way split for model training (dataset 1), hyper-parameter tuning/validation (dataset 2), and performance testing (dataset 3). Another possibility is to merge datasets 1 & 2 and perform training and validation/hyper-parameter tuning using K-fold cross-validation, then report performance results on dataset 3.
Z
Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...
data.niaid.nih.gov
zenodo.org
Updated Mar 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
Explore at:
Dataset updated
Mar 1, 2023
Dataset provided by
Schaer, Philipp
Haak, Fabian
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. 𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news feature three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, and slant other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship, left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags, that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represents the search query suggestions at the respective positions returned by the search engines at the given time of search "datetime". We scraped our data from a US server saved in "location".

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool, that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, that scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.
P
OVIC Datasets Dataset
paperswithcode.com
Updated Jul 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Philipp Allgeuer; Kyra Ahrens; Stefan Wermter (2024). OVIC Datasets Dataset [Dataset]. https://paperswithcode.com/dataset/ovic-datasets
Explore at:
Dataset updated
Jul 14, 2024
Authors
Philipp Allgeuer; Kyra Ahrens; Stefan Wermter
Description
Due to the free-form nature of the open vocabulary image classification task, special annotations are required for image sets used for evaluation purposes. Three such image datasets are presented here:

World: 272 images of which the grand majority are originally sourced (have never been on the internet) from 10 countries by 12 people, with an active focus on covering as wide and varied concepts as possible, including unusual, deceptive and/or indirect representations of objects, Wiki: 1000 Wikipedia lead images sampled from a scraped pool of 18K, Val3K: 3000 images from the ImageNet-1K validation set, sampled uniformly across the classes.

It is not in general possible to exhaustively annotate ground truth classification labels for open vocabulary image sets, as this would require annotations for every possible correct object noun in the English language for every visible entity in every part of every image. It is possible however, to annotate the thousands of predictions that have been made across the image sets by open vocabulary models trained thus far. All three image datasets presented here have been individually annotated by both human and multimodal LLM annotators for the object nouns that were predicted by trained models. The annotations specify whether each classification is correct, close, or incorrect, and for the human annotations, whether it relates to a primary or secondary element of the image. It is customary to use the suffixes -H and -L to clearly specify which annotations are being referred to at any time, e.g. Wiki-H is the Wiki dataset with corresponding human annotations. All three datasets together contain a total of 17.4K human and 112K LLM class annotations.

The data is directly available at the following links:

World dataset Wiki dataset Val3K dataset

Refer to the NOVIC code for an example of how the datasets can be used, as well as tools for updating the class annotations for newer model predictions.
d
Data from: Apple flower detection using deep convolutional networks
catalog.data.gov
datasets.ai
+1more
Updated Apr 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Data from: Apple flower detection using deep convolutional networks [Dataset]. https://catalog.data.gov/dataset/data-from-apple-flower-detection-using-deep-convolutional-networks-97e95
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
To optimize fruit production, a portion of the flowers and fruitlets of apple trees must be removed early in the growing season. The proportion to be removed is determined by the bloom intensity, i.e., the number of flowers present in the orchard. Several automated computer vision systems have been proposed to estimate bloom intensity, but their overall performance is still far from satisfactory even in relatively controlled environments. With the goal of devising a technique for flower identification which is robust to clutter and to changes in illumination, this paper presents a method in which a pre-trained convolutional neural network (CNN) is fine-tuned to become specially sensitive to flowers. Experimental results on a challenging dataset demonstrate that our method significantly outperforms three approaches that represent the state of the art in flower detection, with recall and precision rates higher than 90%. Moreover, a performance assessment on three additional datasets previously unseen by the network, which consist of different flower species and were acquired under different conditions, reveals that the proposed method highly surpasses baseline approaches in terms of generalization capability. This dataset comprises mp4 video sequences illustrating each combination of datasets and methods. Resources in this dataset:Resource Title: Supplementary data - Video mmc1 (7MB). File Name: 1-s2.0-S016636151730502X-mmc1.mp4Resource Description: Dataset = AppleA. Method on left-hand side: second baseline algorithm mentioned in the paper, where HSV is hue-saturation-value, and 'Bh' is Bhattacharyya distance. Method on right-hand side: our proposed method, the CNN + SVM, where CNN = convolutional neural network and SVM = support vector machine. True Positives (blue), False Positives (cyan), and False Negatives (red).Resource Title: Supplementary data - Video mmc2 (7MB). File Name: 1-s2.0-S016636151730502X-mmc2.mp4Resource Description: Dataset = AppleA. Method on left-hand side: third baseline algorithm mentioned in the paper, HSV + SVM, where HSV is hue-saturation-value and SVM is support vector machine. Method on right-hand side: our proposed method, the CNN + SVM, where CNN = convolutional neural network and SVM = support vector machine. True Positives (blue), False Positives (cyan), and False Negatives (red).Resource Title: Supplementary data - Video mmc3 (7MB). File Name: 1-s2.0-S016636151730502X-mmc3.mp4Resource Description: Dataset = AppleA. Method on left-hand side: first baseline algorithm mentioned in the paper, where HSV is hue-saturation-value. Method on right-hand side: our proposed method, the CNN + SVM, where CNN = convolutional neural network and SVM = support vector machine. True Positives (blue), False Positives (cyan), and False Negatives (red).Resource Title: Supplementary data - Video mmc4 (3MB). File Name: 1-s2.0-S016636151730502X-mmc4.mp4Resource Description: Dataset = AppleB. Method on left-hand side: third baseline algorithm mentioned in the paper, HSV + SVM, where HSV is hue-saturation-value and SVM is support vector machine. Method on right-hand side: our proposed method, the CNN + SVM, where CNN = convolutional neural network and SVM = support vector machine. True Positives (blue), False Positives (cyan), and False Negatives (red).Resource Title: Supplementary data - Video mmc5 (3MB). File Name: 1-s2.0-S016636151730502X-mmc5.mp4Resource Description: Dataset = AppleC. Method on left-hand side: third baseline algorithm mentioned in the paper, HSV + SVM, where HSV is hue-saturation-value and SVM is support vector machine. Method on right-hand side: our proposed method, the CNN + SVM, where CNN = convolutional neural network and SVM = support vector machine. True Positives (blue), False Positives (cyan), and False Negatives (red).Resource Title: Supplementary data - Video mmc6 (3MB). File Name: 1-s2.0-S016636151730502X-mmc6.mp4Resource Description: Dataset = Peach. Method on left-hand side: third baseline algorithm mentioned in the paper, HSV + SVM, where HSV is hue-saturation-value and SVM is support vector machine. Method on right-hand side: our proposed method, the CNN + SVM, where CNN = convolutional neural network and SVM = support vector machine. True Positives (blue), False Positives (cyan), and False Negatives (red).

Facebook

Twitter

Click to copy link

Link copied

Cite

Leonidas Bantis; Ziding Feng (2023). Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of True Classification Rates [Dataset]. http://doi.org/10.6084/m9.figshare.6527219.v1

Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of True Classification Rates

Explore at:

xlsxAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.6527219.v1

Dataset updated

May 31, 2023

Dataset provided by

Wiley

Authors

Leonidas Bantis; Ziding Feng

License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

The receiver operating characteristics (ROC) curve is typically employed when one wants to evaluate the discriminatory capability of a continuous or ordinal biomarker in the case where two groups are to be distinguished, commonly the ’healthy’ and the ’diseased’. There are cases for which the disease status has three categories. Such cases employ the (ROC) surface, which is a natural generalization of the ROC curve for three classes. In this paper, we explore new methodologies for comparing two continuous biomarkers that refer to a trichotomous disease status, when both markers are applied to the same patients. Comparisons based on the volume under the surface have been proposed, but that measure is often not clinically relevant. Here, we focus on comparing two correlated ROC surfaces at given pairs of true classification rates, which are more relevant to patients and physicians. We propose delta-based parametric techniques, power transformations to normality, and bootstrap-based smooth nonparametric techniques to investigate the performance of an appropriate test. We evaluate our approaches through an extensive simulation study and apply them to a real data set from prostate cancer screening.

Clear search

Close search

Google apps

Main menu

Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of...

Interpolated data on bioavailable strontium in the southern Trans-Urals,...

A benchmark driven guide to binding site comparison: An exhaustive...

Linked Open Data Management Services: A Comparison

Data from: Comparing three collection methods for pollinating insects within...

Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories

Dataset for The effects of a number line intervention on calculation skills

Study information

Measures

The Number Line Intervention

Variables in the dataset

COVID-19 Case Surveillance Public Use Data

CDC has three COVID-19 case surveillance datasets:

Overview

COVID-19 Case Reports

Data are Considered Provisional

Data Limitations

Data Quality Assurance Procedures

Data Suppression

Additional COVID-19 Data

A multi-modal human neuroimaging dataset for data integration: simultaneous...

Geodatabase of the datasets that represent the three subareas of the...

Datasets for Evaluation of Multimodal Image Registration

Third Generation Simulation Data (TGSIM) I-395 Trajectories

Cat Annotation Dataset Merged

Human Right to Water Data Tool (CalHRTW 1.0)

Sample Student Data

Three-view Synthetic data Dataset

EMG and data glove dataset for dexterous myoelectric control

Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

OVIC Datasets Dataset

Data from: Apple flower detection using deep convolutional networks

Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of True Classification Rates