MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Otter DUDe Dataset Card
Otter DUDe includes 1,452,568 instances of drug-target interactions.
Dataset details
DUDe
DUDe comprises a collection of 22,886 active compounds and their corresponding affinities towards 102 targets. For our study, we utilized a preprocessed version of the DUDe, which includes 1,452,568 instances of drug-target interactions. To prevent any data leakage, we eliminated the negative interactions and the overlapping triples with the TDC DTI… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/otter_dude.
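The leakage filter described above (dropping negative interactions and anything that overlaps a reference benchmark) can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the `Interaction` layout, the field names, and the `label` encoding are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple


class Interaction(NamedTuple):
    drug: str    # hypothetical compound identifier (e.g. a SMILES string)
    target: str  # hypothetical protein identifier
    label: int   # assumed encoding: 1 = positive interaction, 0 = negative


def filter_interactions(
    candidates: Iterable[Interaction],
    reference: Iterable[Interaction],
) -> List[Interaction]:
    """Keep positive candidates whose (drug, target) pair is not in `reference`."""
    seen = {(r.drug, r.target) for r in reference}
    return [
        c for c in candidates
        if c.label == 1 and (c.drug, c.target) not in seen
    ]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    benchmark = [Interaction("D1", "T1", 1)]
    raw = [
        Interaction("D1", "T1", 1),  # overlaps the benchmark, dropped
        Interaction("D2", "T1", 1),  # kept
        Interaction("D3", "T2", 0),  # negative, dropped
    ]
    print(filter_interactions(raw, benchmark))
```

Matching on the (drug, target) pair rather than the full record means an interaction is removed even if its label differs between the two sources, which is the more conservative choice for avoiding leakage.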
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data to reproduce results from the paper "Magnifying Side-Channel Leakage of Lattice-Based Cryptosystems with Chosen Ciphertexts: The Case Study of Kyber."
Abstract
In this paper, we propose EM side-channel attacks with carefully constructed ciphertexts on Kyber, a lattice-based key encapsulation mechanism, which is a candidate in the NIST Post-Quantum Cryptography standardization project. We demonstrate that specially chosen ciphertexts allow an adversary to modulate the leakage of a target device and enable full key extraction with a small number of traces through simple power analysis. Compared to prior research, our techniques require a lower number of traces and avoid the need for template attacks. We practically evaluate our methods using both a clean reference implementation of Kyber and the ARM-optimized pqm4 library. For the reference implementation, we target the leakage of the output of the inverse NTT computation and recover the full key with only four traces. For the pqm4 implementation, we develop a message-recovery attack that leads to extraction of the full secret key with between eight and 960 traces (or 184 traces for recovering 98% of the secret key), depending on the compiler optimization level. We discuss the relevance of our findings to other lattice-based schemes and explore potential countermeasures.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objectives: The primary goal of this dataset is to enable the automated segmentation and quantification of atherosclerotic plaque features in OCT images. Cardiovascular disease, with atherosclerosis at its core, remains a global health challenge. Accurate identification of vulnerable plaques is crucial for preventing acute cardiovascular events such as myocardial infarction and stroke. OCT imaging provides high-resolution insights into plaque morphology but is often constrained by manual interpretation challenges. This dataset, curated with diverse annotations of key plaque morphological features, aims to facilitate the development and evaluation of machine learning models for precise plaque analysis. By advancing segmentation capabilities, this dataset contributes to improved diagnostics and therapeutic strategies in cardiovascular care.
Ethical Approval: The dataset complies with ethical standards, adhering to the Declaration of Helsinki. Ethical approval was granted by the Local Ethical Committee of the Research Institute for Complex Issues of Cardiovascular Diseases (Kemerovo, Russia) under protocol code 2022/06 (approved on June 30, 2022). All participants provided informed consent. Data collection involved patients aged 18 years or older, ensuring balanced gender representation and inclusion of various comorbid conditions for comprehensive clinical relevance (refer to Table 1).
Description: The dataset consists of OCT images acquired from 103 patients across two cardiovascular research centers. These images, collected over one year, represent a diverse array of imaging devices and patient demographics. The dataset includes 25,698 annotated slices, each capturing key plaque morphological features. These features include lumen (LM), fibrous cap (FC), lipid core (LC), and vasa vasorum (VV). The images vary in dimensions from 704 x 704 to 1024 x 1024 pixels, reflecting differences in anatomical characteristics and imaging conditions. Annotations were performed using Supervisely, with meticulous double-verification processes to ensure accuracy.
Annotation Method: Two cardiologists annotated the dataset, identifying plaque features using binary masks. The annotations underwent a review and double-verification by a senior cardiologist and technical specialist, enhancing precision and consistency. The morphological features segmented include the vascular lumen, fibrous cap, lipid core, and vasa vasorum, each providing critical insights into plaque stability and cardiovascular risk.
Dataset Split: A 5-fold cross-validation technique was employed for dataset splitting, ensuring robust model evaluation while preventing data leakage. Approximately 80% of images were allocated for training in each fold, with the remaining 20% reserved for testing (refer to Table 2). This method allowed a balanced and comprehensive assessment of segmentation performance across the dataset.
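One common way to keep a 5-fold split leakage-free for slice-based imaging data is to assign whole patients, rather than individual slices, to folds. The description does not state that the split was done this way, so the sketch below is an assumption; the slice and patient identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def patient_level_folds(slice_to_patient: Dict[str, str], n_folds: int = 5) -> List[List[str]]:
    """Assign every patient to exactly one fold and return the slice IDs per fold."""
    patients = sorted(set(slice_to_patient.values()))
    # Round-robin assignment keeps the number of patients per fold balanced.
    fold_of_patient = {p: i % n_folds for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for slice_id, patient in slice_to_patient.items():
        folds[fold_of_patient[patient]].append(slice_id)
    return [sorted(folds[k]) for k in range(n_folds)]


def train_test_for_fold(folds: List[List[str]], test_fold: int) -> Tuple[List[str], List[str]]:
    """Use one fold as the test set and the remaining folds as the training set."""
    test = list(folds[test_fold])
    train = [s for k, fold in enumerate(folds) if k != test_fold for s in fold]
    return train, test


if __name__ == "__main__":
    # Hypothetical toy data: 10 patients with 3 slices each.
    demo = {f"p{p:02d}_s{s}": f"p{p:02d}" for p in range(10) for s in range(3)}
    folds = patient_level_folds(demo)
    train, test = train_test_for_fold(folds, test_fold=0)
    print(len(train), len(test))
```

With equal-sized patients this yields an exact 80/20 split; with real data the ratio varies from fold to fold because patients contribute different numbers of slices.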
Access to the Study: Further information about this study, including curated source code, dataset details, and trained models, can be accessed through the following repositories:
Table 1. Baseline characteristics of patients included in the study.
Parameter | Value |
Sex: | |
Male, n (%) | 77 (74.7) |
Female, n (%) | 26 (25.3) |
Median Age, years [min – max] | 69 [43 – 83] |
Arterial hypertension, n (%) | 92 (89.3) |
Diabetes Mellitus, n (%) | 22 (21.4) |
Myocardial Infarction, n (%) | 22 (21.4) |
Polyvascular Disease, n (%) | 29 (28.2) |
Angina Pectoris: | |
Silent ischemia, n (%) | 9 (8.7) |
Functional class 1, n (%) | 24 (23.3) |
Functional class 2, n (%) | 55 (53.4) |
Functional class 3, n (%) | 15 (14.6) |
Table 2. Image and plaque morphological feature distributions across folds and subsets.
Fold | Subset | LM | FC | LC | VV | Total objects | Total images |
1 | Train | 17264 | 5610 | 5576 | 328 | 28778 | 16901 |
1 | Test | 4544 | 1616 | 1616 | 122 | 7898 | 4492 |
2 | Train | 17554 | 5709 | 5690 | 237 | 29190 | 17207 |
2 | Test | 4254 | 1517 | 1502 | 213 | 7486 | 4186 |
3 | Train | 17220 | 5600 | 5565 | 407 | 28792 | 16962 |
3 | Test | 4588 | 1626 | 1627 | 43 | 7884 | 4431 |
4 | Train | 17813 | 5724 | 5686 | 416 | 29639 | 17473 |
4 | Test | 3995 | 1502 | 1506 | 34 | 7037 | 3920 |
5 | Train | 17381 | 6261 | 6251 | 412 | 30305 | 17029 |
5 | Test | 4427 | 965 | 941 | 38 | 6371 | 4364 |
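As a quick check of the "approximately 80% / 20%" statement, the test share per fold can be computed from the "Total images" column of Table 2. The sketch below uses only those published counts; the variable and function names are illustrative.

```python
# (train images, test images) per fold, copied from the "Total images" column of Table 2.
IMAGES_PER_FOLD = {
    1: (16901, 4492),
    2: (17207, 4186),
    3: (16962, 4431),
    4: (17473, 3920),
    5: (17029, 4364),
}


def held_out_fraction(train: int, test: int) -> float:
    """Share of images reserved for testing in one fold."""
    return test / (train + test)


fractions = {fold: held_out_fraction(*counts) for fold, counts in IMAGES_PER_FOLD.items()}
totals = {fold: sum(counts) for fold, counts in IMAGES_PER_FOLD.items()}

if __name__ == "__main__":
    for fold in sorted(IMAGES_PER_FOLD):
        print(f"fold {fold}: total={totals[fold]}, test share={fractions[fold]:.1%}")
```

Every fold sums to the same number of images, and the test share ranges from roughly 18% to 21%, which is consistent with the stated split.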
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full 6.5-day CW measurement for Drift Detection
Experiment:
- The dataset was captured using the ChipWhisperer CW308; the target device was an STM32F3 with an ARM Cortex-M4.
- The target device performed AES-128 encryption while we measured the leakage traces.
- The experiment lasted approximately 6.5 days.
- The data is organized in parts of 100k traces each. Each 100k-trace part was captured in approx. 38 minutes. Each trace has 5k time samples (features). The original experiment has a total of 254 parts.
- Every 100k-trace data part is called tracesi.mat and comes together with labeli.mat, for indexes i = 1, 2, ..., 254.
- Each labeli.mat holds the value of a single S-box output of AES-128, i.e. the label takes values in the set {0,1,...,255}. We assume that successfully recovering the S-box output implies successfully recovering the respective key byte of AES-128.
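The assumption in the last bullet can be illustrated with a short sketch. It assumes the label is the first-round value Sbox(p XOR k) for a known plaintext byte p and key byte k; the dataset description does not say which round or byte is targeted, so treat this as an illustration only. The S-box is generated from its standard definition (inverse in GF(2^8) followed by the AES affine transform) rather than hard-coded.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), with 0 mapped to 0 (a^254 = a^-1)."""
    result = 1
    for _ in range(254):
        result = _gf_mul(result, a)
    return result if a else 0


def _rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """Build the AES S-box: field inverse, then the affine transform with constant 0x63."""
    sbox = []
    for x in range(256):
        b = _gf_inv(x)
        sbox.append(b ^ _rotl8(b, 1) ^ _rotl8(b, 2) ^ _rotl8(b, 3) ^ _rotl8(b, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for _x, _s in enumerate(SBOX):
    INV_SBOX[_s] = _x


def sbox_label(plaintext_byte: int, key_byte: int) -> int:
    """Assumed label definition: first-round S-box output."""
    return SBOX[plaintext_byte ^ key_byte]


def key_byte_from_label(label: int, plaintext_byte: int) -> int:
    """Invert the S-box and remove the known plaintext byte to get the key byte."""
    return INV_SBOX[label] ^ plaintext_byte
```

Because the S-box is a bijection, a correctly predicted label together with the known plaintext byte determines the key byte uniquely, which is why recovering the label is treated as equivalent to recovering the key byte.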
We also have a reduced version of the dataset available here:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This data repository contains supplementary datasets to the manuscript "Increasing emissions of HCFC-123 and HCFC-124 may be due to leakage during HFC-125 production" by Western et al., submitted to ACP.
There are six supplementary data sets in this folder:
1) factorylocations_2024_translated.pdf contains a translation of production data and companies producing HFC-125 (and other HFCs) in China. The original citation is "Ministry of Ecology and Environment of the People’s Republic of China: 2024 Hydrofluorocarbon Production and Import Quotas Announced [in Chinese], https://www.mee.gov.cn/xxgk2018/xxgk/xxgk05/202402/W020240318587718307908.pdf, 2024".
2) GlobalEmissions.zip contains the global emissions and mole fraction trends derived from the 12-box model used in this work. This is for HCFC-123 and HCFC-124 for both the NOAA and AGAGE networks. Inputs to the 12-box model inversion scripts are also included, which can be found at https://github.com/mrghg/py12box_invert (most recent version). These files are csv files.
3) HCFC124_bank_feedstock_modelling_3_31_2025.mat contains the outputs from the estimation of the separation of HCFC-124 emitted from banks and from HFC-125 production. This is a MATLAB file, but it can be read using free software, such as Python.
4) US_HCFC-124 emissions.zip contains the estimated emissions of HCFC-124 for the USA and files containing information on the observations used to do this.
5) Europe.zip contains netcdf files output from the InTEM and RHIME models that were used to derive European emissions. Each model has 2 netcdf files: one contains the output emissions and the other the input/output mole fractions.
6) EastAsia.zip contains the output files from the InTEM and RHIME models that were used to derive emissions from East Asia. InTEM outputs are text files and RHIME outputs are netcdf files.
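For the MATLAB file in item 3, a minimal sketch of reading it from Python with `scipy.io.loadmat` is shown below. The variable names stored in the file are not documented here, so the helper only lists whatever it finds; `list_mat_variables` is a hypothetical name introduced for this example. Note that `loadmat` does not read MATLAB v7.3 files (these are HDF5 based and need an HDF5 library such as h5py); which format this particular file uses is not stated.

```python
from scipy.io import loadmat


def list_mat_variables(path: str) -> dict:
    """Return {variable name: array shape} for the user-level variables in a .mat file."""
    contents = loadmat(path)
    return {
        name: getattr(value, "shape", None)
        for name, value in contents.items()
        if not name.startswith("__")  # skip loadmat metadata such as __header__
    }
```

Calling `list_mat_variables("HCFC124_bank_feedstock_modelling_3_31_2025.mat")` from the folder containing the download would show which arrays are available before any further analysis.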
The most recent observations from the AGAGE network can be found at https://www-air.larc.nasa.gov/missions/agage/data/ and those from the NOAA network at https://gml.noaa.gov/aftp/data/hats/.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Concerns over mitigating methane leakage from the natural gas system have become ever more prominent in recent years. Recently, the U.S. Environmental Protection Agency proposed regulations requiring use of optical gas imaging (OGI) technologies to identify and repair leaks. In this work, we develop an open-source predictive model to accurately simulate the most common OGI technology, passive infrared (IR) imaging. The model accurately reproduces IR images of controlled methane release field experiments as well as reported minimum detection limits. We show that imaging distance is the most important parameter affecting IR detection effectiveness. In a simulated well-site, over 80% of emissions can be detected from an imaging distance of 10 m. Also, the presence of “superemitters” greatly enhances the effectiveness of IR leak detection. The minimum detectable limits of this technology can be used to selectively target “superemitters”, thereby providing a method for approximate leak-rate quantification. In addition, model results show that imaging backdrop controls IR imaging effectiveness: land-based detection against sky or low-emissivity backgrounds has higher detection efficiency compared to aerial measurements. Finally, we show that minimum IR detection thresholds can be significantly lower for gas compositions that include a significant fraction of nonmethane hydrocarbons.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Instances in the Jacobaea vulgaris class: 895
Instances in the Meadow class: 9141
Image sizes from 77x77 to 817x817 pixels on three color channels (RGB)
The images in this dataset were taken as part of the project “UAV-basiertes Grünlandmonitoring auf Bestands- und Einzelpflanzenebene” (engl. “UAV-based Grassland Monitoring at Population and Individual Plant Level”), financed by the Authority for Economy, Transport, and Innovation of Hamburg.
In September 2018, flights with an octocopter were conducted over two extensively used grassland areas in the urban area of Hamburg. The multicopter flew at a height of about 11 meters and took pictures with a ground resolution of approximately 3.18 mm/pixel. Additional information about the process of image generation for this dataset can be found in the relevant papers written by P. Zacharias: 1) UAV-basiertes Grünland-Monitoring und Schadpflanzenkartierung mit offenen Geodaten [p. 45–53] and 2) UAV-basiertes Grünlandmonitoring auf Bestands- und Einzelpflanzenebene.
In addition to the images of Jacobaea vulgaris taken by the UAV, the dataset includes images of Jacobaea vulgaris plants from the internet (included in the total of 895 images; e.g. images 'jkk0523.jpg', 'jkk0527.jpg'). Furthermore, some of the images of the Jacobaea vulgaris plants have been rotated, further cropped, or filtered. The exact number of augmentations is unknown. Because augmented images are included, the dataset is useful for training and validation, but using it for testing is not recommended due to the risk of data leakage.
The dataset is licensed under the license CC BY 4.0. The attributor of the data is the Chair of Geodesy and Geoinformatics at the University of Rostock. The data was created within the scope of the project 'UAV-based Grassland Monitoring at Population and Individual Plant Level', financed by the Authority for Economy, Transport, and Innovation of Hamburg.