CRAG-MM: Comprehensive multi-modal, multi-turn RAG Benchmark
This repository contains the CRAG-MM dataset, a high-quality conversational benchmark for multimodal assistants. The dataset features conversations about images with varied complexity levels, designed to evaluate AI systems' visual understanding and conversational abilities. CRAG-MM is a visual question-answering benchmark that focuses on factual questions, offering a unique collection of image and question-answering sets… See the full description on the dataset page: https://huggingface.co/datasets/crag-mm-2025/crag-mm-multi-turn-debug-public.
Under Institutional Review Board (IRB) supervision, 50 abdomen CT scans were randomly selected from a combination of an ongoing colorectal cancer chemotherapy trial and a retrospective ventral hernia study. The 50 scans were captured during portal venous contrast phase with variable volume sizes (512 x 512 x 85 - 512 x 512 x 198) and fields of view (approx. 280 x 280 x 280 mm^3 - 500 x 500 x 650 mm^3). The in-plane resolution varies from 0.54 x 0.54 mm^2 to 0.98 x 0.98 mm^2, while the slice thickness ranges from 2.5 mm to 5.0 mm. The standard registration data was generated by NiftyReg.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
== Introduction ==
For many years PET centres around the world have developed and optimised their own analysis pipelines, including a mixture of in-house and independent software, and have implemented different modelling choices for PET image processing and data quantification. As a result, many different methods and tools are available for PET image analysis.
== Aim of the dataset ==
This dataset aims to provide a normative tool to assess the performance and consistency of PET modelling approaches on the same data for which the ground truth is known. It was created and released for the NRM2018 PET Grand Challenge. The challenge aimed at evaluating the performances of different PET analysis tools to identify areas and magnitude of receptor binding changes in a PET radioligand neurotransmission study.
The present dataset refers to 5 simulated human subjects scanned twice. For each subject the first PET scan (ses-baseline) represents baseline conditions; the second scan (ses-displaced) represents the scan after a pharmacological challenge in which the tracer binding has been displaced in certain regions of interest. A total of 10 dynamic scans are provided in the current dataset.
The neuroreceptor tracer used for the simulation (hereafter referred to as [11C]LondonPride) is intended to be as general as possible. Any similarity to real PET tracer uptake is purely coincidental. Each simulated scan consists of a 90-minute dynamic PET acquisition after bolus tracer injection, as obtained with a Siemens Biograph mMR PET/MR scanner. The data were simulated including attenuation, randoms and scatter effects, the decay of the radiotracer, and the geometry and resolution of the scanner. The PET data can be considered motion-free, as no motion or motion-related artifacts are included in the simulated dataset. The data were binned into 23 frames: 4×15 s, 4×60 s, 2×150 s, 10×300 s and 3×600 s. Each frame was reconstructed with the MLEM algorithm with 100 iterations. The reconstructed images available in the dataset are already decay corrected.
All provided PET images are already normalised in standard MNI space (182x218x182 – 1mm).
== Data simulation process ==
For the simulation of each of the 10 scans (5 patients, 2 scans each), time activity curves (TACs) for each voxel of the phantom were generated from the kinetic parameters using the 2TCM equations. The TACs had a resolution of 1 sec and included the effect of the radiotracer decay, which was simulated with a half-life of 20.34 min (11C half-life). Each voxel TAC was binned with the following framing: 4×15 s, 4×60 s, 2×150 s, 10×300 s and 3×600 s by using the mean activity value for each time frame. After this process, the dynamic phantom for each scan is ready to be used in the simulation of each scan. The phantoms had the same resolution as the parametric maps (1×1×1 mm^3).
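The framing step described above can be sketched as follows. This is a minimal illustration and not the authors' simulation code: the constant test curve and the function names are hypothetical, and only the frame durations, the 1 s sampling and the 20.34 min half-life are taken from the text.

```python
import numpy as np

# Frame durations in seconds, as listed in the text: 4x15, 4x60, 2x150, 10x300, 3x600.
FRAME_DURATIONS = np.array([15] * 4 + [60] * 4 + [150] * 2 + [300] * 10 + [600] * 3)
HALF_LIFE_S = 20.34 * 60.0  # 11C half-life used in the simulation (from the text)


def frame_edges(durations):
    """Return frame boundaries in seconds (array of length n_frames + 1)."""
    return np.concatenate(([0], np.cumsum(durations)))


def bin_tac(tac_1s, durations):
    """Average a TAC sampled every 1 s over each frame (mean activity per frame)."""
    edges = frame_edges(durations)
    return np.array([tac_1s[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])


# Hypothetical voxel TAC: constant activity of 1.0 with physical decay applied.
t = np.arange(FRAME_DURATIONS.sum())           # 1 s resolution over the whole scan
decay = np.exp(-np.log(2) * t / HALF_LIFE_S)   # radioactive decay factor
binned = bin_tac(1.0 * decay, FRAME_DURATIONS) # one mean value per frame
```

The frame durations sum to 5400 s, which matches the 90-minute acquisition stated earlier.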
Each scan was simulated with a total of 3×10e8 counts and by modelling the different physical effects of a PET acquisition. For each frame of a scan, the phantom was smoothed with a 2.5 mm FWHM kernel (lower than the spatial resolution of the mMR scanner since the phantom was already low resolution) and projected into a span 11 sinogram using the mMR scanner geometry. Then the resulting sinograms were multiplied by the attenuation factors, obtained from an attenuation map generated from the CT image of the patient, and by the normalization factors of the mMR scanner. Next, Poisson noise was introduced by simulating a random process for every sinogram bin, obtaining the sinogram with true events. A uniform sinogram multiplied by the normalization factors was used for the randoms and a smoothed version of the emission sinogram for the scatters, which were scaled in order to have 20% of randoms and 25% of scatters of the total counts. Poisson noise was introduced to randoms and scatters and added to the trues sinogram. Finally, each frame was individually reconstructed using the MLEM algorithm with 100 iterations, a 2.5 mm PSF and the standard mMR voxel size (2.09x2.09x2.03 mm3). The reconstructed images were corrected for the activity decay and resampled into the original MNI space. For the simulation and reconstruction, an in-house reconstruction framework was used (Belzunce and Reader 2017).
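The scaling of trues, randoms and scatters and the Poisson noise step can be illustrated with a toy example. This is a sketch under stated assumptions, not the in-house framework referenced above: it assumes the remaining 55% of the total counts are trues, it omits attenuation and normalization factors, and the sinogram size, the crude smoothing and all names are hypothetical.

```python
import numpy as np


def simulate_prompts(trues_expect, randoms_shape, scatter_shape, total_counts, rng,
                     randoms_frac=0.20, scatter_frac=0.25):
    """Scale each component to its share of the total counts and add Poisson noise."""
    trues_frac = 1.0 - randoms_frac - scatter_frac  # assumption: the rest are trues

    def scaled(sino, frac):
        # Rescale so that the expected counts of this component sum to frac * total.
        return sino * (frac * total_counts / sino.sum())

    trues = rng.poisson(scaled(trues_expect, trues_frac))
    randoms = rng.poisson(scaled(randoms_shape, randoms_frac))
    scatters = rng.poisson(scaled(scatter_shape, scatter_frac))
    return trues + randoms + scatters


rng = np.random.default_rng(0)
emission = rng.random((32, 32)) + 0.1   # hypothetical noise-free emission sinogram
uniform = np.ones_like(emission)        # uniform sinogram used for the randoms
# Crude stand-in for a smoothed emission sinogram: 3-point circular average along axis 0.
smoothed = (emission + np.roll(emission, 1, axis=0) + np.roll(emission, -1, axis=0)) / 3.0
prompts = simulate_prompts(emission, uniform, smoothed, total_counts=1e6, rng=rng)
```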
== Simulated Drug ==
The pharmacological challenge given to the subjects before the second scan (ses-displaced) is based, as is the tracer, on a simulated drug. Any similarity with existing drugs is purely coincidental. The drug binds competitively to the radiotracer target and has no secondary affinities. The drug is simulated as a single oral bolus given 30 min prior to the scan.
== Additional data in the folder ==
Along with the raw data, some derivative data are provided: six regions of displacement that are helpful for quantification and analysis. The six regions of displacement were manually generated (using ITKSnap) and applied consistently to all the subjects to generate displaced 𝑘3 parametric maps. Based on neuroreceptor theory (Innis, Cunningham et al. 2007), any change in 𝑘3 would produce an equivalent change in BPnd. The volumes of the regions ranged from 343mm3 to 2275mm3, and the regions were selected to be in areas of higher tracer uptake at baseline. None of the displacement ROIs has a purely geometrical (e.g. cube or sphere) or anatomical shape. The regions have been created to represent different sizes and different levels of tracer displacement according to the following values:
+----- ROI -----+----- Volume(mm^3) -----+----- Displacement (%) -----+
| ROI1 | 2555 | 27 |
| ROI2 | 2275 | 27 |
| ROI3 | 1152 | 21 |
| ROI4 | 493 | 18 |
| ROI5 | 343 | 18 |
| ROI6 | 418 | 18 |
+---------------+------------------------+----------------------------+
The ROIs are not symmetrically distributed across the brain. A definition of each ROI name can be found in the accompanying dseg.tsv file.
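The relation between a change in k3 and the resulting change in BPnd can be checked numerically. The sketch below assumes the two-tissue compartment definition BPnd = k3/k4 with k4 unchanged, and it assumes that the tabulated displacement is a fractional reduction of k3; the k3 and k4 values are hypothetical and not taken from the dataset.

```python
# Displacement levels (%) copied from the table above.
DISPLACEMENT_PERCENT = {"ROI1": 27, "ROI2": 27, "ROI3": 21,
                        "ROI4": 18, "ROI5": 18, "ROI6": 18}


def bp_nd(k3, k4):
    """Non-displaceable binding potential in the 2TCM: BPnd = k3 / k4."""
    return k3 / k4


def displaced_k3(k3, displacement_percent):
    """Assumption: displacement lowers k3 by the given percentage."""
    return k3 * (1.0 - displacement_percent / 100.0)


k3, k4 = 0.10, 0.05  # hypothetical baseline rate constants (1/min)
baseline = bp_nd(k3, k4)
bp_change_percent = {
    roi: 100.0 * (1.0 - bp_nd(displaced_k3(k3, d), k4) / baseline)
    for roi, d in DISPLACEMENT_PERCENT.items()
}
```

Under these assumptions the percentage change in BPnd equals the percentage change in k3 for every ROI, independent of the chosen k3 and k4.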
== References ==
- Belzunce, M. A. and A. J. Reader (2017). "Assessment of the impact of modeling axial compression on PET image reconstruction." Medical Physics 44(10): 5172-5186.
- Innis, R. B., V. J. Cunningham, J. Delforge, M. Fujita, A. Gjedde, R. N. Gunn, J. Holden, S. Houle, S. C. Huang, M. Ichise, H. Iida, H. Ito, Y. Kimura, R. A. Koeppe, G. M. Knudsen, J. Knuuti, A. A. Lammertsma, M. Laruelle, J. Logan, R. P. Maguire, M. A. Mintun, E. D. Morris, R. Parsey, J. C. Price, M. Slifstein, V. Sossi, T. Suhara, J. R. Votaw, D. F. Wong and R. E. Carson (2007). "Consensus nomenclature for in vivo imaging of reversibly binding radioligands." J Cereb Blood Flow Metab 27(9): 1533-1539.
== Appendix: Current Folder Contents ==
├── CHANGES
├── LICENSE
├── README
├── dataset_description.json
├── derivatives
│   └── masks
│       ├── dseg.tsv
│       ├── sub-000101
│       │   ├── ses-baseline
│       │   │   └── sub-000101_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000101_ses-displaced_label-displacementROI_dseg.nii.gz
│       ├── sub-000102
│       │   ├── ses-baseline
│       │   │   └── sub-000102_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000102_ses-displaced_label-displacementROI_dseg.nii.gz
│       ├── sub-000103
│       │   ├── ses-baseline
│       │   │   └── sub-000103_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000103_ses-displaced_label-displacementROI_dseg.nii.gz
│       ├── sub-000104
│       │   ├── ses-baseline
│       │   │   └── sub-000104_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000104_ses-displaced_label-displacementROI_dseg.nii.gz
│       └── sub-000105
│           ├── ses-baseline
│           │   └── sub-000105_ses-baseline_label-displacementROI_dseg.nii.gz
│           └── ses-displaced
│               └── sub-000105_ses-displaced_label-displacementROI_dseg.nii.gz
├── participants.json
├── participants.tsv
├── sub-000101
│   ├── ses-baseline
│   │   ├── anat
│   │   │   ├── sub-000101_ses-baseline_acq-T1w.json
│   │   │   └── sub-000101_ses-baseline_acq-T1w.nii.gz
│   │   └── pet
│   │       ├── sub-000101_ses-baseline_rec-MLEM_pet.json
│   │       └── sub-000101_ses-baseline_rec-MLEM_pet.nii.gz
│   └── ses-displaced
│       ├── anat
│       │   ├── sub-000101_ses-displaced_acq-T1w.json
│       │   └── sub-000101_ses-displaced_acq-T1w.nii.gz
│       └── pet
│           ├── sub-000101_ses-displaced_rec-MLEM_pet.json
│           └── sub-000101_ses-displaced_rec-MLEM_pet.nii.gz
├── sub-000102
│   ├── ses-baseline
│   │   ├── anat
│   │   │   ├── sub-000102_ses-baseline_acq-T1w.json
│   │   │   └── sub-000102_ses-baseline_acq-T1w.nii.gz
│   │   └── pet
│   │       ├── sub-000102_ses-baseline_rec-MLEM_pet.json
│   │       └── sub-000102_ses-baseline_rec-MLEM_pet.nii.gz
│   └── ses-displaced
│       ├── anat
│       │   ├── sub-000102_ses-displaced_acq-T1w.json
│       │   └──
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Train Dataset for the ICASSP-2024 3D-CBCT challenge (Part 5)
https://sites.google.com/view/icassp2024-spgc-3dcbct/home
Please download all files (of all parts, 1-5) into one folder, and then merge them together with:
$ zip -s 0 train.zip --out train_unsplit.zip
This will create a new zip file, "train_unsplit.zip" that then you can unzip with your favorite tool, e.g.
$ unzip train_unsplit.zip
The CBCT geometry that must be used:
image size : [300 300 300] mm
image shape : [256 256 256] voxels
voxel size : [1.171875 1.171875 1.171875] mm
detector shape : [256 256] pixels
detector size : [600, 600] mm
pixel size : [2.34375, 2.34375] mm
distance source origin (axis of rotation, center of image) : 575 mm
distance source to detector : 1050 mm
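The listed geometry is internally consistent, which can be verified with a few lines. This is only a sanity-check sketch: the variable names are hypothetical, and the magnification is computed with the usual cone-beam convention (source-to-detector distance divided by source-to-origin distance), which is not stated in the text.

```python
import numpy as np

# Values copied from the geometry listed above.
image_size_mm = np.array([300.0, 300.0, 300.0])
image_shape = np.array([256, 256, 256])
detector_size_mm = np.array([600.0, 600.0])
detector_shape = np.array([256, 256])
dso_mm = 575.0    # distance source to origin (axis of rotation)
dsd_mm = 1050.0   # distance source to detector

voxel_size_mm = image_size_mm / image_shape         # 1.171875 mm per voxel
pixel_size_mm = detector_size_mm / detector_shape   # 2.34375 mm per detector pixel
magnification = dsd_mm / dso_mm                     # magnification at the rotation axis

print(voxel_size_mm, pixel_size_mm, round(magnification, 4))
```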
We recommend zenodo_get to download the files:
https://github.com/dvolgyes/zenodo_get
Remember that to take part in the challenge you need to register on the webpage above.
The MM-WHS 2017 dataset is a dataset for multi-modality whole heart segmentation. It provides 20 labeled and 40 unlabeled CT volumes, as well as 20 labeled and 40 unlabeled MR volumes. In total there are 120 multi-modality cardiac images acquired in a real clinical environment.
https://creativecommons.org/publicdomain/zero/1.0/
The PHM Data Challenge is a competition open to all potential conference attendees. This year the challenge is focused on remaining useful life (RUL) estimation for high-speed CNC milling machine cutters using dynamometer, accelerometer, and acoustic emission data.
Both Student and Professional teams are encouraged to enter! Winners of the Student and the Professional categories who attend the conference and submit an invited paper to ijPHM on their technique will be awarded a cash prize. Top scoring participants will be invited to present at a special session of the conference.
Participants will be scored based on their ability to estimate the remaining useful life of a 6mm ball nose tungsten carbide cutter.
Additional information can be found on the competition blog, http://www.phmsociety.org/forum/583
Teams
Teams may be comprised of one or more researchers. One winner from each of two categories will be determined on the basis of score. The categories are:
Professional: open to anyone (including mixed teams)
Student: open to any team with all members enrolled as full time students during the spring or fall 2010 semesters.
Teams must declare what category they belong to when signing up. There is a cash prize of $1000 for the top entrant from each category, contingent upon:
attending the conference
giving an invited presentation on the winning technique
submitting a journal-quality paper to the International Journal of Prognostics and Health Management (ijPHM) which discloses the full algorithm used.
Additionally, top scoring teams will be invited to give presentations at a special session, and submit papers to ijPHM. Submission of the challenge special session papers is outside the regular paper submission process and follows its own schedule.
The organizers of the competition reserve the right to both modify these rules and disqualify any team at their discretion.
Registration
Teams may register by contacting the Competition organizers with their name(s), a team alias under which the scores would be posted, affiliation(s) with address(es), contact phone number (for verification) and competition category (professional or student). Student teams should also send the name of the university and the semesters in which they are enrolled full-time. You will be emailed your username and password after verification.
PLEASE NOTE: In the spirit of fair competition, we allow only one account per team. Please do not register multiple times under different user names, under fictitious names, or using anonymous accounts. Competition organizers reserve the right to delete multiple entries from the same person (or team) and/or to disqualify those who are trying to “game” the system or using fictitious identities.
Data
There are six individual cutter records, c1…c6. Records c1, c4 and c6 are training data, and records c2, c3, and c5 are test data:
cutter#1 cutter#2 cutter#3 cutter#4 cutter#5 cutter#6
The data files are ~800 MB each, and were compressed using the bZip2 algorithm. If your un-zipping software complains, make sure it is bZip2-compatible. 7-Zip is Windows open-source software that works well; Linux users can use the bunzip2 command; Mac users can use Stuffit.
Note that if you downloaded a copy of c3.zip with a wear file in it, this file is incorrect. Please discard it. The data acquisition files are OK.
Each training record contains one “wear” file that lists wear after each cut in 10^-3 mm, and a folder with approximately 300 individual data acquisition files (one for each cut). The data acquisition files are in .csv format, with seven columns, corresponding to:
Column 1: Force (N) in X dimension
Column 2: Force (N) in Y dimension
Column 3: Force (N) in Z dimension
Column 4: Vibration (g) in X dimension
Column 5: Vibration (g) in Y dimension
Column 6: Vibration (g) in Z dimension
Column 7: AE-RMS (V)
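Reading one data acquisition file with this column layout could look like the sketch below. It assumes the files have no header row and use comma separators, neither of which is stated above; the column names, the helper functions and the two sample rows are hypothetical.

```python
import csv
import io
import math

# Hypothetical names for the seven columns described above.
COLUMNS = ["force_x", "force_y", "force_z", "vib_x", "vib_y", "vib_z", "ae_rms"]


def read_cut(fileobj):
    """Read one data acquisition file into a dict: column name -> list of floats."""
    data = {name: [] for name in COLUMNS}
    for row in csv.reader(fileobj):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        for name, value in zip(COLUMNS, row):
            data[name].append(float(value))
    return data


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Two made-up samples standing in for a real file opened with open(path, newline="").
sample = io.StringIO("1.0,2.0,3.0,0.1,0.2,0.3,0.05\n-1.0,-2.0,-3.0,0.1,0.2,0.3,0.07\n")
cut = read_cut(sample)
features = {name: rms(values) for name, values in cut.items()}
```

A per-cut summary such as the RMS of each channel is one simple feature that could be paired with the wear value of the same cut; it is shown here only as an example.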
Some background on the apparatus and experimental setup can be found here, and in the references in that paper. The spindle speed of the cutter was 10400 RPM; feed rate was 1555 mm/min; Y depth...
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The ProstateX dataset (both training and testing cases) has been included in the PI-CAI Public Training and Development dataset. As such, ProstateX as a benchmark has been deprecated and is superseded by the PI-CAI challenge. PI-CAI is an all-new grand challenge, with over 10,000 carefully-curated prostate MRI exams to validate modern AI algorithms and estimate radiologists' performance at clinically significant prostate cancer detection and diagnosis. Key aspects of the study design have been established in conjunction with an international, multi-disciplinary scientific advisory board (16 experts in prostate AI, radiology and urology) to unify and standardize present-day guidelines, and to ensure meaningful validation of prostate-AI towards clinical translation. Please refer to https://pi-cai.grand-challenge.org for more information.
The PROSTATEx Challenge ("SPIE-AAPM-NCI Prostate MR Classification Challenge") focused on quantitative image analysis methods for the diagnostic classification of clinically significant prostate cancers and was held in conjunction with the 2017 SPIE Medical Imaging Symposium. PROSTATEx ran from November 21, 2016 to January 15, 2017, though a "live" version has also been established at https://prostatex.grand-challenge.org which serves as an ongoing way for researchers to benchmark their performance for this task.
The PROSTATEx-2 Challenge ("SPIE-AAPM-NCI Prostate MR Gleason Grade Group Challenge") ran from May 15, 2017 to June 23, 2017 and was focused on the development of quantitative multi-parametric MRI biomarkers for the determination of Gleason Grade Group in prostate cancer. It was held in conjunction with the 2017 AAPM Annual Meeting (see http://www.aapm.org/GrandChallenge/PROSTATEx-2).
Supplemental data and instructions specific to both challenges are in the Detailed Description section below.
This collection is a retrospective set of prostate MR studies. All studies included T2-weighted (T2W), proton density-weighted (PD-W), dynamic contrast enhanced (DCE), and diffusion-weighted (DW) imaging. The images were acquired on two different types of Siemens 3T MR scanners, the MAGNETOM Trio and Skyra. T2-weighted images were acquired using a turbo spin echo sequence and had a resolution of around 0.5 mm in plane and a slice thickness of 3.6 mm. The DCE time series was acquired using a 3-D turbo flash gradient echo sequence with a resolution of around 1.5 mm in-plane, a slice thickness of 4 mm and a temporal resolution of 3.5 s. The proton density weighted image was acquired prior to the DCE time series using the same sequence with different echo and repetition times and a different flip angle. Finally, the DWI series were acquired with a single-shot echo planar imaging sequence with a resolution of 2 mm in-plane and 3.6 mm slice thickness and with diffusion-encoding gradients in three directions. Three b-values were acquired (50, 400, and 800), and subsequently, the ADC map was calculated by the scanner software. All images were acquired without an endorectal coil.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Proteome alterations in the metal-reducing bacterium Shewanella oneidensis MR-1 in response to different acute dose challenges (0.3, 0.5, or 1 mM) of the toxic metal chromate [Cr(VI)] were characterized with multidimensional HPLC−MS/MS. Proteome measurements were performed and compared on both quadrupole ion traps as well as linear trapping quadrupole mass spectrometers. We have found that the implementation of multidimensional liquid chromatography on-line with the rapid scanning, high throughput linear trapping quadrupole platform resulted in a dramatic increase in the number of measured peptides and, thus, the number of identified proteins. A total of 2406 functionally diverse, nonredundant proteins were identified in this study, representing a relatively deep proteome coverage for this organism. The core molecular response to chromate challenge under all three concentrations consisted predominantly of proteins with annotated functions in transport and binding (e.g., components of the TonB1 iron transport system, TonB-dependent receptors, and sulfate transporters) as well as a functionally undefined DNA-binding response regulator (SO2426) that might play a role in mediating metal stress responses. In addition, proteins annotated as a cytochrome c, a putative azoreductase, and various proteins involved in general stress protection were up-regulated at the higher Cr(VI) doses (0.5 and 1 mM) only. Proteins down-regulated in response to metal treatment were distributed across diverse functional categories, with energy metabolism proteins dominating. The results presented in this work demonstrate the dynamic dosage response of S. oneidensis to sub-toxic levels of chromate. Keywords: mass spectrometry • multidimensional liquid chromatography • shotgun proteomics • Shewanella oneidensis • linear trapping quadrupole • chromate stress
Training material for the MS lesion segmentation challenge 2008 to compare different algorithms to segment the MS lesions from brain MRI scans. Data used for the workshop is composed of 54 brain MRI images and represents a range of patients and pathology acquired from Children's Hospital Boston and University of North Carolina. Data has initially been randomized into three groups: 20 training MRI images, 24 testing images for the qualifying and 8 for the onsite contest at the 2008 workshop. The downloadable online database now consists of the training images (including reference segmentations) and all the 32 combined testing images (without segmentations). The naming has not been changed in comparison to the workshop competition in order to allow easy comparison between the workshop papers and the online database papers. One dataset has been removed (UNC_test1_Case02) due to considerable motion present only in its T2 image (without motion artifacts in T1 and FLAIR). Such a dataset unfairly penalizes methods that use T2 images versus methods that don't use the T2 image. Currently all cases have been segmented by expert raters at each institution. The segmentations show significant intersite variability. MS lesion MRI image data for this competition was acquired separately by Children's Hospital Boston and University of North Carolina. UNC cases were acquired on a Siemens 3T Allegra MRI scanner with slice thickness of 1mm and in-plane resolution of 0.5mm. To ease the segmentation process all data has been rigidly registered to a common reference frame and resliced to isotropic voxel spacing using b-spline based interpolation. Pre-processed data is stored in NRRD format containing an ASCII readable header and a separate uncompressed raw image data file. This format is ITK compatible.
If you want to join the competition, you can download the data set from the links here, and submit your segmentation results at http://www.ia.unc.edu/MSseg after registering your team. Registration requires a team name, password, and email address for future contact. Once your experiment is completed, you can submit the segmentation data as a zip file. Please refer to the submission page for the upload data format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Original source from Kaggle : https://www.kaggle.com/c/trackml-particle-identification/data
The dataset comprises multiple independent events, where each event contains simulated measurements (essentially 3D points) of particles generated in a collision between proton bunches at the Large Hadron Collider at CERN. The goal of the tracking machine learning challenge is to group the recorded measurements or hits for each event into tracks, sets of hits that belong to the same initial particle. A solution must uniquely associate each hit to one track. The training dataset contains the recorded hits, their ground truth counterpart and their association to particles, and the initial parameters of those particles. The test dataset contains only the recorded hits.
Once unzipped, the dataset is provided as a set of plain .csv files. Each event has four associated files that contain hits, hit cells, particles, and the ground truth association between them. The common prefix, e.g. event000000010, is always event followed by 9 digits.
event000000000-hits.csv
event000000000-cells.csv
event000000000-particles.csv
event000000000-truth.csv
event000000001-hits.csv
event000000001-cells.csv
event000000001-particles.csv
event000000001-truth.csv
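The naming scheme listed above can be handled with a small helper. This is an illustrative sketch: the function names and the directory name used in the example are hypothetical, and only the `event` prefix, the 9-digit number and the four file suffixes come from the description.

```python
import re
from pathlib import Path

KINDS = ("hits", "cells", "particles", "truth")
# "event" followed by exactly 9 digits, then one of the four file kinds.
PATTERN = re.compile(r"^(event\d{9})-(hits|cells|particles|truth)\.csv$")


def event_files(directory, event_id):
    """Return the four expected file paths of one event, keyed by file kind."""
    prefix = f"event{event_id:09d}"
    return {kind: Path(directory) / f"{prefix}-{kind}.csv" for kind in KINDS}


def parse_name(filename):
    """Split a file name into (event prefix, kind); return None if it does not match."""
    match = PATTERN.match(filename)
    return (match.group(1), match.group(2)) if match else None


files = event_files("train", 10)   # "train" is a hypothetical directory name
print(files["hits"].name)          # event000000010-hits.csv
```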
Event hits
The hits file contains the following values for each hit/entry:
hit_id: numerical identifier of the hit inside the event.
x, y, z: measured x, y, z position (in millimeter) of the hit in global coordinates.
volume_id: numerical identifier of the detector group.
layer_id: numerical identifier of the detector layer inside the group.
module_id: numerical identifier of the detector module inside the layer.
The volume/layer/module id could in principle be deduced from x, y, z. They are given here to simplify detector-specific data handling.
Event truth
The truth file contains the mapping between hits and generating particles and the true particle state at each measured hit. Each entry maps one hit to one particle.
hit_id: numerical identifier of the hit as defined in the hits file.
particle_id: numerical identifier of the generating particle as defined in the particles file. A value of 0 means that the hit did not originate from a reconstructible particle, but e.g. from detector noise.
tx, ty, tz: true intersection point in global coordinates (in millimeters) between the particle trajectory and the sensitive surface.
tpx, tpy, tpz: true particle momentum (in GeV/c) in the global coordinate system at the intersection point. The corresponding vector is tangent to the particle trajectory at the intersection point.
weight: per-hit weight used for the scoring metric; the total sum of weights within one event equals one.
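Grouping hits into ground-truth tracks from the truth file can be sketched as below. The rows are made-up stand-ins for what `csv.DictReader` would yield from a truth file, and the function name is hypothetical; the only rules taken from the text are that each entry maps one hit to one particle and that a particle_id of 0 marks hits without a reconstructible particle.

```python
from collections import defaultdict


def tracks_from_truth(rows):
    """Group hit ids by particle id, skipping hits with particle_id 0 (e.g. noise)."""
    tracks = defaultdict(list)
    for row in rows:
        particle_id = int(row["particle_id"])
        if particle_id == 0:
            continue  # hit did not originate from a reconstructible particle
        tracks[particle_id].append(int(row["hit_id"]))
    return dict(tracks)


# Made-up example rows; real rows also carry tx, ty, tz, tpx, tpy, tpz and weight.
rows = [
    {"hit_id": "1", "particle_id": "4503668346847232"},
    {"hit_id": "2", "particle_id": "0"},
    {"hit_id": "3", "particle_id": "4503668346847232"},
    {"hit_id": "4", "particle_id": "22525763437723648"},
]
tracks = tracks_from_truth(rows)
```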
Event particles
The particles file contains the following values for each particle/entry:
particle_id: numerical identifier of the particle inside the event.
vx, vy, vz: initial position or vertex (in millimeters) in global coordinates.
px, py, pz: initial momentum (in GeV/c) along each global axis.
q: particle charge (as multiple of the absolute electron charge).
nhits: number of hits generated by this particle.
All entries contain the generated information or ground truth.
Event hit cells
The cells file contains the constituent active detector cells that comprise each hit. The cells can be used to refine the hit to track association. A cell is the smallest granularity inside each detector module, much like a pixel on a screen, except that depending on the volume_id a cell can be a square or a long rectangle. It is identified by two channel identifiers that are unique within each detector module and encode the position, much like column/row numbers of a matrix. A cell can provide signal information that the detector module has recorded in addition to the position. Depending on the detector type only one of the channel identifiers is valid, e.g. for the strip detectors, and the value might have different resolution.
hit_id: numerical identifier of the hit as defined in the hits file.
ch0, ch1: channel identifier/coordinates unique within one module.
value: signal value information, e.g. how much charge a particle has deposited.
Additional detector geometry information
The detector is built from silicon slabs (or modules, rectangular or trapezoidal), arranged in cylinders and disks, which measure the position (or hits) of the particles that cross them. The detector modules are organized into detector groups or volumes identified by a volume id. Inside a volume they are further grouped into layers identified by a layer id. Each layer can contain an arbitrary number of detector modules, the smallest geometrically distinct detector object, each identified by a module_id. Within each group, detector modules are of the same type and have, e.g., the same granularity. All simulated detector modules are so-called semiconductor sensors that are built from thin silicon sensor chips. Each module can be represented by a two-dimensional, planar, bounded sensitive surface. These sensitive surfaces are subdivided into regular grids that define the detector cells, the smallest granularity within the detector.
Each module has a different position and orientation described in the detectors file. A local, right-handed coordinate system is defined on each sensitive surface such that the first two coordinates u and v are on the sensitive surface and the third coordinate w is normal to the surface. The orientation and position are defined by the following transformation
pos_xyz = rotation_matrix * pos_uvw + translation
that transforms a position described in local coordinates u,v,w into the equivalent position x,y,z in global coordinates using a rotation matrix and a translation vector (cx,cy,cz).
volume_id: numerical identifier of the detector group.
layer_id: numerical identifier of the detector layer inside the group.
module_id: numerical identifier of the detector module inside the layer.
cx, cy, cz: position of the local origin in the global coordinate system (in millimeter).
rot_xu, rot_xv, rot_xw, rot_yu, ...: components of the rotation matrix to rotate from local u,v,w to global x,y,z coordinates.
module_t: half thickness of the detector module (in millimeter).
module_minhu, module_maxhu: the minimum/maximum half-length of the module boundary along the local u direction (in millimeter).
module_hv: the half-length of the module boundary along the local v direction (in millimeter).
pitch_u, pitch_v: the size of detector cells along the local u and v direction (in millimeter).
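The local-to-global transformation defined above can be written out using the detectors file columns. This is a minimal sketch: the text lists the rotation components only as "rot_xu, rot_xv, rot_xw, rot_yu, ...", so the remaining names (rot_yv through rot_zw) are assumed from that naming pattern, and the example module is made up.

```python
import numpy as np


def local_to_global(module, pos_uvw):
    """Apply pos_xyz = rotation_matrix * pos_uvw + translation for one module."""
    # Column names beyond rot_yu are assumed from the naming pattern in the text.
    rotation = np.array([
        [module["rot_xu"], module["rot_xv"], module["rot_xw"]],
        [module["rot_yu"], module["rot_yv"], module["rot_yw"]],
        [module["rot_zu"], module["rot_zv"], module["rot_zw"]],
    ])
    translation = np.array([module["cx"], module["cy"], module["cz"]])
    return rotation @ np.asarray(pos_uvw, dtype=float) + translation


# Made-up module: local axes rotated by 90 degrees about z, origin at (10, 20, 30) mm.
module = {
    "cx": 10.0, "cy": 20.0, "cz": 30.0,
    "rot_xu": 0.0, "rot_xv": -1.0, "rot_xw": 0.0,
    "rot_yu": 1.0, "rot_yv": 0.0, "rot_yw": 0.0,
    "rot_zu": 0.0, "rot_zv": 0.0, "rot_zw": 1.0,
}
pos_xyz = local_to_global(module, [1.0, 2.0, 3.0])
```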
There are two different module shapes in the detector, rectangular and trapezoidal. The pixel detector (with volume_id = 7,8,9) is fully built from rectangular modules, and so are the cylindrical barrels in volume_id=13,17. The remaining layers are made of disks that need trapezoidal shapes to cover the full disk.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The early prediction of sepsis is potentially life-saving, and we challenge participants to predict sepsis 6 hours before the clinical prediction of sepsis. Conversely, the late prediction of sepsis is potentially life-threatening, and predicting sepsis in non-sepsis patients (or predicting sepsis very early in sepsis patients) consumes limited hospital resources. For the challenge, we designed a utility function that rewards early predictions and penalizes late predictions as well as false alarms.
Vital signs (columns 1-8)
HR: Heart rate (beats per minute)
O2Sat: Pulse oximetry (%)
Temp: Temperature (Deg C)
SBP: Systolic BP (mm Hg)
MAP: Mean arterial pressure (mm Hg)
DBP: Diastolic BP (mm Hg)
Resp: Respiration rate (breaths per minute)
EtCO2: End tidal carbon dioxide (mm Hg)
Laboratory values (columns 9-34)
BaseExcess: Measure of excess bicarbonate (mmol/L)
HCO3: Bicarbonate (mmol/L)
FiO2: Fraction of inspired oxygen (%)
pH: N/A
PaCO2: Partial pressure of carbon dioxide from arterial blood (mm Hg)
SaO2: Oxygen saturation from arterial blood (%)
AST: Aspartate transaminase (IU/L)
BUN: Blood urea nitrogen (mg/dL)
Alkalinephos: Alkaline phosphatase (IU/L)
Calcium (mg/dL)
Chloride (mmol/L)
Creatinine (mg/dL)
Bilirubin_direct: Bilirubin direct (mg/dL)
Glucose: Serum glucose (mg/dL)
Lactate: Lactic acid (mg/dL)
Magnesium (mmol/dL)
Phosphate (mg/dL)
Potassium (mmol/L)
Bilirubin_total: Total bilirubin (mg/dL)
TroponinI: Troponin I (ng/mL)
Hct: Hematocrit (%)
Hgb: Hemoglobin (g/dL)
PTT: Partial thromboplastin time (seconds)
WBC: Leukocyte count (count*10^3/µL)
Fibrinogen (mg/dL)
Platelets (count*10^3/µL)
Demographics (columns 35-40)
Age: Years (100 for patients 90 or above)
Gender: Female (0) or Male (1)
Unit1: Administrative identifier for ICU unit (MICU)
Unit2: Administrative identifier for ICU unit (SICU)
HospAdmTime: Hours between hospital admit and ICU admit
ICULOS: ICU length-of-stay (hours since ICU admit)
Outcome (column 41)
SepsisLabel: For sepsis patients, SepsisLabel is 1 if t ≥ t_sepsis − 6 and 0 if t < t_sepsis − 6.
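The labeling rule for sepsis patients (label 1 from six hours before t_sepsis onward) can be illustrated with a short Python sketch. The function name `sepsis_labels`, the use of integer hourly time steps starting at 0, and the all-zero labels for non-sepsis patients are assumptions made for illustration; they are not specified in the description above.

```python
from typing import List, Optional


def sepsis_labels(n_hours: int, t_sepsis: Optional[int], lead: int = 6) -> List[int]:
    """Hourly SepsisLabel values for one patient record (illustrative only).

    n_hours  : number of hourly rows in the record (t = 0 .. n_hours - 1)
    t_sepsis : hour of clinical sepsis onset, or None for a non-sepsis patient
    lead     : hours before t_sepsis at which the label switches to 1
    """
    if t_sepsis is None:
        # Assumption: non-sepsis patients are labeled 0 at every hour.
        return [0] * n_hours
    # Label is 1 when t >= t_sepsis - lead, otherwise 0.
    return [1 if t >= t_sepsis - lead else 0 for t in range(n_hours)]


if __name__ == "__main__":
    print(sepsis_labels(10, t_sepsis=8))
    print(sepsis_labels(3, t_sepsis=None))
```

With `t_sepsis=8` the label switches to 1 at hour 2, which is the six-hour lead the challenge asks participants to predict.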
Attribution-ShareAlike 2.5 (CC BY-SA 2.5) https://creativecommons.org/licenses/by-sa/2.5/
License information was derived automatically
The DLC-2021 dataset consists of 1424 video clips captured in a wide range of real-world conditions and focused on ID document forensics tasks. Each clip was shot vertically and is at least 5 seconds long. Frames were extracted at 10 frames per second, and the document position is manually annotated for the first 50 extracted frames. The novelty of the dataset is that it contains shots from video of color laminated mock ID documents, color unlaminated copies, grayscale unlaminated copies, and screen recaptures of the documents. The proposed dataset complies with the GDPR because it contains images of synthetic IDs with generated owner photos and artificial personal information.
Part 1 contains videos, frames and markup for “original” laminated documents from the MIDV-2020 collection and unlaminated gray copies.
Part 2 contains videos, frames and markup for documents recaptured from a device screen.
Part 3 contains videos, frames and markup for unlaminated color copies.
Polevoy, D.V.; Sigareva, I.V.; Ershova, D.M.; Arlazarov, V.V.; Nikolaev, D.P.; Ming, Z.; Luqman, M.M.; Burie, J.-C. Document Liveness Challenge Dataset (DLC-2021). J. Imaging 2022, 8, 181. https://doi.org/10.3390/jimaging8070181
https://rightsstatements.org/vocab/UND/1.0/
The adapted tolerant yeast strain Y-50049 is able to detoxify furfural and HMF in situ, whereas the wild-type control Y-12632 is repressed to the point of loss of function under challenge with 20 mM each of furfural and HMF. Overall design: A time course study during the lag phase, with cells harvested at 18, 24, 28, and 42 h after treatment with 20 mM furfural and 20 mM HMF.
https://spdx.org/licenses/CC0-1.0.html
Severe cold, defined as a damaging cold beyond acclimation temperatures, has unique responses, but the signaling and evolution of these responses are not well understood. Production of oligogalactolipids, which is triggered by cytosolic acidification in Arabidopsis (Arabidopsis thaliana), contributes to survival in severe cold. Here, we investigated oligogalactolipid production in species from bryophytes to angiosperms. Production of oligogalactolipids differed within each clade, suggesting multiple evolutionary origins of severe cold tolerance. In some species, we also observed greater oligogalactolipid production in control samples than in temperature-challenged samples. Further examination of representative species revealed a tight association between temperature, damage, and oligogalactolipid production that scaled with the cold tolerance of each species. Based on oligogalactolipid production and transcript changes, multiple species share a signal of oligogalactolipid production initially described in Arabidopsis, cytosolic acidification. Together, these data suggest that oligogalactolipid production is a severe cold response that originated from an ancestral damage response that remains in many land plant lineages and that cytosolic acidification is a common signaling mechanism for its activation.
Methods
Plant material and growth conditions
Arabidopsis (Arabidopsis thaliana, Columbia-0 [Col-0]) plants were grown on a mixture of Sungrow Propagation Mix soil and Turface at 22°C under a 16-h-light/8-h-dark photoperiod. Plants were grown for 3 to 4 weeks before cold acclimation at 6°C under a 12-h-light/12-h-dark photoperiod for 1 week. Sorghum (Sorghum bicolor ‘BTx623’) was grown in chambers under a 16-h-light/8-h-dark photoperiod with 29°C during the day and 22°C at night on standard greenhouse soil mix (8:8:3:1 [w/w/w/w] peat moss: vermiculite: sand: screened topsoil, with 7.5:1:1:1 [w/w/w/w] Waukesha fine lime, Micromax, Aquagro, and Green Guard per cubic yard). Sorghum was acclimated at 16°C for 1 week. All plants for all treatments were moved into low-temperature stress at the end of their respective day for treatment in the dark.
Ion leakage
Plants used for ion leakage were grown as described above, and all plants were cold acclimated for 1 week at the appropriate temperature. Ion leakage was determined using a refrigerated circulator (AP15R-40, VWR, Radnor, PA, USA) with leaf pieces or punches floated onto 3 mL of ddH2O. For Arabidopsis, an entire leaf was used. For sorghum, three leaf punches of 8 mm in diameter were used. Samples were collected from the second true leaf, except for Arabidopsis, where rosette leaves were sampled, with care taken to use older, expanded leaves and avoid cotyledons. Different plants were then chilled to temperatures sufficient to induce stress in the respective species. Stress was imposed on Arabidopsis as previously described (Warren et al., 1996). Briefly, samples were exposed to an initial equilibration at 2°C for 30 min, nucleation was initiated with a ddH2O ice chip at –1°C for 1 h, and subsequent chilling occurred at a rate of –2°C/h (Figure 1B). Samples for sorghum were collected at temperatures from 0°C to –4°C. Following a 30-min equilibration at 0°C, samples were cooled from 0°C to –1°C at a rate of –2°C/h, and ice nucleation was initiated at –1°C and held for 1 h before subsequent chilling at a rate of –1°C/h from –1°C to –3°C and then –2°C/h from –3°C to –4°C (Figure 1B). The slower chilling between –1°C and –3°C allowed for sampling in 0.5°C steps. At each temperature point, an equivalent sample for lipid analysis and fractional TGDG accumulation was collected for all species. For Arabidopsis and sorghum, an additional equivalent sample was taken for transcriptome analysis: for Arabidopsis at –2°C (control) and –7°C (challenge), and for sorghum at –0.5°C (control) and –2.5°C (challenge).
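As a worked illustration of the sorghum chilling schedule, the sketch below converts the stated ramp rates and hold times into elapsed time at each sampling temperature. The segment durations are derived arithmetically from the rates in the text (for example, 1°C at –2°C/h takes 0.5 h); the names `SORGHUM_SCHEDULE` and `hours_until`, and the assumption that the bath follows the programmed temperature exactly, are illustrative only.

```python
# Each segment: (start temperature in °C, end temperature in °C, duration in hours)
SORGHUM_SCHEDULE = [
    (0.0, 0.0, 0.5),    # 30-min equilibration at 0 °C
    (0.0, -1.0, 0.5),   # 0 °C to -1 °C at -2 °C/h
    (-1.0, -1.0, 1.0),  # ice nucleation hold at -1 °C for 1 h
    (-1.0, -3.0, 2.0),  # -1 °C to -3 °C at -1 °C/h
    (-3.0, -4.0, 0.5),  # -3 °C to -4 °C at -2 °C/h
]


def hours_until(target_c: float, schedule=SORGHUM_SCHEDULE) -> float:
    """Hours from the start of the run until target_c is first reached."""
    elapsed = 0.0
    for start, end, duration in schedule:
        if start != end and end <= target_c <= start:
            # Cooling ramp: interpolate linearly within the segment.
            return elapsed + duration * (start - target_c) / (start - end)
        if start == end and target_c == start:
            # Hold segment: the target is reached as the hold begins.
            return elapsed
        elapsed += duration
    raise ValueError(f"{target_c} °C is outside the schedule")


if __name__ == "__main__":
    for temp in (-0.5, -2.5, -4.0):
        print(f"{temp:5.1f} °C reached after {hours_until(temp):.2f} h")
```

Under these assumptions the sorghum control (–0.5°C) and challenge (–2.5°C) transcriptome samples fall 0.75 h and 3.5 h into the run, and the full schedule ends at 4.5 h.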
Cytosolic acidification
All experiments were performed on excised leaves. For Arabidopsis, the leaf was placed into a cup of acid solution or water after having removed the leaf from the rosette of a full-sized, 3-week-old plant. For sorghum, the sorghum stalk above the soil surface was cut using a new razor blade for each plant and shoots were inserted into a tube containing 20 mM 2,4-dinitrophenol, pH 5, in 18.2% (v/v) methanol, adjusted with KOH, or into 18.2% (v/v) methanol/water as a control. These samples were immediately placed into a humidity chamber for 3 h with a minimum relative humidity of 84%. Leaf punches were taken from the second true leaf to mimic the samples used in ion leakage tests.
RNA-seq data generation and processing
Total RNA was isolated for each sample using a Zymo Quick-RNA Plant Mini-prep Kit (Zymo Research Corp, Irvine, CA, USA), and RNA-seq libraries were prepared according to Illumina TruSeq Sample Preparation V2 using 1 μg of starting total RNA. Libraries were sequenced as 75-bp paired-end reads on an Illumina MiSeq instrument at the University of Nebraska Medical Center. Raw reads were deposited in the NCBI SRA (Sequence Read Archive) database under the BioProject ID PRJNA894306. Trimmomatic 0.36 (Bolger et al., 2014) was used to remove low-quality reads and adapters using default parameters. The resulting clean reads from Arabidopsis and sorghum samples were aligned to the Arabidopsis thaliana TAIR10 and Sorghum bicolor v3.1 genomes, respectively (retrieved from Phytozome v12.0), using GSNAP (2018-03-25) (Wu & Nacu, 2010). Alignment files were converted to BAM format using Samtools (v1.9) (Li et al., 2009) and used as input to HTSeq (0.6.1) (Anders et al., 2015) for generation of raw counts per gene.
Differentially expressed genes
The design formula ~ Replicate + Condition was used in DESeq2 (Love et al., 2014) to identify differentially expressed genes (DEGs) for each species for both artificial acidification and severe cold treatment. Two and three biological replicates were employed for the identification of DEGs during temporal and chemical treatment. Any gene in the condition factor (Control vs. Treatment) with an adjusted p-value < 0.05 and an absolute log2 fold-change > 1 was classified as a DEG. Overall, four gene categories in each species were generated: upregulated by chemical treatment, upregulated by temperature treatment, downregulated by chemical treatment, and downregulated by temperature treatment. To identify differentially expressed orthologs between Arabidopsis and sorghum, a list of corresponding orthologs between each sorghum gene model and the best hit Arabidopsis gene models was retrieved from Phytozome v12.0. To compare with randomly overlapping orthologs in either treatment, only genes with 1:1 orthologs between sorghum and Arabidopsis were considered as background. Sorghum genes in each category were assigned to the best hit Arabidopsis gene models. To determine the significance of co-upregulated or co-downregulated orthologs between Arabidopsis and sorghum, we randomly picked an equal number of upregulated or downregulated genes in Arabidopsis, respectively, and tested the number of genes with orthologs in sorghum. The F test was used to determine significance between real overlapping ortholog numbers and permuted overlapping ortholog numbers. In total, 100 permutations were performed.
Gene Ontology Analysis
Lists of genes obtained from DEG analysis were analyzed via the Gene Ontology Resource (Ashburner et al., 2000; Mi et al., 2019). GO Enrichment Analysis was performed, and significantly enriched categories were reported (Gene Ontology Consortium, 2021).
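The DEG threshold stated above (adjusted p-value < 0.05 and absolute log2 fold-change > 1) can be expressed as a simple filter. The following Python sketch is illustrative only and is not the authors' pipeline, which used DESeq2 in R; the function name `classify_gene`, the category labels, and the example values are hypothetical.

```python
import math


def classify_gene(log2_fold_change: float, padj: float,
                  padj_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> str:
    """Classify one gene as 'up', 'down', or 'not_de' (illustrative only).

    A gene counts as differentially expressed when its adjusted p-value is
    below padj_cutoff and its absolute log2 fold-change exceeds lfc_cutoff.
    A missing adjusted p-value (NaN) is treated as not differentially
    expressed; that handling is an assumption of this sketch.
    """
    if math.isnan(padj) or math.isnan(log2_fold_change):
        return "not_de"
    if padj < padj_cutoff and abs(log2_fold_change) > lfc_cutoff:
        return "up" if log2_fold_change > 0 else "down"
    return "not_de"


if __name__ == "__main__":
    # Hypothetical (log2 fold-change, adjusted p-value) pairs.
    examples = {"geneA": (2.3, 0.001), "geneB": (-1.5, 0.01), "geneC": (0.8, 0.001)}
    for name, (lfc, padj) in examples.items():
        print(name, classify_gene(lfc, padj))
```

Applying the same filter separately to the chemical and temperature contrasts would yield the four categories listed above (up or down, by chemical or temperature treatment).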
Dihydrotestosterone (DHT)-driven androgen receptor activation in pancreatic beta-cells leads to the amplification of GLP-1R-mediated insulin exocytosis. Here we study the impact of DHT on the metabolic architecture of human pancreatic islets stimulated with DHT at high (16.7 mM) and low (2.8 mM) glucose.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
MER2025 Dataset
Welcome to the official dataset for the MER2025 Challenge@ACM MM! MER2025 marks the third edition of our Multimodal Emotion Recognition (MER) series of challenges, aiming to bring together the affective computing community to explore emerging trends and future directions. This year, MER2025 focuses on the theme:
"Affective Computing Meets Large Language Models"
We seek to shift from traditional categorical emotion recognition frameworks, which rely… See the full description on the dataset page: https://huggingface.co/datasets/MERChallenge/MER2025.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
MER2025_personality is a subset of the MDPE dataset. For more details about MDPE, please refer to the MDPE dataset card or the paper MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics. This dataset serves as the testing set for MER25 Challenge @ ACM MM & MRAC25 Workshop @ ACM MM Emotion-enhanced Personality Recognition Track, with the MDPE as the training and validation sets. More details about the MER2025 competition can be found on the MER25 Website and MER25… See the full description on the dataset page: https://huggingface.co/datasets/MDPEdataset/MER2025_personality.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MRSA252 genes down-regulated following the addition of linoleic acid (0.1 mM) to exponentially growing cells (linoleic acid challenge).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Tailings recovery has been a constant challenge for most engineers. For more than five years, GAUSTEC has worked with major players in the mining industry to scavenge iron from tailings produced by flotation, using WHIMS (Wet High Intensity Magnetic Separation). In the early 1980s, US patent 4,192,738 was granted in the USA, with promising results. Despite this, thirty years have passed with no significant application worldwide. One main reason is reported: the market lacked a WHIMS with a truly high feed capacity, so projects had to rely on a large number of the machines available at that time (typically more than 20 WHIMS to scavenge iron from the tailings of a flotation plant). Such a large asset base for scavenging low-grade iron tailings would not pay back. The mega-sized WHIMS launched by GAUSTEC in 2014, the GHX-1400, improved by the Super-WHIMS Technology (18,000 Gauss) and BigFlow Magnetic Matrixes (gaps smaller than 1.5 mm), addressed this challenge. Specially designed ancillary equipment, described here, also played a decisive role.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MRSA252 proteins up-regulated following the addition of hexadecenoic acid (0.1 mM) to exponentially growing cells (hexadecenoic acid challenge).