Dataset for MIMIC-IV data, configured by default for the Mortality task. Available tasks are: Mortality, Length of Stay, Readmission, and Phenotype. The data is extracted from the MIMIC-IV database using this pipeline: https://github.com/healthylaife/MIMIC-IV-Data-Pipeline/tree/main. The MIMIC path should have this form: "path/to/mimic4data/from/username/mimiciv/2.2". If you choose a Custom task, provide a configuration file for the time series. Currently works with MIMIC-IV ICU data.
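The expected path form can be checked before loading. The following is a minimal sketch only: the helper name `looks_like_mimic4_root` is hypothetical and is not part of the dataset loader or the extraction pipeline; it merely tests that a path ends in `mimiciv/<version>`, as in the example path given in the description.

```python
from pathlib import PurePosixPath


def looks_like_mimic4_root(path: str, version: str = "2.2") -> bool:
    """Return True if `path` ends with .../mimiciv/<version>.

    Hypothetical helper for illustration; it does not touch the file system.
    """
    parts = PurePosixPath(path).parts
    return len(parts) >= 2 and parts[-2] == "mimiciv" and parts[-1] == version


# The example path from the description passes; a path missing the version does not.
print(looks_like_mimic4_root("path/to/mimic4data/from/username/mimiciv/2.2"))
print(looks_like_mimic4_root("path/to/mimic4data/from/username/mimiciv"))
```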
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Preprocessed version of the MIMIC-II dataset. See https://github.com/theislab/ehrapy-datasets/tree/main/mimic_2
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Emergency department (ED) overcrowding leads to delayed care, increased patient risk, and inefficient resource use. The MIMIC-IV-Ext Triage Instruction Corpus (MIETIC) addresses this by providing 9,629 structured triage cases from MIMIC-IV, aligned with the Emergency Severity Index (ESI). MIETIC supports large language model (LLM) training for AI-assisted triage, improving accuracy, consistency, and risk assessment. The dataset includes chief complaints, vital signs, demographics, and medical history, ensuring realistic triage decision-making. Developed through automated quality control and expert validation, MIETIC enhances model performance in high-risk and moderate-risk classification. Available in CSV formats, MIETIC enables research in clinical NLP, AI-driven triage, and decision-support tools. The dataset module includes:
Structured triage cases with ESI labels. Triage case generation prompts for instruction tuning. Expert-validated samples for quality control. SQL scripts for data extraction and validation, hosted on GitHub.
MIETIC provides a standardized, reproducible dataset to advance AI-driven emergency triage, optimizing accuracy, efficiency, and resource allocation.
This dataset is a portion of the MIMIC-III Clinical Database, a large, freely available database comprising de-identified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The MIMIC-III demo provides researchers with an opportunity to review the structure and content of MIMIC-III before deciding whether or not to carry out an analysis on the full dataset. The full dataset is available on PhysioNet.
This dataset contains only 4 tables (extracted from the original dataset); more information about each table can be found at its corresponding link:
- admissions.csv
- d_labitems.csv
- labevents.csv
- patient.csv
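As a rough illustration of how these tables relate, the sketch below attaches a human-readable label from d_labitems to each row of labevents by joining on the item identifier. It assumes lowercase MIMIC-III column names (`itemid`, `label`, `subject_id`, `valuenum`, `valueuom`); the files in this dataset may use a different case. The rows are made-up stand-ins, not real patient data, and `label_lab_events` is a hypothetical helper.

```python
import csv
import io

# Tiny in-memory stand-ins for d_labitems.csv and labevents.csv.
# Column names are assumed from the MIMIC-III schema; the rows are invented.
D_LABITEMS = """itemid,label,fluid
50912,Creatinine,Blood
50971,Potassium,Blood
"""
LABEVENTS = """subject_id,itemid,valuenum,valueuom
10006,50912,1.2,mg/dL
10006,50971,4.1,mEq/L
10011,50912,0.9,mg/dL
"""


def label_lab_events(labitems_file, labevents_file):
    """Join lab events to their dictionary labels on `itemid`."""
    labels = {row["itemid"]: row["label"] for row in csv.DictReader(labitems_file)}
    labelled = []
    for row in csv.DictReader(labevents_file):
        raw_value = row["valuenum"]
        labelled.append({
            "subject_id": row["subject_id"],
            "label": labels.get(row["itemid"], "UNKNOWN"),
            # Numeric values can be missing in real data, so tolerate empty cells.
            "valuenum": float(raw_value) if raw_value else None,
            "valueuom": row["valueuom"],
        })
    return labelled


rows = label_lab_events(io.StringIO(D_LABITEMS), io.StringIO(LABEVENTS))
for r in rows:
    print(r)
```

With the real files, the `io.StringIO` objects would be replaced by opened CSV files.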
A nice visualization of this dataset can be found here.
This portion of the dataset will be combined to build a comprehensive dataset of simulated medical reports.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains input files necessary for running our analysis pipeline, as described in "The evolutionary dynamics between viral mimics and host proteins" by Fuchs, Schor, Naim et al. The pipeline can be found in our GitHub repository, "domain_mimicry" by HagaiLab: https://github.com/HagaiLab/domain_mimicry
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Physicians record their detailed thought-processes about diagnoses and treatments as unstructured text in a section of a clinical note called the "assessment and plan". This information is more clinically rich than structured billing codes assigned for an encounter but harder to reliably extract given the complexity of clinical language and documentation habits. To structure these sections we collected a dataset of annotations over assessment and plan sections from the publicly available and de-identified MIMIC-III dataset, and developed deep-learning based models to perform this task, described in the associated paper available as a pre-print at: https://www.medrxiv.org/content/10.1101/2022.04.13.22273438v1
When using this data please cite our paper:
@article {Stupp2022.04.13.22273438,
author = {Stupp, Doron and Barequet, Ronnie and Lee, I-Ching and Oren, Eyal and Feder, Amir and Benjamini, Ayelet and Hassidim, Avinatan and Matias, Yossi and Ofek, Eran and Rajkomar, Alvin},
title = {Structured Understanding of Assessment and Plans in Clinical Documentation},
year = {2022},
doi = {10.1101/2022.04.13.22273438},
publisher = {Cold Spring Harbor Laboratory Press},
URL = {https://www.medrxiv.org/content/early/2022/04/17/2022.04.13.22273438},
journal = {medRxiv}
}
The dataset, presented here, contains annotations of assessment and plan sections of notes from the publicly available and de-identified MIMIC-III dataset, marking the active problems, their assessment description, and plan action items. Action items are additionally marked as one of 8 categories (listed below). The dataset contains over 30,000 annotations of 579 notes from distinct patients, annotated by 6 medical residents and students.
The dataset is divided into 4 partitions - a training set (481 notes), validation set (50 notes), test set (48 notes) and an inter-rater set. The inter-rater set contains the annotations of each of the raters over the test set. Rater 1 in the inter-rater set should be regarded as an intra-rater comparison (details in the paper). The labels underwent automatic normalization to capture entire word boundaries and remove flanking non-alphanumeric characters.
Code for transforming labels into TensorFlow examples and training models as described in the paper will be made available at GitHub: https://github.com/google-research/google-research/tree/master/assessment_plan_modeling
In order to use these annotations, the user additionally needs to obtain the text of the notes, which is found in the NOTEEVENTS table of MIMIC-III; access must be acquired independently (https://mimic.mit.edu/).
Annotations are given as character spans in a CSV file with the following schema:
Field | Type | Semantics |
partition | categorical (one of [train, val, test, interrater]) | The set of ratings the span belongs to. |
rater_id | int | Unique id for each of the raters. |
note_id | int | The note's unique note_id; links to the MIMIC-III notes table (as ROW_ID). |
span_type | categorical (one of [PROBLEM_TITLE, PROBLEM_DESCRIPTION, ACTION_ITEM]) | Type of the span as annotated by raters. |
char_start | int | Character offset of the span start, from note start. |
char_end | int | Character offset of the span end, from note start. |
action_item_type | categorical (one of [MEDICATIONS, IMAGING, OBSERVATIONS_LABS, CONSULTS, NUTRITION, THERAPEUTIC_PROCEDURES, OTHER_DIAGNOSTIC_PROCEDURES, OTHER]) | Type of action item if the span is an action item (empty otherwise) as annotated by raters. |
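Given the schema above, recovering the annotated text amounts to slicing each note by its character offsets. The sketch below is illustrative only: the CSV rows and the note text are invented, `extract_spans` is a hypothetical helper, and it assumes `char_end` is exclusive (Python slice semantics), which the schema does not state. In practice the note text would come from the TEXT column of the MIMIC-III notes table, keyed by ROW_ID.

```python
import csv
import io

# Invented annotation rows following the documented column order.
ANNOTATIONS_CSV = """partition,rater_id,note_id,span_type,char_start,char_end,action_item_type
train,1,42,PROBLEM_TITLE,0,3,
train,1,42,ACTION_ITEM,5,21,MEDICATIONS
"""

# Invented note text keyed by note_id (stands in for the MIMIC-III note text).
NOTES = {42: "CHF: start furosemide 40 mg daily"}


def extract_spans(annotations_file, notes):
    """Return the text covered by each annotated span.

    Assumes char_end is exclusive; adjust the slice if it turns out to be inclusive.
    """
    spans = []
    for row in csv.DictReader(annotations_file):
        text = notes[int(row["note_id"])]
        start, end = int(row["char_start"]), int(row["char_end"])
        spans.append({
            "span_type": row["span_type"],
            # The field is empty for spans that are not action items.
            "action_item_type": row["action_item_type"] or None,
            "text": text[start:end],
        })
    return spans


spans = extract_spans(io.StringIO(ANNOTATIONS_CSV), NOTES)
for span in spans:
    print(span)
```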
https://creativecommons.org/publicdomain/zero/1.0/
This dataset tracks the updates made on the dataset "Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II)" as a repository for previous versions of the data and metadata.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These embeddings were generated and studied in the paper "Assessing the Effectiveness of Embedding Methods in Capturing Clinical Information from SNOMED CT". More information can be found in the following repository: https://github.com/JavierCastellD/AssessingSNOMEDEmbeddings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing arterial blood pressure (ABP) signals and their corresponding finger photoplethysmography (PPG) signals. This dataset is a processed version of the MIMIC-III Waveform Database Matched Subset.
File names were inherited from MIMIC-III. Files are saved in ".mat" format, and each file contains 2 structures with raw signals and different computed characteristics. Each structure corresponds to 15-second segments sampled at 125 Hz.
For more details, please refer to MIMIC-III Waveform Database Matched Subset and the processing source code.
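The stated segment length and sampling rate fix the number of samples per segment. The short sketch below works through that arithmetic and builds a matching time axis; the function names are illustrative and it makes no assumption about the field names inside the ".mat" structures.

```python
SAMPLING_RATE_HZ = 125   # stated sampling rate
SEGMENT_SECONDS = 15     # stated segment duration


def samples_per_segment(rate_hz: int = SAMPLING_RATE_HZ,
                        seconds: int = SEGMENT_SECONDS) -> int:
    """Number of samples in one segment: rate multiplied by duration."""
    return rate_hz * seconds


def time_axis(n_samples: int, rate_hz: int = SAMPLING_RATE_HZ) -> list:
    """Time stamp (in seconds) of each sample, starting at 0."""
    return [i / rate_hz for i in range(n_samples)]


n = samples_per_segment()
t = time_axis(n)
print(n)              # 125 * 15 = 1875 samples
print(t[0], t[-1])    # first and last time stamps in seconds
```

A loaded ABP or PPG segment whose length differs from this count would suggest a different segmentation than the one described.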
This supplementary material accompanies:
Charlton P.H. et al., "An impedance pneumography signal quality index for respiratory rate monitoring: design, assessment and application", [under review], 2020
The Impedance Pneumography Signal Quality Index (SQI) dataset and accompanying scripts (in Matlab format) are provided to facilitate reproduction of the analyses using data from the MIMIC III dataset in this publication.
Summary of Publication
In this article we developed and assessed the performance of a signal quality index (SQI) for the impedance pneumography signal. The SQI was developed using data from the Listen dataset, and assessed using data from the Listen and MIMIC III datasets. The SQI was found to accurately classify segments of impedance pneumography signal as either high or low quality. Furthermore, when it was coupled with a high-performance RR algorithm, highly accurate and precise RRs were estimated from those segments deemed to be high quality. In this study performance was assessed in the critical care environment; further work is required to determine whether the SQI is suitable for use with wearable sensors. Both the dataset and code used to perform this study are publicly available.
Reproducing this Publication
The work relating to the MIMIC dataset in this publication can be reproduced as follows:
Reproducing the analysis
These steps can be used to quickly reproduce the analysis using the curated and annotated dataset.
Download the curated and annotated dataset from Zenodo using this direct download link.
Run the analysis using the run_imp_sqi_mimic.m script.
Use the ImP_SQI_mimic_data_importer.m script to download raw MIMIC data files from PhysioNet, and collate them into a single Matlab file.
Prepare the dataset for manual annotation by running the run_imp_sqi_mimic.m script.
Manually annotate the signals by running the run_mimic_imp_annotation.m script - the annotations are stored in separate files (the original annotation files are available here).
Import the manual annotations into the collated data file by re-running the ImP_SQI_mimic_data_importer.m script.
Run run_imp_sqi_mimic.m to perform the analysis described in the publication.
The scripts (alongside details of how to use them) are also available in the RRest GitHub repository at: https://github.com/peterhcharlton/RRest/tree/master/RRest_v3.0/Publication_Specific_Scripts/ImP_SQI
License: The dataset (mimic_imp_sqi_data.mat) is distributed under the terms specified in the accompanying LICENSE file. The scripts are distributed under the GNU General Public Licence (as specified towards the start of each file).
Version 0.1.1: This is the version at the time of initial submission.
The objective of this Bioengineering Research Partnership is to focus the resources of a powerful interdisciplinary team from academia (MIT), industry (Philips Medical Systems) and clinical medicine (Beth Israel Deaconess Medical Center) to develop and evaluate advanced ICU patient monitoring systems that will substantially improve the efficiency, accuracy and timeliness of clinical decision making in intensive care.
Components
Datasets:
- dataset_pre: Pre-menopause biomarker levels chip reading
- dataset_post: Post-menopause biomarker levels chip reading
- dataset_straged: Biomarker levels chip reading distributed into cancer stages
Description
Synthetic dataset created to mimic biomarker activity on a designed chip for early-stage cancer detection. More details on GitHub.
Repository
https://github.com/SoumilB7/Ova-sense
license: mit
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
This repository contains images, 3D animated spaces, 2D perceptual maps with GMM, and mimicry ring lists for heliconiine butterflies complementing the analyses presented in this research paper: "Doré et al., 2025 - Perceptual maps reveal rampant convergence in butterfly wing patterns across the Neotropics. in prep.".
Abstract
In 1879, Fritz Müller formulated the first mathematical evolutionary model to explain mutualistic mimicry between coexisting defended prey. Yet, the degree to which local mimicry drives the structure of prey aposematic signals at continental scale remains unclear, because the perception of pattern similarity has never been assessed at large spatial scale. Here, we implement a Citizen Science survey to quantify and analyze the structure of perceived variation in the wing patterns of heliconiine butterflies (Nymphalidae: Heliconiini) throughout the entire Neotropics. Despite a continuum of perceived wing patterns at the continental scale, we show that the convergence of sympatric species into discrete mimicry rings is ubiquitous among communities. These results expand Müller’s historical predictions by supporting the rampant convergence of prey signals across an entire continent.
Contents
This repository contains three folders:
"3D_maps" contains the animated 3D perceptual spaces of heliconiine wing patterns for the Citizen Science dataset (N = 432) and the Local reference for the five local communities highlighted in the article.
"Clustering" contains the 2D perceptual maps and associated lists of mimicry rings built for each of the five local communities, for different level of clustering from GMM (K from 5 to 10).
"Images" contains the 432 images of dorsal wing patterns of heliconiine butterflies used in the online survey (https://memometic.cleverapps.io/) designed for this study.
How to cite
Please cite this research article as:
Doré, M., Pérochon, E., Aubier, T.G., Le Poul, Y., Joron, M., Elias, M., 2025. Perceptual maps reveal rampant convergence in butterfly wing patterns across the Neotropics. in prep. https://doi.org/TBA
Associated resources
The source codes for the analyses carried out in the study are available on GitHub. The occurrences data and distribution maps used in this study are publicly available from Zenodo: Occurrences data at https://doi.org/10.5281/zenodo.10906853; Distribution maps at https://doi.org/10.5281/zenodo.10903661.
The online Citizen Science survey on the perception of mimicry in wing color patterns of heliconiine butterflies is temporarily available at https://memometic.cleverapps.io/. Source code for the online Citizen Science survey is accessible on GitHub.
👂💉 EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.
⚡️Quickstart
1. To recreate the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab.
2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab.
⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.
1. 📖 Overview
EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:
2. 💽 Dataset
EHRSHOT is sourced from Stanford’s STARR-OMOP database.
We provide two versions of the dataset:
To access the raw data, please see the "Tables" and "Files" tabs above:
3. 💽 Data Files and Formats
We provide EHRSHOT in two file formats:
Within the "Tables" tab...
1. EHRSHOT-OMOP
* Dataset Version: EHRSHOT-OMOP
* Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different than the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.
Within the "Files" tab...
1. EHRSHOT_ASSETS.zip
* Dataset Version: EHRSHOT-Original
* Data Format: FEMR 0.1.16
* Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.
2. EHRSHOT_MEDS.zip
* Dataset Version: EHRSHOT-Original
* Data Format: MEDS 0.3.3
* Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.
3. EHRSHOT_OMOP_MEDS.zip
* Dataset Version: EHRSHOT-OMOP
* Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8
* Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop` command from MEDS-ETL.
4. EHRSHOT_OMOP_MEDS_Reader.zip
* Dataset Version: EHRSHOT-OMOP
* Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8
* Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.
4. 🤖 Model
We also release the full weights of **CLMBR-T-base**, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download from https://huggingface.co/StanfordShahLab/clmbr-t-base
5. 🧑💻 Code
Please see our Github repo to obtain code for loading the dataset and running a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/
**NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use Gmail or other personal email addresses, you will not be granted access.**
Access to the EHRSHOT dataset requires the following:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
MedQA-USMLE — A Large-scale Open Domain Question Answering Dataset from Medical Exams
Dataset Description
Links
Homepage: Github.io
Repository: Github
Paper: arXiv
Leaderboard: Papers with Code
Contact (Original Authors): Di Jin (jindi15@mit.edu)
Contact (Curator): Artur Guimarães (artur.guimas@gmail.com)
Dataset Summary
MedQA is a large-scale multiple-choice question-answering dataset designed to mimic the style of professional… See the full description on the dataset page: https://huggingface.co/datasets/araag2/MedQA.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for RaTE-NER Dataset
GitHub | Paper
Dataset Summary
RaTE-NER dataset is a large-scale, radiological named entity recognition (NER) dataset, including 13,235 manually annotated sentences from 1,816 reports within the MIMIC-IV database, that spans 9 imaging modalities and 23 anatomical regions, ensuring comprehensive coverage. Additionally, we further enriched the dataset with 33,605 sentences from the 17,432 reports available on Radiopaedia, by… See the full description on the dataset page: https://huggingface.co/datasets/Angelakeke/RaTE-NER.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains intermediate output files required to reproduce the figures in the main text and Supporting Material of Willink et al. 2023. The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies.
FILE OVERVIEW:
1. Morph-specific assemblies
A. File names: Afem_1354_ragtag.fasta.gz, Ifem_1049_ragtag.fa.gz, Ofem_0081_ragtag.fa.gz, O054_Shasta_run2.PMDV.HAP1.purged.fasta.gz, A059_Shasta_run1.PMDV.HAP1.purged.fa.gz
B. Description: genome assemblies for different morphs of Ischnura elegans (Afem_1354, Ifem_1049, and Ofem_0081) and Ischnura senegalensis (A059 and O054), generated in this study from long-read Nanopore data using Shasta v 0.7.0 (https://github.com/paoloshasta/shasta).
2. Assembly statistics
A. File names: Assembly_statistics.csv, Assembly_statistics_sen.csv
B. Description: Completeness and quality metrics for de novo genome assemblies of I. elegans and I. senegalensis female morphs. See Fig. S1-S2.
3. Repetitive content annotation
A. File names: A1354_ragtag_RED.bed.repeats.bed.gz, Afem_Shasta1_polished_ragtag_UPPER.fa.out.gz, Ifem_Shasta2_polished_ragtag_UPPER.fa.out.gz, ioIscEleg1.1.primary_UPPER.fa.out.gz, ToL_RED.repeats.bed.gz
B. Description: Annotation of repetitive sequences in morph-specific assemblies. All morph assemblies (A, I and Darwin Tree of Life assemblies) were annotated using RepeatModeler v 2.0.1 and RepeatMasker v 1.0.93 (http://www.repeatmasker.org). The A morph and DToL assemblies were additionally annotated using Red v 0.0.1 (https://github.com/BioinformaticsToolsmith/Red). RepeatMasker annotations were then used to estimate TE coverage. See Extended Data Fig. 4 and Fig. S7.
4. GWAS output
A. File names: A1354_ragtag_AvI.assoc_filtered.txt.gz, A1354_ragtag_AvO.assoc_filtered.txt.gz, A1354_ragtag_IvO.assoc_filtered.txt.gz, ToL_AvI.assoc_filtered.txt.gz, ToL_AvO.assoc_filtered.txt.gz, ToL_IvO.assoc_filtered.txt.gz
B. Description: filtered SNPs in pairwise association tests between morphs (n = 19 resequencing samples per morph) of I. elegans. Analyses were conducted in PLINK v 1.9 (http://pngu.mgh.harvard.edu/purcell/plink/), using either the A morph assembly (Fig. 2a-b), or the Darwin Tree of Life (DToL) reference assembly (Extended Data Figure 8a-b) as mapping reference.
5. Population statistics
A. File names: Afem_pixy_30K_fst.txt.gz, A1354_30kb.Tajima.D.gz, Afem_pi_30K_pi.txt.gz, ToL_30K_fst.txt.gz, ToL_30kb.Tajima.D.gz, ToL_30K_onepop_pi.txt.gz
B. Description: Genetic differentiation (fst) between morphs, Tajima's D statistics, and nucleotide diversity across 30 kb windows of the I. elegans genome. Population statistics were computed using either the A morph assembly (Fig. 2c-e), or the DToL reference assembly (Extended Data Figure 8c-e) as mapping reference.
6. k-mer based GWAS
A. File names: AvI_kmers.fa.gz, AvO_kmers.fa.gz, OvAI_kmers.fa.gz, AvI_kmers.fa_v_A1354_Shasta_run1_table.tsv.gz, AvO_kmers.fa_v_A1354_Shasta_run1_table.tsv.gz, OvAI_kmers.fa_v_A1354_Shasta_run1_table.tsv.gz, OvAI_kmers.fa_v_Ifem_1049_ragtag_table.tsv.gz
B. Description: List of significant k-mers (in fasta format) in three k-mer based association analyses (n = 19 resequencing samples per morph) between morphs of I. elegans. Significant k-mers were then mapped to morph-specific assemblies using Blast v 2.22.28 (https://blast.ncbi.nlm.nih.gov/Blast.cgi) for short sequences. We include mapping results shown in Fig. 3a-b.
7. Read-depth coverage
A. File names: reseq_coverage_norepeat_500_window.bed.gz, nano_coverage_norepeat_500_window.bed.gz, Ifem_nano_coverage_norepeat_500_window.bed.gz, Ifem_reseq_coverage_norepeat_500_window_15Mb.bed.gz, poolseq_coverage_norepeat_500_window.bed.gz, morph_coverage_norepeat_diff_500.tsv.gz, SwD_popmap
B. Description: Read depth coverage of the morph locus and a 15 mb region used to estimate baseline read depths. 19 Illumina resequencing samples, and one long-read Nanopore sample of each morph of I. elegans were mapped to both the A and I assemblies to estimate read depth. Two poolseq samples (each pool consisting of 30 females of each morph) of I. senegalensis were mapped to the A assembly of I. elegans to estimate read depth. Read depth was estimated in mosdepth v 0.2.8 (https://github.com/brentp/mosdepth) across 500 bp windows after filtering windows with more than 10% repetitive content. For poolseq samples, the difference in coverage values between the A and O pools was computed across the entire genome. Sample information for resequencing samples is recorded in the file SwD_popmap. See Fig. 3c-d, 5b, and S8.
8. Assembly alignment
A. File names: nucmer_aln_Ifem_1049_ragtag_Afem_1354_ragtag.qr1_filter.reformat.coords.gz, nucmer_aln_Ofem_0081_ragtag_Afem_1354_ragtag.qr1_filter.reformat.coords.gz, nucmer_aln_Afem_Isen_Afem_Iele.qr1_filter.reformat.coords.gz, nucmer_aln_Ofem_Isen_Afem_Iele.qr1_filter.reformat.coords.gz, karyotype_AI_RagTag.csv, karyotype_AO_RagTag.csv, karyotype_AIsen_AIele.cs, karyotype_OIsen_AIele.csv
B. Description: Assembly alignments using nucmer v 4.0.0 (https://github.com/mummer4/mummer) and contig synteny for plotting using RIdeogram v 0.2.2 (https://cran.r-project.org/web/packages/RIdeogram/vignettes/RIdeogram.html) in R v 4.2.2 (https://www.r-project.org/). The A morph assembly of I. elegans was aligned to the I and O morph assemblies of I. elegans and to the A and O-like assemblies of I. senegalensis. See Fig. 4a, 5c.
9. Genotyping the Darwin Tree of Life assembly
A. File names: nucmer_aln_Afem_ragtag_ToL-haplotigs.qr1_filter.reformat.coords.gz, nucmer_aln_Afem_ragtag_ToL-primary.qr1_filter.reformat.coords.gz, ToL_500_norepeat.regions.bed.gz, karyotype_AToL_13_unloc_RagTag.csv, karyotype_AToL_RagTag_haplotigs.csv
B. Description: To genotype the DToL reference assembly of I. elegans, we estimated read-depth coverage of the DToL long-read Pacbio data mapped to the A morph assembly of I. elegans generated in this study, and aligned the A morph assembly to both the primary DToL assembly and to the purged haplotigs. Read depth was estimated in mosdepth v 0.2.8 (https://github.com/brentp/mosdepth) and assembly alignments were conducted using nucmer v 4.0.0 (https://github.com/mummer4/mummer). See Fig. S3.
10. SV calling
A. File names: A_to_A.bam, A_to_A.bam.bai, A_to_I.bam, A_to_I.bam.bai, A_to_O.bam, A_to_O.bam.bai, A_to_ToL_2mb.bam, A_to_ToL_2mb.bam.bai, I_to_A.bam, I_to_A.bam.bai, I_to_I.bam, I_to_I.bam.bai, I_to_O.bam, I_to_O.bam.bai, I_to_ToL_2mb.bam, I_to_ToL_2mb.bam.bai, O_to_A.bam, O_to_A.bam.bai, O_to_I.bam, O_to_I.bam.bai, O_to_O.bam, O_to_O.bam.bai, O_to_ToL_2mb.bam, O_to_ToL_2mb.bam.bai
B. Description: Merged alignments of resequencing samples (n = 19 per morph) to alternative reference assemblies (A, I, O, and DToL) for I. elegans. The alignments have been filtered by quality and to contain only the unlocalized scaffold 2 of chromosome 13, which includes the morph locus. These files were used to call morph-specific structural variants using samplot v 1.3.0 (https://github.com/ryanlayer/samplot). See Extended Data Figs 2, 7, and Fig. S5-S6.
11. Mapping of inversion breakpoint reads
A. File names: AvO_3K.tsv.gz, AvO_22K.tsv.gz, AvO_sen_3K.tsv.gz, AvO_sen_22K.tsv.gz, IvO_3K.tsv.gz
B. Description: Signatures of an inversion with breakpoints at ~ 3 kb and ~ 22 kb of the unlocalized scaffold 2 of chromosome 13 on the O assembly were found in A and I resequencing samples of I. elegans and in poolseq samples of A females of I. senegalensis. We queried the reads mapping to the inversion breakpoints and then tabulated their mapping locations on the A morph assembly of I. elegans (Fig. 6 and Extended Data Fig. 3, 7b-c). For the first inversion breakpoint, we also mapped reads on the I morph assembly of I. elegans (Fig. S12).
12. Evidence of translocation in I
A. File names: Ifem_nano_SUPER_13_unloc_2.bam, Ifem_nano_SUPER_13_unloc_2.bam.bai
B. Description: Long-read Nanopore data of an I morph female of I. elegans mapped to the A morph of I. elegans and filtered to contain the entire unlocalized scaffold 2 of chromosome 13. Read mapping was conducted in minimap2 v 2.22-r1110 (https://github.com/lh3/minimap2) and used to identify a translocation signature in the I morph, relative to the A morph of I. elegans. See Extended Data Fig. 6.
13. PCA output
A. File names: A1354_all.eigenval, A1354_all.eigenvec, I1049_all.eigenval, I1049_all.eigenvec
B. Description: Eigenvectors and eigenvalues of PCA analyses of population structure between morphs of I. elegans. PCA analyses were conducted on the morph locus, using either the A morph or the I morph assembly as mapping reference in PLINK v 1.9 (http://pngu.mgh.harvard.edu/purcell/plink/). See Fig. S4.
14. Linkage disequilibrium
A. File names: A1354_SUPER_1_allr.ld.gz, A1354_SUPER_2_allr.ld.gz, A1354_SUPER_3_allr.ld.gz, A1354_SUPER_4_allr.ld.gz, A1354_SUPER_5_allr.ld.gz, A1354_SUPER_6_allr.ld.gz, A1354_SUPER_7_allr.ld.gz, A1354_SUPER_8_allr.ld.gz, A1354_SUPER_9_allr.ld.gz, A1354_SUPER_10_allr.ld.gz, A1354_SUPER_11_allr.ld.gz, A1354_SUPER_12_allr.ld.gz, A1354_SUPER_13_allr.ld.gz, A1354_SUPER_13_unloc_1_allr.ld.gz, A1354_SUPER_13_unloc_2_allr.ld.gz, A1354_SUPER_13_unloc_3_allr.ld.gz, A1354_SUPER_13_unloc_4_allr.ld.gz, A1354_SUPER_X_allr.ld.gz
B. Description: Estimates of recombination rate (R2) between SNPs across the first 15 mb of each chromosome and unlocalized segments of chromosome 13 of
Components
Dataset : Emotional_perspectives : Response to a given context under 27 emotional lenses
Description
Synthetic dataset created to mimic emotional responses, primarily made for alignment and interpretability research. More details will be listed on GitHub soon.
license: mit
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description: The current dataset provides all the stimuli (folder ../01-Stimuli/), raw data (folder ../02-Raw-data/) and post-processed data (../03-Post-proc-data/) used in the Forum Acusticum 2023 paper titled "Using auditory models to mimic human listeners in reverse correlation experiments from the fastACI toolbox" by the same authors. In this paper, we replicated the tone-in-noise experiment by Ahumada et al. (1975), using an artificial listener instead of collecting data from real participants. The behavioural data were mimicked using an artificial listener based on 'king2019' (King et al., 2019) as a front-end model, with a template-matching decision to indicate whether or not a 500-Hz tone was present in each of the noisy trials. This study offers a step-by-step guide on how an artificial listener can be integrated into fastACI.
Use these data: Download all these data and place them in a local directory on your computer. If you have MATLAB and have downloaded a local copy of the fastACI toolbox (open access at: https://github.com/aosses-tue/fastACI), you can recreate the figures of our paper. After downloading and initialising the toolbox (type 'startup_fastACI;', without quotation marks, in MATLAB), run the script g20230501_FA_Artificial_listener_paper_figs.m (provided in this dataset) and follow the instructions on the screen to generate one of the four study figures. This script calls the function publ_osses2023b_FA_figs.m from the toolbox.