Dataset Title: Data and Code for: "Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System"
Description: This dataset contains source data and processing results for validating the Adaptive Multi-Interval Scale (AMIS) normalization method. It includes educational performance data (student grades), economic statistics (World Bank GDP), and a Python implementation of the AMIS algorithm with a graphical interface.
Contents:
- Source data: educational grades and GDP statistics
- AMIS normalization results (3, 5, 9, 17-point models)
- Comparative analysis with linear normalization
- Ready-to-use Python code for data processing
Applications:
- Educational data normalization and analysis
- Economic indicators comparison
- Development of unified metric systems
- Methodology research in data scaling
Technical info: Python code with pandas, numpy, scipy, matplotlib dependencies. Data in Excel format.
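The AMIS algorithm itself ships with the dataset; purely as a point of reference for the comparative analysis with linear normalization mentioned above, a minimal sketch of a linear (min-max) mapping onto an N-point scale might look as follows (the function name and example values are illustrative, not part of the published code):
import numpy as np
def linear_scale(values, points=5):
    # Min-max normalize values and map them linearly onto a `points`-level scale (1..points)
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    normalized = (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)
    return np.round(normalized * (points - 1)).astype(int) + 1
grades = [2, 3, 3, 4, 5, 5]            # e.g. raw student grades
print(linear_scale(grades, points=5))  # linear baseline on a 5-point model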
This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.
About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.
Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.
This is not a real dataset; it was generated with Python's Faker library for the sole purpose of learning.
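A minimal sketch of how rows with this schema could be generated with Faker (the locale, category lists, and value ranges below are illustrative assumptions, not the exact script used to build the dataset):
import random
from faker import Faker
fake = Faker("en_IN")  # assumed locale for Indian cities / INR amounts
def make_transaction():
    gross = round(random.uniform(100, 5000), 2)
    discount = random.choice([0, 50, 100, 250])
    return {
        "CID": fake.uuid4(),
        "TID": fake.uuid4(),
        "Gender": random.choice(["Male", "Female"]),
        "Age Group": random.choice(["18-25", "26-35", "36-45", "46-60"]),
        "Purchase Date": fake.date_time_between(start_date="-1y", end_date="now"),
        "Product Category": random.choice(["Electronics", "Apparel", "Groceries"]),
        "Discount Availed": "Yes" if discount else "No",
        "Discount Name": "FESTIVE50" if discount else None,
        "Discount Amount (INR)": discount,
        "Gross Amount": gross,
        "Net Amount": gross - discount,
        "Purchase Method": random.choice(["Credit Card", "Debit Card", "UPI"]),
        "Location": fake.city(),
    }
rows = [make_transaction() for _ in range(5)]  # scale up to 55,000 for a full synthetic table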
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This updated version includes a Python script (glucose_analysis.py) that performs statistical evaluation of the glucose normalization process described in the associated thesis. The script supports key analyses, including normality assessment (Shapiro–Wilk test), variance homogeneity (Levene’s test), mean comparison (ANOVA), effect size estimation (Cohen’s d), and calculation of confidence intervals for the mean difference. These results validate the impact of Min-Max normalization on clinical data structure and usability within CDSS workflows. The script is designed to be reproducible and complements the processed dataset already included in this repository.
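The analyses listed above can be reproduced with standard SciPy functions; a minimal sketch (array contents and group sizes are placeholders, not the thesis data) could look like this:
import numpy as np
from scipy import stats
raw = np.random.default_rng(0).normal(110, 25, 200)    # placeholder glucose values
norm = (raw - raw.min()) / (raw.max() - raw.min())      # Min-Max normalization
print(stats.shapiro(raw))          # normality assessment (Shapiro-Wilk)
print(stats.levene(raw, norm))     # variance homogeneity (Levene's test)
print(stats.f_oneway(raw, norm))   # mean comparison (one-way ANOVA)
# Cohen's d and a 95% confidence interval for the mean difference
diff = raw.mean() - norm.mean()
pooled_sd = np.sqrt((raw.var(ddof=1) + norm.var(ddof=1)) / 2)
print("Cohen's d:", diff / pooled_sd)
se = np.sqrt(raw.var(ddof=1) / raw.size + norm.var(ddof=1) / norm.size)
print("95% CI:", stats.t.interval(0.95, df=raw.size + norm.size - 2, loc=diff, scale=se))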
Additional file 8: Supplementary Table 6. MeSH term numbers in each category correctly identified by pyMeSHSim, DNorm, TaggerOne and Nelson’s manual work.
Face recognition is a popular computer vision application that allows machines to identify and verify human faces from images or videos. Python is a widely used programming language for implementing face recognition systems due to its simplicity, flexibility, and availability of powerful libraries such as OpenCV, Dlib, and TensorFlow.
Here's a professional description of a face recognition project in Python:
Dataset collection: Collect a dataset of facial images to train the model. This can be done using publicly available datasets such as LFW, CelebA, or private data.
Preprocessing: Preprocess the dataset to improve model accuracy. This includes face detection, alignment, and normalization.
Feature extraction: Extract features from the preprocessed facial images using a pre-trained deep neural network such as VGG or ResNet. This will transform each face image into a feature vector that represents the unique characteristics of the face.
Training: Train a machine learning model such as a support vector machine (SVM) or a neural network using the extracted features and corresponding labels. The model should be optimized to minimize false positives and false negatives.
Testing: Evaluate the trained model on a test dataset to measure its performance. This can be done using metrics such as accuracy, precision, and recall.
Deployment: Deploy the model to a production environment where it can be used to recognize faces in real-time. This can be done using a web-based interface or a standalone application.
Improvements: Continuously improve the model by adding new data, refining the preprocessing steps, and tuning the model hyperparameters.
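To make the feature-extraction and training steps concrete, here is a minimal sketch that assumes face embeddings have already been extracted into a NumPy array (the embedding dimension, labels, and SVM settings are illustrative placeholders):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 128))    # placeholder: 200 face embeddings from a CNN backbone
y = rng.integers(0, 5, size=200)   # placeholder identity labels for 5 people
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="linear", probability=True)  # SVM classifier trained on the embeddings
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred, average="macro", zero_division=0))
print("recall:", recall_score(y_test, pred, average="macro", zero_division=0))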
Some additional advanced techniques that can be used to improve face recognition include:
Face recognition with deep learning: Use deep learning techniques such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to train more accurate models.
Multi-face recognition: Train models to recognize multiple faces in an image or video stream.
Face recognition with privacy protection: Incorporate privacy protection techniques such as blurring or anonymization of facial features to protect personal information.
Overall, a face recognition project in Python involves collecting and preprocessing data, extracting features, training and evaluating machine learning models, deploying the model in a production environment, and continuously improving the accuracy and efficiency of the system.
In this project, I have done exploratory data analysis on the UCI Automobile dataset available at https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
This dataset consists of data from the 1985 Ward's Automotive Yearbook. Here are the sources:
1) 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
2) Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
3) Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037
Number of Instances: 398
Number of Attributes: 9 including the class attribute
Attribute Information:
mpg: continuous
cylinders: multi-valued discrete
displacement: continuous
horsepower: continuous
weight: continuous
acceleration: continuous
model year: multi-valued discrete
origin: multi-valued discrete
car name: string (unique for each instance)
This data set consists of three types of entities:
I - The specification of an auto in terms of various characteristics
II - Its assigned insurance risk rating. This corresponds to the degree to which the auto is riskier than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is riskier (or less risky), this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling".
III - Its normalized losses in use as compared to other cars. This is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc...), and represents the average loss per car per year.
The analysis is divided into two parts:
Data Wrangling
Exploratory Data Analysis
Descriptive statistics
Groupby
Analysis of variance
Correlation
Correlation stats
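As a minimal sketch of the data wrangling and descriptive-statistics steps above, the raw file can be loaded straight from the UCI link (the file has no header row, and missing values are encoded as "?"):
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df = pd.read_csv(url, header=None, na_values="?")   # raw file ships without column names
print(df.shape)                            # rows x columns of the raw data
print(df.describe())                       # descriptive statistics for numeric columns
print(df.select_dtypes("number").corr())   # correlation between numeric attributes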
Acknowledgment: UCI Machine Learning Repository. Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
Additional file 4: Supplementary Table 2. GWAS phenotypes parsed by Nelson’s group and pyMeSHSim, and the semantic similarity between them calculated by pyMeSHSim and meshes.
Additional file 10: Supplementary Table 8. MeSH terms perfectly recognized by DNorm or TaggerOne but missed by pyMeSHSim, with the semantic similarity between them calculated by pyMeSHSim. pyMeSHSim_Score is the semantic similarity between Nelson_MeSH_ID and pyMeSHSim_MeSH_ID, taggerOne_score is the semantic similarity between Nelson_MeSH_ID and TaggerOne_MeSH_ID, and DNorm_score is the semantic similarity between Nelson_MeSH_ID and Dnorm_MeSH_ID.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises a professions gazetteer generated with automatically extracted terminology from the Mesinesp2 corpus, a manually annotated corpus in which domain experts have labeled a set of scientific literature, clinical trials, and patent abstracts, as well as clinical case reports.
A silver gazetteer for mention classification and normalization was created by combining the predictions of automatic Named Entity Recognition models with normalization via Entity Linking to three controlled vocabularies: SNOMED CT, NCBI and ESCO. The sources are 265,025 different documents, of which 249,538 correspond to the MESINESP2 Corpora and 15,487 to clinical cases from open clinical journals. From them, 5,682,000 mentions are extracted, and 4,909,966 (86.42%) are normalized to at least one of the ontologies: SNOMED CT (4,909,966) for diseases, symptoms, drugs, locations, occupations, procedures and species; ESCO (215,140) for occupations; and NCBI (1,469,256) for species.
The repository contains a .tsv file with the following columns:
filenameid: A unique identifier combining the file name and mention span within the text. This ensures each extracted mention is uniquely traceable. Example: biblio-1000005#239#256 refers to a mention spanning characters 239–256 in the file with the name biblio-1000005.
span: The specific text span (mention) extracted from the document, representing a term or phrase identified in the dataset. Example: centro oncológico.
source: The origin of the document, indicating the corpus from which the mention was extracted. Possible values: mesinesp2, clinical_cases.
filename: The name of the file from which the mention was extracted. Example: biblio-1000005.
mention_class: Categories or semantic tags assigned to the mention, describing its type or context in the text. Example: ['ENFERMEDAD', 'SINTOMA'].
codes_esco: The normalized ontology codes from the European Skills, Competences, Qualifications, and Occupations (ESCO) vocabulary for the identified mention (if applicable). This field may be empty if no ESCO mapping exists. Example: 30629002.
terms_esco: The human-readable terms from the ESCO ontology corresponding to the codes_esco. Example: ['responsable de recursos', 'director de recursos', 'directora de recursos'].
codes_ncbi: The normalized ontology codes from the NCBI Taxonomy vocabulary for species (if applicable). This field may be empty if no NCBI mapping exists.
terms_ncbi: The human-readable terms from the NCBI Taxonomy vocabulary corresponding to the codes_ncbi. Example: ['Lacandoniaceae', 'Pandanaceae R.Br., 1810', 'Pandanaceae', 'Familia'].
codes_sct: The normalized ontology codes from SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) vocabulary for diseases, symptoms, drugs, locations, occupations, procedures, and species (if applicable). Example: 22232009.
terms_sct: The human-readable terms from the SNOMED CT ontology corresponding to the codes_sct. Example: ['adjudicador de regulaciones del seguro nacional'].
sct_sem_tag: The semantic category tag assigned by SNOMED CT to describe the general classification of the mention. Example: environment.
Suggestion: If you load the dataset using Python, it is recommended to read the columns containing lists as follows (the file name below is illustrative; use the .tsv file included in the repository):
import ast
import pandas as pd
df = pd.read_csv("mentions_gazetteer.tsv", sep="\t")
df["mention_class"] = df["mention_class"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
License
This dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). This means you are free to:
Share: Copy and redistribute the material in any medium or format.
Adapt: Remix, transform, and build upon the material for any purpose, even commercially.
Attribution Requirement: Please credit the dataset creators appropriately, provide a link to the license, and indicate if changes were made.
Contact
If you have any questions or suggestions, please contact us at:
Martin Krallinger ()
Additional resources and corpora
If you are interested, you might want to check out these corpora and resources:
MESINESP-2 (Corpus of manually indexed records with DeCS/MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)
MEDDOPROF corpus
Codes Reference List (for MEDDOPROF-NORM)
Annotation Guidelines
Occupations Gazetteer
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Abstract: The dataset contains measurements of magnetic susceptibility in dependence of temperature of shocked magnetite and of a natural magnetite single crystal before and after manual crushing. A Python code for evaluation of low-temperature susceptibility curves is included. The data are supplementary to: Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.
Technical remarks: The data set contains k-T curves of
- initial magnetite ore from Sydvaranger mine (Norway),
- the same ore after shock at 3, 5, 10, 20 and 30 GPa under laboratory conditions and after subsequent heating to 973 K,
- a natural magnetite single crystal (initial and after manual crushing).
The data set also contains a Python code for evaluation of normalized low-temperature k-T curves. Experimental conditions are described in [1]. The approach for k-T curve evaluation is described in [2].
[1] Kontny, A., Reznik, B., Boubnov, A., Göttlicher, J. and Steininger, R., 2018. Postshock Thermally Induced Transformations in Experimentally Shocked Magnetite, Geochemistry, Geophysics, Geosystems, Vol. 19, 3, pp. 921–931, doi:10.1002/2017GC007331.
[2] Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.
It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.
Each Wordle answer tends to cause a spike in search volume on the day it appears, especially when the word is rare.
This dataset supports exploration of how word rarity relates to same-day search interest.
| Column | Description |
|---|---|
| date | Date of the Wordle puzzle |
| word | Correct 5-letter Wordle answer |
| game | Wordle game number |
| wordfreq_commonality | Normalized frequency score using Python’s wordfreq library |
| subtlex_commonality | Normalized frequency score using the SUBTLEX-US dataset |
| trend_day_global | Google search interest on the day (global, all categories) |
| trend_avg_200_global | 200-day average search interest (global, all categories) |
| trend_day_language | Search interest on Wordle day (Language Resources category) |
| trend_avg_200_language | 200-day average search interest (Language Resources category) |
Notes:
- All trend values are relative (0–100 scale, per Google Trends)
Word commonality scores come from the wordfreq Python library, and search trends were collected with pytrends. Analysis done using this data can be found in the blog post.
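A minimal sketch of the kind of analysis these columns support, comparing the day-of-puzzle spike against the 200-day baseline (the CSV file name below is a placeholder):
import pandas as pd
df = pd.read_csv("wordle_trends.csv", parse_dates=["date"])  # placeholder file name
# Ratio of day-of-puzzle search interest to the 200-day average
df["spike_ratio"] = df["trend_day_global"] / df["trend_avg_200_global"]
# Do rarer words (lower commonality) spike harder?
print(df[["spike_ratio", "wordfreq_commonality", "subtlex_commonality"]].corr())
print(df.nlargest(10, "spike_ratio")[["date", "word", "spike_ratio"]])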
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains R and Jupyter code and data for performing statistical analyses and creating NMDS and heatmap figures for the EcoFAB 2.0 Ring Trial.
- "two-way-anova-ecofab-ringtrial-stats.ipynb" generates the statistical analysis utilizing RawData from the paper.
- "Heat_KEGG.r" generates the comparative genomics heat map utilizing normalized KEGG pathway gene abundance in the "Heat_KEGG.xlsx" dataset.
- "Heat_TM.r" generates the heat map plot for targeted metabolomics utilizing normalized metabolite intensity in the "Heat_TM.xlsx" dataset.
- "NMDS_UM.r" generates NMDS plots for untargeted metabolomics utilizing raw peak heights for detected features in the "NMDS_UM.xlsx" and "NMDS_UM1.xlsx" datasets.
- "NMDS_seq.r" generates NMDS plots for root and media microbiome composition utilizing relative bacterial abundances from 16S rRNA sequencing in the "Seq_media.xlsx" and "Seq_root" datasets.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides a step-by-step pipeline for preprocessing metabolomics data.
The pipeline implements Probabilistic Quotient Normalization (PQN) to correct dilution effects in metabolomics measurements.
Includes guidance on handling raw metabolomics datasets obtained from LC-MS or NMR experiments.
Demonstrates Principal Component Analysis (PCA) for dimensionality reduction and exploratory data analysis.
Includes data visualization techniques to interpret PCA results effectively.
Suitable for metabolomics researchers and data scientists working on omics data.
Enables better reproducibility of preprocessing workflows for metabolomics studies.
Can be used to normalize data, detect outliers, and identify major patterns in metabolomics datasets.
Provides a Python-based notebook that is easy to adapt to new datasets.
Includes example datasets and code snippets for immediate application.
Helps users understand the impact of normalization on downstream statistical analyses.
Supports integration with other metabolomics pipelines or machine learning workflows.
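For orientation, a minimal sketch of PQN followed by PCA on a samples x features intensity matrix, under the usual PQN formulation (median reference spectrum, median quotient per sample); the notebook in this dataset may differ in details:
import numpy as np
from sklearn.decomposition import PCA
def pqn_normalize(X):
    # Probabilistic Quotient Normalization of a samples x features matrix
    X = np.asarray(X, dtype=float)
    reference = np.median(X, axis=0)         # reference spectrum (median across samples)
    quotients = X / reference                # per-feature quotients against the reference
    dilution = np.median(quotients, axis=1)  # per-sample dilution factor
    return X / dilution[:, None]             # divide each sample by its dilution factor
X = np.abs(np.random.default_rng(1).normal(1.0, 0.2, size=(30, 200)))  # placeholder intensities
X_pqn = pqn_normalize(X)
scores = PCA(n_components=2).fit_transform(np.log1p(X_pqn))  # PCA for exploratory analysis
print(scores[:5])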
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Task scheduler performance survey
This dataset contains the results of a task graph scheduler performance survey.
The results are stored in the following files, which correspond to simulations performed on
the elementary, irw and pegasus task graph datasets published at https://doi.org/10.5281/zenodo.2630384.
elementary-result.zip
irw-result.zip
pegasus-result.zip
The files contain compressed pandas dataframes in CSV format; they can be read with the following Python code:
import pandas as pd
frame = pd.read_csv("elementary-result.zip")
Each row in the frame corresponds to a single instance of a task graph that was simulated with a specific configuration (network model, scheduler etc.). The list below summarizes the meaning of the individual columns.
graph_name - name of the benchmarked task graph
graph_set - name of the task graph dataset from which the graph originates
graph_id - unique ID of the graph
cluster_name - type of cluster used in this instance; the format is <workers>x<cores>, e.g. 32x16 means 32 workers, each with 16 cores
bandwidth - network bandwidth [MiB/s]
netmodel - network model (simple or maxmin)
scheduler_name - name of the scheduler
imode - information mode
min_sched_interval - minimal scheduling delay [s]
sched_time - duration of each scheduler invocation [s]
time - simulated makespan of the task graph execution [s]
execution_time - real duration of all scheduler invocations [s]
total_transfer - amount of data transferred amongst workers [MiB]
The file charts.zip contains charts obtained by processing the datasets.
On the X axis there is always bandwidth in [MiB/s].
There are the following files:
[DATASET]-schedulers-time - Absolute makespan produced by schedulers [seconds]
[DATASET]-schedulers-score - The same as above but normalized with respect to the best schedule (shortest makespan) for the given configuration.
[DATASET]-schedulers-transfer - Sums of transfers between all workers for a given configuration [MiB]
[DATASET]-[CLUSTER]-netmodel-time - Comparison of netmodels, absolute times [seconds]
[DATASET]-[CLUSTER]-netmodel-score - Comparison of netmodels, normalized to the average of model "simple"
[DATASET]-[CLUSTER]-netmodel-transfer - Comparison of netmodels, sum of transferred data between all workers [MiB]
[DATASET]-[CLUSTER]-schedtime-time - Comparison of MSD, absolute times [seconds]
[DATASET]-[CLUSTER]-schedtime-score - Comparison of MSD, normalized to the average of "MSD=0.0" case
[DATASET]-[CLUSTER]-imode-time - Comparison of Imodes, absolute times [seconds]
[DATASET]-[CLUSTER]-imode-score - Comparison of Imodes, normalized to the average of "exact" imode
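A sketch of how the "schedulers-score" normalization described above (makespan relative to the best schedule for a given configuration) could be recomputed from the dataframe; the set of columns treated as a configuration here is an assumption based on the column list above:
import pandas as pd
frame = pd.read_csv("elementary-result.zip")
# Best (shortest) makespan per configuration, ignoring the scheduler itself
config_cols = ["graph_id", "cluster_name", "bandwidth", "netmodel", "imode", "min_sched_interval", "sched_time"]
best = frame.groupby(config_cols)["time"].transform("min")
frame["score"] = frame["time"] / best   # 1.0 = best schedule for that configuration
print(frame.groupby("scheduler_name")["score"].mean().sort_values())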
Reproducing the results
$ git clone https://github.com/It4innovations/estee
$ cd estee
$ pip install .
You can either run benchmarks/generate.py to generate graphs from three categories (elementary, irw and pegasus):
$ cd benchmarks
$ python generate.py elementary.zip elementary
$ python generate.py irw.zip irw
$ python generate.py pegasus.zip pegasus
or use our task graph dataset that is provided at https://doi.org/10.5281/zenodo.2630384.
The benchmark is configured in benchmark.json. Then you can run the benchmark using this command:
$ python pbs.py compute benchmark.json
The benchmark script can be interrupted at any time (for example using Ctrl+C). When interrupted, it will store the computed results to the result file and resume the computation when launched again.
$ python view.py --all
The resulting plots will appear in a folder called outputs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset contains information about GBM, an aggressive and highly malignant brain tumor that arises from glial cells, characterized by rapid growth and infiltrative behavior. The gene expression profile was measured experimentally using the Affymetrix HT Human Genome U133a microarray platform by the Broad Institute of MIT and Harvard University cancer genomic characterization center. The Sample IDs serve as unique identifiers for each sample.
Inspiration:
This dataset was uploaded to UBRITE for GTKB project.
Instruction:
The log2(x) normalization was removed, and z-normalization was performed on the dataset using a Python script.
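A minimal sketch of that preprocessing under one plausible reading (expression matrix with genes in rows and samples in columns; the per-gene z-scoring axis and the file name are assumptions, not the actual script):
import numpy as np
import pandas as pd
expr_log2 = pd.read_csv("gbm_expression.tsv", sep="\t", index_col=0)  # placeholder file: genes x samples, log2 values
expr_linear = np.power(2, expr_log2)              # remove the log2(x) normalization
z = expr_linear.sub(expr_linear.mean(axis=1), axis=0) \
               .div(expr_linear.std(axis=1), axis=0)   # z-normalization per gene (row-wise)
print(z.iloc[:5, :5])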
Acknowledgments:
Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8
The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764
U-BRITE last update: 07/13/2023
Community Data License Agreement Sharing 1.0 (CDLA-Sharing-1.0): https://cdla.io/sharing-1-0/
Imagine you are working as a data scientist at Zomato. Your goal is to enhance operational efficiency and improve customer satisfaction by analyzing food delivery data. You need to build an interactive Streamlit tool that enables seamless data entry for managing orders, customers, restaurants, and deliveries. The tool should support robust database operations like adding columns or creating new tables dynamically while maintaining compatibility with existing code.
Business Use Cases:
- Order Management: Identifying peak ordering times and locations. Tracking delayed and canceled deliveries.
- Customer Analytics: Analyzing customer preferences and order patterns. Identifying top customers based on order frequency and value.
- Delivery Optimization: Analyzing delivery times and delays to improve logistics. Tracking delivery personnel performance.
- Restaurant Insights: Evaluating the most popular restaurants and cuisines. Monitoring order values and frequency by restaurant.
Approach:
1) Dataset Creation: Use Python (Faker) to generate synthetic datasets for customers, orders, restaurants, and deliveries. Populate the SQL database with these datasets.
2) Database Design: Create normalized SQL tables for Customers, Orders, Restaurants, and Deliveries. Ensure compatibility for dynamic schema changes (e.g., adding columns, creating new tables).
3) Data Entry Tool: Develop a Streamlit app for adding, updating, and deleting records in the SQL database, and for dynamically creating new tables or modifying existing ones.
4) Data Insights: Use SQL queries and Python to extract insights like peak times, delayed deliveries, and customer trends. Visualize the insights in the Streamlit app (add-on).
5) OOP Implementation: Encapsulate database operations in Python classes. Implement robust and reusable methods for CRUD (Create, Read, Update, Delete) operations.
6) Order Management: Identifying peak ordering times and locations. Tracking delayed and canceled deliveries.
7) Customer Analytics: Analyzing customer preferences and order patterns. Identifying top customers based on order frequency and value.
8) Delivery Optimization: Analyzing delivery times and delays to improve logistics. Tracking delivery personnel performance.
9) Restaurant Insights: Evaluating the most popular restaurants and cuisines. Monitoring order values and frequency by restaurant.
Results: By the end of this project, learners will achieve:
- A fully functional SQL database for managing food delivery data.
- An interactive Streamlit app for data entry and analysis.
- 20 SQL queries written for analysis.
- Dynamic compatibility with database schema changes.
- Comprehensive insights into order trends, delivery performance, and customer behavior.
Project Evaluation Metrics:
- Database Design: Proper normalization of tables and relationships between them.
- Code Quality: Use of OOP principles to ensure modularity and scalability. Robust error handling for database operations.
- Streamlit App Functionality: Usability of the interface for data entry and insights. Compatibility with schema changes.
- Data Insights: Use of 20 SQL queries for data analysis.
- Documentation: Clear and comprehensive explanation of the code and approach.
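As a minimal sketch of the OOP/CRUD layer described in the approach, using SQLite as a stand-in database (table and column names are illustrative, not the project's actual schema):
import sqlite3
class OrderStore:
    """Encapsulates basic CRUD operations for an orders table."""
    def __init__(self, path="zomato.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "order_id INTEGER PRIMARY KEY, customer_id TEXT, restaurant_id TEXT, "
            "order_value REAL, status TEXT)"
        )
    def add(self, customer_id, restaurant_id, order_value, status="placed"):
        cur = self.conn.execute(
            "INSERT INTO orders (customer_id, restaurant_id, order_value, status) VALUES (?, ?, ?, ?)",
            (customer_id, restaurant_id, order_value, status),
        )
        self.conn.commit()
        return cur.lastrowid
    def update_status(self, order_id, status):
        self.conn.execute("UPDATE orders SET status = ? WHERE order_id = ?", (status, order_id))
        self.conn.commit()
    def delete(self, order_id):
        self.conn.execute("DELETE FROM orders WHERE order_id = ?", (order_id,))
        self.conn.commit()
store = OrderStore()
oid = store.add("C001", "R042", 499.0)
store.update_status(oid, "delivered")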
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
📄 Kaggle Dataset Description (FAERS Signals)
Title: FDA FAERS Adverse Drug Event Signals (Processed)
Subtitle: Drug–Adverse Event counts and disproportionality metrics (PRR, ROR) from the FDA’s Adverse Event Reporting System (FAERS).
🧾 Overview
The FDA Adverse Event Reporting System (FAERS) is a publicly available database of adverse drug event reports, medication error reports, and product quality complaints. This dataset provides processed, analysis-ready FAERS data, focusing on drug–adverse event pairs with quarterly counts and basic signal detection metrics.
📊 What’s Inside?
faers_drug_event_counts.csv
Clean, normalized table of drug–event pairs
Quarterly (QTR) counts of adverse events
faers_signals_prr_ror.csv
Proportional Reporting Ratio (PRR) and Reporting Odds Ratio (ROR) for each drug–event pair
Simple thresholds applied (min. count filter)
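For reference, PRR and ROR can be recomputed from the counts file using the standard 2x2 contingency definitions; a minimal sketch (the exact minimum-count thresholds used in faers_signals_prr_ror.csv are not reproduced here):
import pandas as pd
counts = pd.read_csv("faers_drug_event_counts.csv")
pair = counts.groupby(["DRUGNAME_NORM", "PT_NORM"])["n_reports"].sum().reset_index()
total = pair["n_reports"].sum()
drug_total = pair.groupby("DRUGNAME_NORM")["n_reports"].transform("sum")
event_total = pair.groupby("PT_NORM")["n_reports"].transform("sum")
a = pair["n_reports"]   # reports with the drug and the event
b = drug_total - a      # same drug, other events
c = event_total - a     # same event, other drugs
d = total - a - b - c   # neither
pair["PRR"] = (a / (a + b)) / (c / (c + d))
pair["ROR"] = (a * d) / (b * c)
print(pair.sort_values("PRR", ascending=False).head())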
🔍 Potential Use Cases
Pharmacovigilance signal detection
Drug safety surveillance
Predictive modeling of future label changes
Text/data mining for biomedical research
Event-driven investment research (biopharma risk signals)
⚠️ Limitations
FAERS is a spontaneous reporting system (subject to underreporting, duplication, and reporting bias).
Counts do not equal incidence rates.
Use this data for signal detection, not risk quantification.
This dataset is processed for Kaggle use and may not contain all FAERS fields.
📚 Source & License
Source: FDA FAERS Public Data
License: US Government Work (Public Domain)
🔥 This dataset bridges raw FDA data and ML-ready inputs, helping researchers, data scientists, and regulatory experts run faster signal detection workflows.
This dataset contains processed outputs from the FDA Adverse Event Reporting System (FAERS).
It provides cleaned quarterly counts of drug–event pairs along with disproportionality metrics such as PRR (Proportional Reporting Ratio) and ROR (Reporting Odds Ratio).
faers_drug_event_counts.csv
Raw counts of drug–event pairs per quarter.
faers_signals_prr_ror.csv
Signal detection metrics (PRR, ROR) with thresholds applied.
faers_drug_event_counts.csv
| Column | Description |
|---|---|
| DRUGNAME_NORM | Normalized drug name |
| QTR | Report quarter (YYYYQn) |
| PT_NORM | MedDRA Preferred Term (adverse event) |
| n_reports | Number of case reports |
| quarter_folder | Source folder of ASCII data |
faers_signals_prr_ror.csv
| Column | Description |
|---|---|
| DRUGNAME_NORM | Normalized drug name |
| PT_NORM | MedDRA Preferred Term |
| n_reports | Case counts |
| PRR | Proportional Reporting Ratio |
| ROR | Reporting Odds Ratio |
| PRR_signal | Boolean flag if PRR > threshold |
| ROR_signal | Boolean flag if ROR > threshold |
import pandas as pd
# Load drug-event counts
counts = pd.read_csv("/kaggle/input/faers-signals/faers_drug_event_counts.csv")
# Top 10 drugs by number of reports
print(counts.groupby("DRUGNAME_NORM")["n_reports"].sum().nlargest(10))
# Load signals
signals = pd.read_csv("/kaggle/input/faers-signals/faers_signals_prr_ror.csv")
# Find signals for Metformin
metformin_signals = signals[signals["DRUGNAME_NORM"] == "METFORMIN"]
print(metformin_signals.head())
📌 Citation
If you use this dataset, please cite:
FDA FAERS (2024–2025). Processed by anurmi.
Data source: U.S. Food & Drug Administration (public domain).
🔖 Tags
pharmacovigilance adverse-events drug-safety FDA healthcare pharmacology signal-detection medical-data public-health time-series
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was created as part of the research project “Python Under the Microscope: A Comparative Energy Analysis of Execution Methods” (2025). The study explores the environmental sustainability of Python software by benchmarking five execution strategies—CPython, PyPy, Cython, ctypes, and py_compile—across 15 classical algorithmic workloads.
With energy and carbon efficiency becoming critical in modern computing, this dataset aims to:
Quantify execution time, CPU energy usage, and carbon emissions
Enable reproducible analysis of performance–sustainability trade-offs
Introduce and validate the GreenScore, a composite metric for sustainability-aware software evaluation
All benchmarks were executed on a controlled laptop environment (Intel Core i5-1235U, Linux 6.8). Energy was measured via Intel RAPL counters using the pyRAPL library. Carbon footprint was estimated using a conversion factor of 0.000475 gCO₂ per joule based on regional electricity intensity.
Each algorithm–method pair was run 50 times, capturing robust statistics for energy (μJ), time (s), and derived CO₂ emissions.
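Given the stated conversion factor, the derived CO₂ values can be recomputed from the raw energy readings; a minimal sketch (the CSV path and column name are assumptions about the file layout):
import pandas as pd
GCO2_PER_JOULE = 0.000475   # regional electricity intensity used in the study
df = pd.read_csv("cpython/energy/fibonacci.csv")   # placeholder path: one benchmark, 50 trials
energy_joules = df["energy_uJ"] * 1e-6             # assumed column name; energy is recorded in microjoules
df["co2_g"] = energy_joules * GCO2_PER_JOULE
print(df["co2_g"].describe())                      # summary statistics over the 50 trials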
Per-method folders (cpython/, pypy/, etc.) contain raw energy/ and time/ CSV files for all 15 benchmarks (50 trials each), as well as mean summaries.
Aggregate folder includes combined metric comparisons, normalized data, and carbon footprint estimations.
Analysis folder contains derived datasets: normalized scores, standard deviation, and the final GreenScore rankings used in our paper.
This dataset is ideal for:
Reproducible software sustainability studies
Benchmarking Python execution strategies
Analyzing energy–performance–carbon trade-offs
Validating green metrics and measurement tools
Researchers and practitioners are encouraged to use, extend, and cite this dataset in sustainability-aware software design.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains geospatial and remote sensing data for the La Dorada area, Colombia, prepared for deep learning tasks (e.g., water potential mapping using Conv1D/MLP models). The dataset is stored in NPZ format for easy loading with NumPy and TensorFlow.
The dataset consists of four main components:
- Continuous variables (x_var): shape [3584, 1097, 10] (rows, columns, channels). Normalization: all values are normalized to the range [0, 1].
- Image data (x_img): shape [3584, 1097, 3, 1] (rows, columns, channels, extra dimension for Conv1D compatibility), values in [0, 1].
- Categorical data (x_cat): shape [3584, 1097, 1] (rows, columns, channels), dtype int32.
- Target (y): shape [3584, 1097, 1] (rows, columns, channels), values in [0, 1].
The NPZ file contains all arrays in their original shapes:
import numpy as np
data = np.load("dataset_ladorada.npz")
x_var = data["x_var"] # shape: (3584, 1097, 10)
x_img = data["x_img"] # shape: (3584, 1097, 3, 1)
x_cat = data["x_cat"] # shape: (3584, 1097, 1)
y = data["y"] # shape: (3584, 1097, 1)
Notes: dtypes are float32 for continuous variables and the target, and int32 for categorical data. All arrays share the spatial dimensions [3584, 1097] for consistency. The extra trailing dimension of x_img ([..., 1]) allows direct usage in Conv1D layers without reshaping.
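Building on the loading snippet above, a common next step is to flatten the spatial grid into per-pixel samples for an MLP or Conv1D model; a minimal sketch using only the documented array keys and shapes:
import numpy as np
data = np.load("dataset_ladorada.npz")
x_var, y = data["x_var"], data["y"]
# Flatten the 3584 x 1097 grid into per-pixel samples
X = x_var.reshape(-1, x_var.shape[-1])   # shape: (3584*1097, 10)
labels = y.reshape(-1)                   # shape: (3584*1097,)
print(X.shape, labels.shape)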
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a comprehensive view of student performance and learning behavior, integrating academic, demographic, behavioral, and psychological factors.
It was created by merging two publicly available Kaggle datasets, resulting in a unified dataset of 14,003 student records with 16 attributes. All entries are anonymized, with no personally identifiable information.
Key attributes include StudyHours, Attendance, Extracurricular, AssignmentCompletion, OnlineCourses, Discussions, Resources, Internet, EduTech, Motivation, StressLevel, Gender, Age (18–30 years), LearningStyle, ExamScore, and FinalGrade.
The dataset can be used for predictive modeling of performance indicators (ExamScore, FinalGrade). It was analyzed in Python, including work on LearningStyle categories and extracting insights for adaptive learning.
File: merged_dataset.csv → 14,003 rows × 16 columns. Includes student demographics, behaviors, engagement, learning styles, and performance indicators.
This dataset is an excellent playground for educational data mining, from clustering and behavioral analytics to predictive modeling and personalized learning applications.
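A minimal sketch of one such analysis, grouping exam performance by learning style (column names are taken from the attribute list above and may differ slightly in the file):
import pandas as pd
df = pd.read_csv("merged_dataset.csv")   # 14,003 rows x 16 columns
# Average exam score per learning style
print(df.groupby("LearningStyle")["ExamScore"].mean())
# How study behavior relates to performance
print(df[["StudyHours", "Attendance", "ExamScore"]].corr())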