12 datasets found
  1. Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set

    • data.niaid.nih.gov
    Updated Jul 15, 2022
    Cite
    Walton, Sam D. (2022). Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6835136
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Murphy, Kyle R.
    Walton, Sam D.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Solar Wind Omni and SAMPEX (Solar Anomalous and Magnetospheric Particle Explorer) datasets used in the examples for SEAnorm, a time-normalized superposed epoch analysis package in Python.

    Both data sets are stored as either an HDF5 file or a compressed CSV file (csv.bz2), each containing a pandas DataFrame of either the Solar Wind Omni or the SAMPEX data. The data sets were written with pandas.DataFrame.to_hdf() and pandas.DataFrame.to_csv() using a compression level of 9. The DataFrames can be read back with pandas.read_hdf() or pandas.read_csv(), depending on the file format.
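    A minimal loading sketch with pandas; the file names here are placeholders for the HDF5 and csv.bz2 files in this record, and the index settings are assumptions about time-indexed data:

    import pandas as pd

    omni = pd.read_hdf("solar_wind_omni.h5")                              # placeholder file name
    sampex = pd.read_csv("sampex.csv.bz2", index_col=0, parse_dates=True)  # bz2 compression inferred from extension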

    The Solar Wind Omni data set contains solar wind velocity (V) and dynamic pressure (P), the southward interplanetary magnetic field in Geocentric Solar Ecliptic System (GSE) coordinates (B_Z_GSE), the auroral electrojet index (AE), and the Sym-H index, all at 1 minute cadence.

    The SAMPEX data set contains electron flux from the Proton/Electron Telescope (PET) at two energy channels 1.5-6.0 MeV (ELO) and 2.5-14 MeV (EHI) at an approximate 6 second cadence.

  2. AI4PROFHEALTH - Automatic Silver Gazetteer for Named Entity Recognition and Normalization

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 25, 2024
    Cite
    Rodríguez Miret, Jan (2024). AI4PROFHEALTH - Automatic Silver Gazetteer for Named Entity Recognition and Normalization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14210424
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Krallinger, Martin
    Marsol Torrent, Sergi
    Rodríguez Miret, Jan
    Rodríguez Ortega, Miguel
    Farré-Maduell, Eulàlia
    Becerra-Tomé, Alberto
    Lima-López, Salvador
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises a professions gazetteer generated with automatically extracted terminology from the Mesinesp2 corpus, a manually annotated corpus in which domain experts have labeled a set of scientific literature, clinical trials, and patent abstracts, as well as clinical case reports.

    A silver gazetteer for mention classification and normalization is created by combining the predictions of automatic Named Entity Recognition models with Entity Linking normalization to three controlled vocabularies: SNOMED CT, NCBI and ESCO. The sources are 265,025 different documents, of which 249,538 correspond to the MESINESP2 corpora and 15,487 to clinical cases from open clinical journals. From them, 5,682,000 mentions are extracted and 4,909,966 (86.42%) are normalized to at least one of the ontologies: SNOMED CT (4,909,966) for diseases, symptoms, drugs, locations, occupations, procedures and species; ESCO (215,140) for occupations; and NCBI (1,469,256) for species.

    The repository contains a .tsv file with the following columns:

    filenameid: A unique identifier combining the file name and mention span within the text. This ensures each extracted mention is uniquely traceable. Example: biblio-1000005#239#256 refers to a mention spanning characters 239–256 in the file with the name biblio-1000005.

    span: The specific text span (mention) extracted from the document, representing a term or phrase identified in the dataset. Example: centro oncológico.

    source: The origin of the document, indicating the corpus from which the mention was extracted. Possible values: mesinesp2, clinical_cases.

    filename: The name of the file from which the mention was extracted. Example: biblio-1000005.

    mention_class: Categories or semantic tags assigned to the mention, describing its type or context in the text. Example: ['ENFERMEDAD', 'SINTOMA'].

    codes_esco: The normalized ontology codes from the European Skills, Competences, Qualifications, and Occupations (ESCO) vocabulary for the identified mention (if applicable). This field may be empty if no ESCO mapping exists. Example: 30629002.

    terms_esco: The human-readable terms from the ESCO ontology corresponding to the codes_esco. Example: ['responsable de recursos', 'director de recursos', 'directora de recursos'].

    codes_ncbi: The normalized ontology codes from the NCBI Taxonomy vocabulary for species (if applicable). This field may be empty if no NCBI mapping exists.

    terms_ncbi: The human-readable terms from the NCBI Taxonomy vocabulary corresponding to the codes_ncbi. Example: ['Lacandoniaceae', 'Pandanaceae R.Br., 1810', 'Pandanaceae', 'Familia'].

    codes_sct: The normalized ontology codes from SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) vocabulary for diseases, symptoms, drugs, locations, occupations, procedures, and species (if applicable). Example: 22232009.

    terms_sct: The human-readable terms from the SNOMED CT ontology corresponding to the codes_sct. Example: ['adjudicador de regulaciones del seguro nacional'].

    sct_sem_tag: The semantic category tag assigned by SNOMED CT to describe the general classification of the mention. Example: environment.

    Suggestion: if you load the dataset with Python, it is recommended to parse the columns containing lists (e.g. mention_class, terms_esco) as follows; the .tsv file name below is a placeholder:

    import ast
    import pandas as pd

    df = pd.read_csv("gazetteer.tsv", sep="\t")  # placeholder name for the released .tsv file
    df["mention_class"] = df["mention_class"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

    License

    This dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). This means you are free to:

    Share: Copy and redistribute the material in any medium or format.

    Adapt: Remix, transform, and build upon the material for any purpose, even commercially.

    Attribution Requirement: Please credit the dataset creators appropriately, provide a link to the license, and indicate if changes were made.

    Contact

    If you have any questions or suggestions, please contact us at:

    Martin Krallinger ()

    Additional resources and corpora

    If you are interested, you might want to check out these corpora and resources:

    MESINESP-2 (Corpus of manually indexed records with DeCS /MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)

    MEDDOPROF corpus

    Codes Reference List (for MEDDOPROF-NORM)

    Annotation Guidelines

    Occupations Gazetteer

  3. nEMO Dataset

    • paperswithcode.com
    Updated Apr 8, 2024
    Cite
    Iwona Christop (2024). nEMO Dataset [Dataset]. https://paperswithcode.com/dataset/nemo-1
    Dataset updated
    Apr 8, 2024
    Authors
    Iwona Christop
    Description

    Overview nEMO is a simulated dataset of emotional speech in the Polish language. The corpus contains over 3 hours of samples recorded with the participation of nine actors portraying six emotional states: anger, fear, happiness, sadness, surprise, and a neutral state. The text material used was carefully selected to represent the phonetics of the Polish language. The corpus is available for free under the Creative Commons license (CC BY-NC-SA 4.0).

    The dataset is available on Hugging Face and GitHub.

    Data Fields

    file_id - filename, i.e. {speaker_id}_{emotion}_{sentence_id},

    audio (audio) - dictionary containing audio array, path and sampling rate (available when accessed via datasets library),

    emotion - label corresponding to emotional state,

    raw_text - original (orthographic) transcription of the audio,

    normalized_text - normalized transcription of the audio,

    speaker_id - id of speaker,

    gender - gender of the speaker,

    age - age of the speaker.

    Usage The nEMO dataset can be loaded and processed using the datasets library:

    from datasets import load_dataset
    
    nemo = load_dataset("amu-cai/nEMO", split="train")
    

    To work with the nEMO dataset on GitHub, you may clone the repository and access the files directly within the samples folder. Corresponding metadata can be found in the data.tsv file.

    The nEMO dataset is provided as a whole, without predefined training and test splits. This gives researchers and developers the flexibility to create their own splits based on their specific needs.
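    Since no official split is defined, one minimal way to create one with the datasets library is sketched below; the 80/20 ratio and the seed are arbitrary illustrative choices, not a recommendation from the authors:

    from datasets import load_dataset

    nemo = load_dataset("amu-cai/nEMO", split="train")
    splits = nemo.train_test_split(test_size=0.2, seed=42)  # illustrative 80/20 split
    train_set, test_set = splits["train"], splits["test"]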

    Supported Tasks

    Audio classification: This dataset was mainly created for the task of speech emotion recognition. Each recording is labeled with one of six emotional states (anger, fear, happiness, sadness, surprise, and neutral). Additionally, each sample is labeled with speaker id and speaker gender, so the dataset can also be used for other audio classification tasks.

    Automatic Speech Recognition: The dataset includes orthographic and normalized transcriptions for each audio recording, making it a useful resource for automatic speech recognition (ASR) tasks. The sentences were carefully selected to cover a wide range of phonemes in the Polish language.

    Text-to-Speech: The dataset contains emotional audio recordings with transcriptions, which can be valuable for developing TTS systems that produce emotionally expressive speech.

    Additional Information Licensing Information The dataset is available under the Creative Commons license (CC BY-NC-SA 4.0).

    Citation Information You can access the nEMO paper at arXiv. Please cite the paper when referencing the nEMO dataset as:

    @misc{christop2024nemo,
      title = {nEMO: Dataset of Emotional Speech in Polish},
      author = {Iwona Christop},
      year = {2024},
      eprint = {2404.06292},
      archivePrefix = {arXiv},
      primaryClass = {cs.CL}
    }

    Contributions Thanks to @iwonachristop for adding this dataset.

  4. Additional file 7 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms

    • springernature.figshare.com
    xlsx
    Updated Jun 6, 2023
    Cite
    Zhi-Hui Luo; Meng-Wei Shi; Zhuang Yang; Hong-Yu Zhang; Zhen-Xia Chen (2023). Additional file 7 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms [Dataset]. http://doi.org/10.6084/m9.figshare.12511142.v1
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    figshare
    Authors
    Zhi-Hui Luo; Meng-Wei Shi; Zhuang Yang; Hong-Yu Zhang; Zhen-Xia Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 7: Supplementary Table 5. GWAS phenotypes parsed by Nelson's group, pyMeSHSim, TaggerOne and DNorm, and the semantic similarity between them calculated by pyMeSHSim. pyMeSHSim_Score is the semantic similarity between Nelson_MeSH_ID and pyMeSHSim_MeSH_ID, taggerOne_score is the semantic similarity between Nelson_MeSH_ID and TaggerOne_MeSH_ID, and DNorm_score is the semantic similarity between Nelson_MeSH_ID and DNorm_MeSH_ID.

  5. Wordle Answer Search Trends Dataset (2021–2025)

    • kaggle.com
    Updated Jun 26, 2025
    Cite
    Ankush Kamboj (2025). Wordle Answer Search Trends Dataset (2021–2025) [Dataset]. https://www.kaggle.com/datasets/kambojankush/wordle-answer-search-trends-dataset-20212025/discussion
    Dataset updated
    Jun 26, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ankush Kamboj
    License

    GNU General Public License v3.0 (GPL-3.0): https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.

    It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.

    🔍 Hypothesis

    Each Wordle answer causes a spike in search volume on the day it appears — more so if the word is rare.

    This dataset supports exploration of:

    • Wordle answers
    • Trends for Wordle answers
    • Correlation between Wordle answer rarity and search interest

    Columns

    date - Date of the Wordle puzzle
    word - Correct 5-letter Wordle answer
    game - Wordle game number
    wordfreq_commonality - Normalized frequency score using Python's wordfreq library
    subtlex_commonality - Normalized frequency score using the SUBTLEX-US dataset
    trend_day_global - Google search interest on the day (global, all categories)
    trend_avg_200_global - 200-day average search interest (global, all categories)
    trend_day_language - Search interest on Wordle day (Language Resources category)
    trend_avg_200_language - 200-day average search interest (Language Resources category)

    Notes: All trend values are relative (0–100 scale, per Google Trends).

    🧮 Methodology

    • Wordle answers were scraped from wordfinder.yourdictionary.com
    • Commonality scores were computed using:
      • wordfreq Python library
      • SUBTLEX-US dataset (subtitle frequency, approximating spoken English)
    • Trend data was fetched from the Google Trends API via pytrends (see the sketch below)
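    A hedged sketch of how such scores can be gathered with wordfreq and pytrends; the word, timeframe, and the use of zipf_frequency are illustrative assumptions, not the exact parameters or normalization used to build this dataset:

    from wordfreq import zipf_frequency
    from pytrends.request import TrendReq

    word = "crane"  # illustrative Wordle answer
    print(zipf_frequency(word, "en"))  # word commonality on the Zipf scale

    pytrends = TrendReq(hl="en-US")
    pytrends.build_payload([word], timeframe="2022-01-01 2022-03-31")  # illustrative window, all categories
    trends = pytrends.interest_over_time()  # relative 0-100 daily interest
    print(trends[word].max())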

    📊 Analysis

    Analysis done using this data can be found in the accompanying blog post.

  6. Pitch Audio Dataset (Surge synthesizer)

    • zenodo.org
    tar
    Updated Aug 3, 2021
    Cite
    Joseph Turian; Joseph Turian (2021). Pitch Audio Dataset (Surge synthesizer) [Dataset]. http://doi.org/10.5281/zenodo.4677097
    Dataset updated
    Aug 3, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joseph Turian; Joseph Turian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    3.4 hours of audio synthesized using the open-source Surge synthesizer, based upon 2084 presets included in the Surge package. These represent "natural" synthesis sounds, i.e. presets devised by humans.

    We generated 4-second samples playing at velocity 64 with a note-on duration of 3 seconds. For each preset, we varied only the pitch, from MIDI note 21 to 108, the range of a grand piano. Every sound in the dataset was RMS-level normalized using the normalize package. There was no elegant way to dedup this dataset; however, only a small percentage of presets (such as drums and sound effects) had no perceptual pitch variation or ordering.
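    The normalization step above used the normalize package; purely as a conceptual sketch (not the tool or settings used for this dataset), RMS-level normalization of a waveform can be written in a few lines of NumPy:

    import numpy as np

    def rms_normalize(audio: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
        """Scale a waveform so its RMS level matches target_rms (target value chosen only for illustration)."""
        rms = np.sqrt(np.mean(audio.astype(np.float64) ** 2))
        return audio if rms == 0 else audio * (target_rms / rms)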

    We used the Surge Python API to generate this dataset.

    Applications of this dataset include:

    • Pitch prediction
    • Pitch ranking within a preset
    • Predict a sound's preset

    If you use this dataset in published research, please cite Turian et al., "One Billion Audio Sounds from GPU-enabled Modular Synthesis", in Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx2020), 2021:

    @inproceedings{turian2021torchsynth,
    title = {One Billion Audio Sounds from {GPU}-enabled Modular Synthesis},
    author = {Joseph Turian and Jordie Shier and George Tzanetakis and Kirk McNally and Max Henry},
    year = 2021,
    month = Sep,
    booktitle = {Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx2020)},
    location = {Vienna, Austria}
    }

  7. Task Scheduler Performance Survey Results

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Jakub Beránek (2020). Task Scheduler Performance Survey Results [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2630588
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Vojtěch Cima
    Jakub Beránek
    Stanislav Böhm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Task scheduler performance survey

    This dataset contains the results of a task graph scheduler performance survey. The results are stored in the following files, which correspond to simulations performed on the elementary, irw and pegasus task graph datasets published at https://doi.org/10.5281/zenodo.2630384.

    elementary-result.zip

    irw-result.zip

    pegasus-result.zip

    The files contain compressed pandas DataFrames in CSV format; they can be read with the following Python code:

    import pandas as pd
    frame = pd.read_csv("elementary-result.zip")

    Each row in the frame corresponds to a single instance of a task graph that was simulated with a specific configuration (network model, scheduler etc.). The list below summarizes the meaning of the individual columns.

    graph_name - name of the benchmarked task graph

    graph_set - name of the task graph dataset from which the graph originates

    graph_id - unique ID of the graph

    cluster_name - type of cluster used in this instance; the format is <workers>x<cores>, so 32x16 means 32 workers, each with 16 cores

    bandwidth - network bandwidth [MiB]

    netmodel - network model (simple or maxmin)

    scheduler_name - name of the scheduler

    imode - information mode

    min_sched_interval - minimal scheduling delay [s]

    sched_time - duration of each scheduler invocation [s]

    time - simulated makespan of the task graph execution [s]

    execution_time - real duration of all scheduler invocations [s]

    total_transfer - amount of data transferred amongst workers [MiB]
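    As an illustration of how these columns can be combined (mirroring the "score" charts described below, which normalize each makespan by the best one for a given configuration), a hedged pandas sketch; the grouping keys are an assumption about what defines a configuration:

    import pandas as pd

    frame = pd.read_csv("elementary-result.zip")
    keys = ["graph_id", "cluster_name", "bandwidth", "netmodel"]  # assumed configuration keys
    # 1.0 means the scheduler matched the shortest simulated makespan for that configuration
    frame["score"] = frame["time"] / frame.groupby(keys)["time"].transform("min")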

    The file charts.zip contains charts obtained by processing the datasets. On the X axis there is always bandwidth in [MiB/s]. There are the following files:

    [DATASET]-schedulers-time - Absolute makespan produced by schedulers [seconds]

    [DATASET]-schedulers-score - The same as above but normalized with respect to the best schedule (shortest makespan) for the given configuration.

    [DATASET]-schedulers-transfer - Sums of transfers between all workers for a given configuration [MiB]

    [DATASET]-[CLUSTER]-netmodel-time - Comparison of netmodels, absolute times [seconds]

    [DATASET]-[CLUSTER]-netmodel-score - Comparison of netmodels, normalized to the average of model "simple"

    [DATASET]-[CLUSTER]-netmodel-transfer - Comparison of netmodels, sum of transferred data between all workers [MiB]

    [DATASET]-[CLUSTER]-schedtime-time - Comparison of MSD, absolute times [seconds]

    [DATASET]-[CLUSTER]-schedtime-score - Comparison of MSD, normalized to the average of "MSD=0.0" case

    [DATASET]-[CLUSTER]-imode-time - Comparison of Imodes, absolute times [seconds]

    [DATASET]-[CLUSTER]-imode-score - Comparison of Imodes, normalized to the average of "exact" imode

    Reproducing the results

    1. Download and install Estee (https://github.com/It4innovations/estee)

    $ git clone https://github.com/It4innovations/estee
    $ cd estee
    $ pip install .

    2. Generate task graphs. You can either use the provided script benchmarks/generate.py to generate graphs from three categories (elementary, irw and pegasus):

    $ cd benchmarks
    $ python generate.py elementary.zip elementary
    $ python generate.py irw.zip irw
    $ python generate.py pegasus.zip pegasus

    or use our task graph dataset that is provided at https://doi.org/10.5281/zenodo.2630384.

    3. Run benchmarks. To run a benchmark suite, you should prepare a JSON file describing the benchmark. The file that was used to run the experiments from the paper is provided in benchmark.json. Then you can run the benchmark using this command:

    $ python pbs.py compute benchmark.json

    The benchmark script can be interrupted at any time (for example using Ctrl+C). When interrupted, it will store the computed results to the result file and restore the computation when launched again.

    4. Visualize the results:

    $ python view.py --all

    The resulting plots will appear in a folder called outputs.

  8. TCGA Glioblastoma Multiforme (GBM) Gene Expression

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 27, 2023
    Cite
    Swati Baskiyar (2023). TCGA Glioblastoma Multiforme (GBM) Gene Expression [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8187688
    Dataset updated
    Jul 27, 2023
    Dataset authored and provided by
    Swati Baskiyar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset contains information about GBM, an aggressive and highly malignant brain tumor that arises from glial cells, characterized by rapid growth and infiltrative behavior. The gene expression profile was measured experimentally using the Affymetrix HT Human Genome U133a microarray platform by the Broad Institute of MIT and Harvard University cancer genomic characterization center. The Sample IDs serve as unique identifiers for each sample.

    Inspiration:

    This dataset was uploaded to UBRITE for the GTKB project.

    Instruction:

    The log2(x) normalization was removed, and z-normalization was performed on the dataset using a Python script.
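    A minimal sketch of that kind of transformation, not the exact script used here, assuming a genes-by-samples expression matrix stored as log2(x) values (the file name is a placeholder):

    import pandas as pd

    expr_log2 = pd.read_csv("gbm_expression.tsv", sep="\t", index_col=0)  # placeholder file name
    expr = 2 ** expr_log2                                                 # undo the log2(x) normalization
    z = expr.sub(expr.mean(axis=1), axis=0).div(expr.std(axis=1), axis=0)  # per-gene z-scores (assumed orientation)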

    Acknowledgments:

    Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

    The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

    U-BRITE last update: 07/13/2023

  9. ViC Dataset: IQ signal visualization for CBRS, SAS

    • kaggle.com
    Updated Oct 16, 2021
    Cite
    Hyelin Nam (2021). ViC Dataset: IQ signal visualization for CBRS, SAS [Dataset]. https://www.kaggle.com/hyelinnam/vic-dataset-iq-signal-visualization-for-cbrs/tasks
    Dataset updated
    Oct 16, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hyelin Nam
    Description

    Context

    The ViC dataset is a collection for implementing a Dynamic Spectrum Access (DSA) system testbed in the CBRS band in the USA. The data comes from a DSA system with two tiers of users: an incident user generating a chirp signal with a radar system, and a primary user generating an LTE-TDD signal with a CBSD base station system, corresponding to signal waveforms in the bands 3.55-3.56 GHz (Ch1) and 3.56-3.57 GHz (Ch2), respectively. Depending on the presence or absence of the two users in the two channels there are 16 possible cases; after excluding the cases assumed to be used by CBSD base stations (marked with an X below), a total of 12 classes remain. The labels of each data point have the following meanings:

    0000 (0): All off
    0001 (1): Ch2 - Radar on
    0010 (2): Ch2 - LTE on
    0011 (3): Ch2 - LTE, Radar on
    0100 (4): Ch1 - Radar on
    0101 (5): Ch1 - Radar on / Ch2 - Radar on
    0110 (6): Ch1 - Radar on / Ch2 - LTE on
    0111 (7): Ch1 - Radar on / Ch2 - LTE, Radar on
    1000 (8): Ch1 - LTE on
    1001 (9): Ch1 - LTE on / Ch2 - Radar on (X)
    1010 (10): Ch1 - LTE on / Ch2 - LTE on (X)
    1011 (11): Ch1 - LTE on / Ch2 - LTE, Radar on
    1100 (12): Ch1 - LTE, Radar on
    1101 (13): Ch1 - LTE, Radar on / Ch2 - Radar on (X)
    1110 (14): Ch1 - LTE, Radar on / Ch2 - LTE on (X)
    1111 (15): Ch1 - LTE, Radar on / Ch2 - LTE, Radar on

    Content

    This dataset comprises seven files in total: one raw dataset expressed in two file formats, four processed datasets produced in different ways, and a label file. Except for one of the files, all are NumPy files; the remaining one is a CSV file.

    (Raw) The raw data is IQ data generated from testbeds built to imitate the SAS system of CBRS in the United States. In the testbeds, the primary user was made using the LabVIEW communication tool and the USRP antenna (radar), and the secondary user was made by manufacturing the CBSD base station. The raw data exists in both CSV and NumPy formats.

    (Processed) All of these datasets except one are normalized to values between 0 and 255 and consist of spectrogram, scalogram, and IQ data. The remaining one is a spectrogram dataset that is not normalized. They are measured over 250 us. In the spectrograms and scalograms, the figure formed at 3.56 GHz to 3.57 GHz corresponds to channel 1, and at 3.55 GHz to 3.56 GHz to channel 2. Among them, signals transmitted from the CBSD base station appear as LTE-TDD signals, and signals transmitted from the radar system appear as chirp signals.

    (Label) All five of the above datasets share one label file, which is in NumPy format.
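    A minimal loading sketch with NumPy; the file names are placeholders, since the exact names are not listed above:

    import numpy as np

    spectrograms = np.load("spectrogram_normalized.npy")  # placeholder name; values scaled to 0-255
    labels = np.load("label.npy")                         # shared label array, one class per sample (encoding above)
    print(spectrograms.shape, labels.shape)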

  10. FRET Simulation Dataset

    • figshare.com
    bin
    Updated Nov 6, 2017
    Cite
    Carl Mayer (2017). FRET Simulation Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.5573542.v1
    Dataset updated
    Nov 6, 2017
    Dataset provided by
    figshare
    Authors
    Carl Mayer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datafiles contain over 3 million simulated noisy FRET spectra which can be used for validating FRET analysis approaches. The same data is formatted for both MATLAB and Python. The MATLAB file can be read in with each of the components as a separate variable. The Python NumPy array can be read in and then converted to a dictionary containing each component using numpy.load('YourFilepath').item().

    MATLAB variables / Python dictionary entries:

    'Simulated Pixels': Simulated noisy spectra covering a range of SNR and FRET efficiencies, organized as Simulated_Pixels[N, Power, Efficiency, Excitation, Emission], where N are repeat simulations used to calculate statistics (1000 simulations for every condition). Overall shape: (1000, 150, 11, 2, 32).

    'sRET luxFRET Calibration Spectra': Noiseless calibration spectra organized as sRET luxFRET Calibration Spectra[Power, Donor or Acceptor, Excitation, Emission], with an overall shape of (150, 2, 2, 32). These spectra can also be used to calculate the normalized emission spectra and gamma parameter needed for sensorFRET, but those values are also included separately for convenience.

    'Power Vector': The vector relating the indices in the 2nd dimension of 'Simulated Pixels' to the simulated power used, ranging from 0.1-1000 (arbitrary units) in 150 logarithmic steps to change the SNR and provide normalized residuals in the approximate range of 0.001 to 0.1.

    'Efficiency Vector': The vector relating the indices in the 3rd dimension of 'Simulated Pixels' to the simulated FRET efficiency, ranging from 0 to 1 in 11 linear steps.

    'Excitation Wavelength Vector': The vector relating the indices of the 4th dimension of 'Simulated Pixels' to the simulated excitation wavelength, either 405 or 458.

    'Emission Wavelength Vector': The vector relating the indices of the 5th dimension of 'Simulated Pixels' to the simulated emission wavelengths, ranging from 416 to 718 in 32 linear steps to match the spectral resolution of our experiments.

    'Normalized Emission Spectra': An array containing the normalized emission shapes for the Cerulean and Venus fluorophores (shape of (2, 32)).

    'Gamma': The sensorFRET gamma parameter for the Cerulean/Venus-405/458 pairing, 0.0605 (from experiment).

    'Qd': The quantum efficiency of Cerulean, 0.62.

    'Qa': The quantum efficiency of Venus, 0.57.
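    A short loading sketch for the Python version, following the numpy.load(...).item() pattern described above; the file name is a placeholder and the dictionary keys are assumed to match the entry names listed above:

    import numpy as np

    # allow_pickle=True is required on recent NumPy versions to load a pickled dictionary
    data = np.load("fret_simulation.npy", allow_pickle=True).item()
    pixels = data["Simulated Pixels"]      # expected shape (1000, 150, 11, 2, 32)
    one_spectrum = pixels[0, 75, 5, 0, :]  # one noisy emission spectrum; indices chosen only for illustration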

  11. Modulation Module for HelMod-4

    • openaccessrepository.it
    application/gzip, text/x-python
    Updated May 5, 2025
    Cite
    M. Gervasi; M. Gervasi; S. Della torre; S. Della torre; P.g. Rancoita; P.g. Rancoita; M.j. Boschini; M.j. Boschini; G. La vacca; G. La vacca (2025). Modulation Module for HelMod-4 [Dataset]. https://www.openaccessrepository.it/records/9mdks-9c577
    Dataset updated
    May 5, 2025
    Dataset provided by
    sdalpra
    Authors
    M. Gervasi; M. Gervasi; S. Della torre; S. Della torre; P.g. Rancoita; P.g. Rancoita; M.j. Boschini; M.j. Boschini; G. La vacca; G. La vacca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SDE integration in HelMod is computationally quite expensive, since, to minimize the uncertainties, a huge number of events must be integrated from Earth to the heliosphere boundary. Monte Carlo integration allows us to evaluate the normalized probability function (G) that a particle observed at Earth with rigidity R0 entered the heliosphere with rigidity R. The convolution of the normalized probability function with the very local interstellar spectrum results in the modulated differential intensity for the time and solar distance where G was evaluated. In the present dataset, we provide the numerical output of the HelMod-4 model (www.helmod.org) in the form of normalized probability histograms. The attached Python script can convert GALPROP output (or a plain-text LIS file) into a modulated spectrum for the periods of selected experiments.

    This dataset was used as part of the publications in the references.

    For any information about the HelMod-4 Model, please refer to the official website.

    How to install and configure

    Install Python (>3.0) packages.

    Download the Python OfflineModule and the HelModArchive. The archive is provided in tgz format, so it must first be unpacked, e.g. with the command tar -xvzf HelModArchives.tgz.

    The archive structure:

    The HelModArchives.tgz contains several directories each one with the name of a space or balloon mission. Each folder should be considered as an HelMod Archive containing the following files:

    • ExpList.list: list of nuclei and isotopes simulated (do not modify)
    • ExpList_Plot.list: list of nuclei available in the archive, references and plot properties (do not modify the first and second columns; the others can be updated to change the output plots)
    • ParameterSimulated.list: list of folders in the form RawPar_HelMod4_XX (at least one line should start with '+'; if none does, please add it to the first line)
    • ParameterSimulated_DB.list: list of folders in the form RawPar_HelMod4_XX, with descriptions
    • Version.txt: version notes
    • DataTXT: experimental energy and rigidity binning used for the simulations
    • RawPar_HelMod4_00: HelMod simulation outputs

    How to use the module:

    The usage of the module requires three elements:

    • An HelMod Archive unpacked in a known folder.
    • A LIS, either as a GALPROP FITS file or in plain-text file format.
    • The label of the ion/dataset that is to be modulated.

    The list of available ion/dataset labels in each archive may be found in the file ExpList_Plot.list or via the command line:

    python3 HelMod_Module.py -a 

    The basic command to get the modulated spectrum is:

    python3 HelMod_Module.py -a 

    other available options:

    • -h: help description
    • -t: use this option to specify that
    • -p: choose a different set of parameters. The list of available parameter set names is available in the file ParameterSimulated_DB.list.
    • --MakePlot: create a plot in PNG format.
    • --SumAllIsotpes: (can be used with GALPROP LIS inputs) evaluate the modulated spectra as the sum of the modulated isotope spectra (note that without this option only the LIS of the isotope specified in
    • --PrintLIS: create a file with the LIS in the format of a two-column plain-text file.
    • --SimUnit: force the output unit of the module: use Tkin to select kinetic energy per nucleon [GeV/n], use Rigi to select rigidity [GV]. If not specified, the output is chosen according to the original format of the experimental dataset.
    • -o: use a custom name for the output file.

    LIS in text format

    Users can provide a txt file for LIS with the following characteristics:

    • The file must be a text file.
    • The file must contain two columns only:
      1. kinetic energy per nucleon [GeV];
      2. the LIS flux [(m2 s sr GeV)-1].
    • The file may contain comments. Lines starting with the '#' character will be ignored.

    An example of such a file is shown below.
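    An illustrative plain-text LIS file in this format; the numbers are placeholders, not a physical spectrum:

    # kinetic energy per nucleon [GeV]    LIS flux [(m2 s sr GeV)-1]
    0.1     1.2e+03
    1.0     4.5e+02
    10.0    3.1e+00
    100.0   1.7e-02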
  12. Supplementary data - stress-induced changes in magnetite: insights from a numerical analysis of the Verwey transition

    • service.tib.eu
    Updated Nov 28, 2024
    Cite
    (2024). Supplementary data - stress-induced changes in magnetite: insights from a numerical analysis of the verwey transition - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-hbwbdgigwbcvtfyc
    Dataset updated
    Nov 28, 2024
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Abstract: The dataset contains measurements of magnetic susceptibility in dependence of temperature for shocked magnetite and for a natural magnetite single crystal before and after manual crushing. A Python code for evaluation of low-temperature susceptibility curves is included. The data are supplementary to: Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.

    Technical remarks: The data set contains k-T curves of: the initial magnetite ore from Sydvaranger mine (Norway); the same ore after shock at 3, 5, 10, 20 and 30 GPa under laboratory conditions and after subsequent heating to 973 K; and a natural magnetite single crystal (initial and after manual crushing). The data set also contains a Python code for evaluation of normalized low-temperature k-T curves. Experimental conditions are described in [1]. The approach for k-T curve evaluation is described in [2].

    [1] Kontny, A., Reznik, B., Boubnov, A., Göttlicher, J. and Steininger, R., 2018. Postshock Thermally Induced Transformations in Experimentally Shocked Magnetite, Geochemistry, Geophysics, Geosystems, Vol. 19, 3, pp. 921–931, doi:10.1002/2017GC007331.

    [2] Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.
