Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Solar Wind Omni and SAMPEX (Solar Anomalous and Magnetospheric Particle Explorer) datasets used in examples for SEAnorm, a time-normalized superposed epoch analysis package in Python.
Both data sets are stored as either an HDF5 file or a compressed CSV file (csv.bz2), each containing a Pandas DataFrame of either the Solar Wind Omni or the SAMPEX data. The data sets were written with pandas.DataFrame.to_hdf() and pandas.DataFrame.to_csv() using a compression level of 9. The DataFrames can be read back with pandas.read_hdf() or pandas.read_csv(), depending on the file format.
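For example, either format can be loaded directly with pandas (the file names below are placeholders; the timestamp index is an assumption):
import pandas as pd
# HDF5 version (a key argument is only needed if the file stores several objects)
omni = pd.read_hdf("omni.h5")
# compressed CSV version; compression is inferred from the .bz2 extension
sampex = pd.read_csv("sampex.csv.bz2", index_col=0, parse_dates=True)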
The Solar Wind Omni data set contains solar wind velocity (V) and dynamic pressure (P), the southward interplanetary magnetic field in Geocentric Solar Ecliptic (GSE) coordinates (B_Z_GSE), the auroral electrojet index (AE), and the Sym-H index, all at 1-minute cadence.
The SAMPEX data set contains electron flux from the Proton/Electron Telescope (PET) at two energy channels, 1.5-6.0 MeV (ELO) and 2.5-14 MeV (EHI), at an approximately 6-second cadence.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises a professions gazetteer generated with automatically extracted terminology from the Mesinesp2 corpus, a manually annotated corpus in which domain experts have labeled a set of scientific literature, clinical trials, and patent abstracts, as well as clinical case reports.
A silver gazetteer for mention classification and normalization was created by combining the predictions of automatic Named Entity Recognition models with normalization via Entity Linking to three controlled vocabularies: SNOMED CT, NCBI Taxonomy, and ESCO. The sources are 265,025 different documents, of which 249,538 come from the MESINESP2 corpora and 15,487 from clinical cases in open clinical journals. From them, 5,682,000 mentions were extracted, and 4,909,966 (86.42%) were normalized to at least one of the ontologies: SNOMED CT (4,909,966) for diseases, symptoms, drugs, locations, occupations, procedures and species; ESCO (215,140) for occupations; and NCBI Taxonomy (1,469,256) for species.
The repository contains a .tsv file with the following columns:
filenameid: A unique identifier combining the file name and mention span within the text. This ensures each extracted mention is uniquely traceable. Example: biblio-1000005#239#256 refers to a mention spanning characters 239–256 in the file with the name biblio-1000005.
span: The specific text span (mention) extracted from the document, representing a term or phrase identified in the dataset. Example: centro oncológico.
source: The origin of the document, indicating the corpus from which the mention was extracted. Possible values: mesinesp2, clinical_cases.
filename: The name of the file from which the mention was extracted. Example: biblio-1000005.
mention_class: Categories or semantic tags assigned to the mention, describing its type or context in the text. Example: ['ENFERMEDAD', 'SINTOMA'].
codes_esco: The normalized ontology codes from the European Skills, Competences, Qualifications, and Occupations (ESCO) vocabulary for the identified mention (if applicable). This field may be empty if no ESCO mapping exists. Example: 30629002.
terms_esco: The human-readable terms from the ESCO ontology corresponding to the codes_esco. Example: ['responsable de recursos', 'director de recursos', 'directora de recursos'].
codes_ncbi: The normalized ontology codes from the NCBI Taxonomy vocabulary for species (if applicable). This field may be empty if no NCBI mapping exists.
terms_ncbi: The human-readable terms from the NCBI Taxonomy vocabulary corresponding to the codes_ncbi. Example: ['Lacandoniaceae', 'Pandanaceae R.Br., 1810', 'Pandanaceae', 'Familia'].
codes_sct: The normalized ontology codes from SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) vocabulary for diseases, symptoms, drugs, locations, occupations, procedures, and species (if applicable). Example: 22232009.
terms_sct: The human-readable terms from the SNOMED CT ontology corresponding to the codes_sct. Example: ['adjudicador de regulaciones del seguro nacional'].
sct_sem_tag: The semantic category tag assigned by SNOMED CT to describe the general classification of the mention. Example: environment.
Suggestion: If you load the dataset using Python, it is recommended to parse the columns containing lists as follows (the .tsv file name below is illustrative):
import ast
import pandas as pd
df = pd.read_csv("gazetteer.tsv", sep="\t")  # hypothetical file name for the repository's .tsv file
df["mention_class"] = df["mention_class"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
License
This dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). This means you are free to:
Share: Copy and redistribute the material in any medium or format.
Adapt: Remix, transform, and build upon the material for any purpose, even commercially.
Attribution Requirement: Please credit the dataset creators appropriately, provide a link to the license, and indicate if changes were made.
Contact
If you have any questions or suggestions, please contact us at:
Martin Krallinger ()
Additional resources and corpora
If you are interested, you might want to check out these corpora and resources:
MESINESP-2 (Corpus of manually indexed records with DeCS /MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)
MEDDOPROF corpus
Codes Reference List (for MEDDOPROF-NORM)
Annotation Guidelines
Occupations Gazetteer
Overview
nEMO is a simulated dataset of emotional speech in the Polish language. The corpus contains over 3 hours of samples recorded with the participation of nine actors portraying six emotional states: anger, fear, happiness, sadness, surprise, and a neutral state. The text material used was carefully selected to represent the phonetics of the Polish language. The corpus is available for free under the Creative Commons license (CC BY-NC-SA 4.0).
The dataset is available on Hugging Face and GitHub.
Data Fields
file_id - filename, i.e. {speaker_id}_{emotion}_{sentence_id},
audio (audio) - dictionary containing audio array, path and sampling rate (available when accessed via datasets library),
emotion - label corresponding to emotional state,
raw_text - original (orthographic) transcription of the audio,
normalized_text - normalized transcription of the audio,
speaker_id - id of speaker,
gender - gender of the speaker,
age - age of the speaker.
Usage
The nEMO dataset can be loaded and processed using the datasets library:
from datasets import load_dataset
nemo = load_dataset("amu-cai/nEMO", split="train")
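Each example then exposes the fields described above; a minimal sketch:
sample = nemo[0]
print(sample["emotion"], sample["speaker_id"], sample["normalized_text"])
waveform = sample["audio"]["array"]              # decoded audio samples
sampling_rate = sample["audio"]["sampling_rate"]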
To work with the nEMO dataset on GitHub, you may clone the repository and access the files directly within the samples folder. Corresponding metadata can be found in the data.tsv file.
The nEMO dataset is provided as a whole, without predefined training and test splits. This gives researchers and developers the flexibility to create their own splits based on their specific needs.
Supported Tasks
Audio classification: This dataset was mainly created for the task of speech emotion recognition. Each recording is labeled with one of six emotional states (anger, fear, happiness, sadness, surprise, and neutral). Additionally, each sample is labeled with speaker id and speaker gender, so the dataset can also be used for other audio classification tasks.
Automatic Speech Recognition: The dataset includes orthographic and normalized transcriptions for each audio recording, making it a useful resource for automatic speech recognition (ASR) tasks. The sentences were carefully selected to cover a wide range of phonemes in the Polish language.
Text-to-Speech: The dataset contains emotional audio recordings with transcriptions, which can be valuable for developing TTS systems that produce emotionally expressive speech.
Additional Information
Licensing Information
The dataset is available under the Creative Commons license (CC BY-NC-SA 4.0).
Citation Information
You can access the nEMO paper at arXiv. Please cite the paper when referencing the nEMO dataset as:
@misc{christop2024nemo,
  title={nEMO: Dataset of Emotional Speech in Polish},
  author={Iwona Christop},
  year={2024},
  eprint={2404.06292},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Contributions
Thanks to @iwonachristop for adding this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 7: Supplementary Table 5. GWAS phenotypes parsed by Nelson's group and by pyMeSHSim, TaggerOne and DNorm, with the semantic similarity between them calculated by pyMeSHSim. pyMeSHSim_Score is the semantic similarity between Nelson_MeSH_ID and pyMeSHSim_MeSH_ID, taggerOne_score is the semantic similarity between Nelson_MeSH_ID and TaggerOne_MeSH_ID, and DNorm_score is the semantic similarity between Nelson_MeSH_ID and DNorm_MeSH_ID.
GNU General Public License 3.0 (GPL-3.0): https://www.gnu.org/licenses/gpl-3.0.html
This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.
It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.
Each Wordle answer causes a spike in search volume on the day it appears — more so if the word is rare.
This dataset supports exploration of how a word's rarity relates to the size of its search-interest spike.
| Column | Description |
|---|---|
| date | Date of the Wordle puzzle |
| word | Correct 5-letter Wordle answer |
| game | Wordle game number |
| wordfreq_commonality | Normalized frequency score using Python's wordfreq library |
| subtlex_commonality | Normalized frequency score using the SUBTLEX-US dataset |
| trend_day_global | Google search interest on the day (global, all categories) |
| trend_avg_200_global | 200-day average search interest (global, all categories) |
| trend_day_language | Search interest on Wordle day (Language Resources category) |
| trend_avg_200_language | 200-day average search interest (Language Resources category) |
Notes:
- All trend values are relative (0-100 scale, per Google Trends).
- Word commonality scores were computed with the wordfreq Python library and the SUBTLEX-US dataset.
- Google Trends data was collected via pytrends.
- Analysis done using this data can be found in the blog post.
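As a starting point for that kind of exploration, a minimal pandas sketch (the CSV file name is hypothetical; column names follow the table above):
import pandas as pd
df = pd.read_csv("wordle_trends.csv", parse_dates=["date"])  # hypothetical file name
# size of the search spike relative to the 200-day global baseline
df["spike_ratio"] = df["trend_day_global"] / df["trend_avg_200_global"]
# rarer words (lower commonality) are expected to show larger spikes
print(df["wordfreq_commonality"].corr(df["spike_ratio"]))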
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3.4 hours of audio synthesized using the open-source Surge synthesizer, based upon 2084 presets included in the Surge package. These represent "natural" synthesis sounds, i.e. presets devised by humans.
We generated 4-second samples playing at velocity 64 with a note-on duration of 3 seconds. For each preset, we varied only the pitch, from MIDI 21 to 108, the range of a grand piano. Every sound in the dataset was RMS-level normalized using the normalize package. There was no elegant way to deduplicate this dataset; however, only a small percentage of presets (like drums and sound effects) had no perceptual pitch variation or ordering.
We used the Surge Python API to generate this dataset.
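For reference, RMS-level normalization amounts to scaling each clip so that its root-mean-square amplitude matches a common target; an illustrative sketch (not the normalize package itself):
import numpy as np

def rms_normalize(samples: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    # scale the waveform so its RMS amplitude equals target_rms
    rms = np.sqrt(np.mean(samples ** 2))
    return samples * (target_rms / rms) if rms > 0 else samples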
Applications of this dataset include:
If you use this dataset in published research, please cite Turian et al., "One Billion Audio Sounds from GPU-enabled Modular Synthesis", in Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx2020), 2021:
@inproceedings{turian2021torchsynth,
title = {One Billion Audio Sounds from {GPU}-enabled Modular Synthesis},
author = {Joseph Turian and Jordie Shier and George Tzanetakis and Kirk McNally and Max Henry},
year = 2021,
month = Sep,
booktitle = {Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx2020)},
location = {Vienna, Austria}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Task scheduler performance survey
This dataset contains the results of a task graph scheduler performance survey.
The results are stored in the following files, which correspond to simulations performed on the elementary, irw and pegasus task graph datasets published at https://doi.org/10.5281/zenodo.2630384:
elementary-result.zip
irw-result.zip
pegasus-result.zip
The files contain compressed pandas dataframes in CSV format; they can be read with the following Python code:
import pandas as pd
frame = pd.read_csv("elementary-result.zip")
Each row in the frame corresponds to a single instance of a task graph that was simulated with a specific configuration (network model, scheduler etc.). The list below summarizes the meaning of the individual columns.
graph_name - name of the benchmarked task graph
graph_set - name of the task graph dataset from which the graph originates
graph_id - unique ID of the graph
cluster_name - type of cluster used in this instance; the format is <workers>x<cores>, e.g. 32x16 means 32 workers, each with 16 cores
bandwidth - network bandwidth [MiB/s]
netmodel - network model (simple or maxmin)
scheduler_name - name of the scheduler
imode - information mode
min_sched_interval - minimal scheduling delay [s]
sched_time - duration of each scheduler invocation [s]
time - simulated makespan of the task graph execution [s]
execution_time - real duration of all scheduler invocations [s]
total_transfer - amount of data transferred amongst workers [MiB]
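For example, the columns above make it straightforward to compare schedulers; a minimal sketch:
import pandas as pd
frame = pd.read_csv("elementary-result.zip")
# mean simulated makespan per scheduler for the "simple" network model
simple = frame[frame["netmodel"] == "simple"]
print(simple.groupby("scheduler_name")["time"].mean().sort_values())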
The file charts.zip contains charts obtained by processing the datasets. The X axis always shows bandwidth [MiB/s]. There are the following files:
[DATASET]-schedulers-time - Absolute makespan produced by schedulers [seconds]
[DATASET]-schedulers-score - The same as above but normalized with respect to the best schedule (shortest makespan) for the given configuration.
[DATASET]-schedulers-transfer - Sums of transfers between all workers for a given configuration [MiB]
[DATASET]-[CLUSTER]-netmodel-time - Comparison of netmodels, absolute times [seconds]
[DATASET]-[CLUSTER]-netmodel-score - Comparison of netmodels, normalized to the average of model "simple"
[DATASET]-[CLUSTER]-netmodel-transfer - Comparison of netmodels, sum of transferred data between all workers [MiB]
[DATASET]-[CLUSTER]-schedtime-time - Comparison of MSD, absolute times [seconds]
[DATASET]-[CLUSTER]-schedtime-score - Comparison of MSD, normalized to the average of "MSD=0.0" case
[DATASET]-[CLUSTER]-imode-time - Comparison of Imodes, absolute times [seconds]
[DATASET]-[CLUSTER]-imode-score - Comparison of Imodes, normalized to the average of "exact" imode
Reproducing the results
First, install the Estee package:
$ git clone https://github.com/It4innovations/estee
$ cd estee
$ pip install .
Then you can either use benchmarks/generate.py to generate task graphs from three categories (elementary, irw and pegasus):
$ cd benchmarks
$ python generate.py elementary.zip elementary
$ python generate.py irw.zip irw
$ python generate.py pegasus.zip pegasus
or use our task graph dataset that is provided at https://doi.org/10.5281/zenodo.2630384.
The benchmark is configured through benchmark.json. You can then run the benchmark using this command:
$ python pbs.py compute benchmark.json
The benchmark script can be interrupted at any time (for example using Ctrl+C). When interrupted, it will store the computed results to the result file and resume the computation when launched again.
To generate the plots, run:
$ python view.py --all
The resulting plots will appear in a folder called outputs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset contains information about GBM, an aggressive and highly malignant brain tumor that arises from glial cells, characterized by rapid growth and infiltrative behavior. The gene expression profile was measured experimentally using the Affymetrix HT Human Genome U133a microarray platform by the Broad Institute of MIT and Harvard University cancer genomic characterization center. The Sample IDs serve as unique identifiers for each sample.
Inspiration:
This dataset was uploaded to U-BRITE for the GTKB project.
Instruction:
The log2(x) normalization was removed, and z-normalization was performed on the dataset using a Python script.
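A minimal sketch of that kind of preprocessing (the file name, matrix orientation, and exact log2 inversion are assumptions; the actual script may differ):
import pandas as pd
expr = pd.read_csv("gbm_expression.tsv", sep="\t", index_col=0)  # hypothetical file name
linear = 2 ** expr                              # undo the log2(x) transform
z = (linear - linear.mean()) / linear.std()     # column-wise z-normalization (DataFrame default)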
Acknowledgments:
Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8
The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764
U-BRITE last update: 07/13/2023
The ViC dataset is a collection for implementing a Dynamic Spectrum Access (DSA) system testbed in the CBRS band in the USA. The data comes from a DSA system with two tiers of users: an incumbent user generating a chirp signal with a radar system, and a primary user transmitting an LTE-TDD signal with a CBSD base station system; these correspond to signal waveforms in the bands 3.55-3.56 GHz (Ch1) and 3.56-3.57 GHz (Ch2), respectively. Out of the 16 possible combinations of the two users' presence in the two channels, the cases marked (X) below are excluded under an assumption about how CBSD base stations use the channels, leaving a total of 12 classes. The labels of each data sample have the following meanings:
0000 (0) : All off
0001 (1) : Ch2 - Radar on
0010 (2) : Ch2 - LTE on
0011 (3) : Ch2 - LTE, Radar on
0100 (4) : Ch1 - Radar on
0101 (5) : Ch1 - Radar on / Ch2 - Radar on
0110 (6) : Ch1 - Radar on / Ch2 - LTE on
0111 (7) : Ch1 - Radar on / Ch2 - LTE, Radar on
1000 (8) : Ch1 - LTE on
1001 (9) : Ch1 - LTE on / Ch2 - Radar on (X)
1010 (10) : Ch1 - LTE on / Ch2 - LTE on (X)
1011 (11) : Ch1 - LTE on / Ch2 - LTE, Radar on
1100 (12) : Ch1 - LTE, Radar on
1101 (13) : Ch1 - LTE, Radar on / Ch2 - Radar on (X)
1110 (14) : Ch1 - LTE, Radar on / Ch2 - LTE on (X)
1111 (15) : Ch1 - LTE, Radar on / Ch2 - LTE, Radar on
This dataset consists of 7 files in total: one raw dataset provided in two formats, four processed datasets processed in different ways, and a label file. Except for one of the files, all are NumPy files (Python); the remaining one is a CSV file.
(Raw) The raw data is IQ data generated from testbeds created by imitating the SAS system of CBRS in the United States. In the testbeds, the primary user was built using the LabVIEW communication tool and the USRP antenna (radar), and the secondary user was built by manufacturing the CBSD base station. The raw data exists in both CSV and NumPy formats.
(Processed) All of these datasets except one are normalized to values between 0 and 255 and consist of spectrograms, scalograms, and IQ data. The remaining one is a spectrogram dataset which is not normalized. The signals are measured over 250 us. In the spectrograms and scalograms, the figure formed at 3.56-3.57 GHz corresponds to channel 1, and the one at 3.55-3.56 GHz corresponds to channel 2. Among them, signals transmitted from the CBSD base station appear in the form of LTE-TDD signals, and signals transmitted from the radar system appear in the form of chirp signals.
(Label) All five of the datasets above share one label file, which is in NumPy format.
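As a quick sanity check on the label encoding above, a small sketch of decoding the 4-bit labels (the file name is hypothetical):
import numpy as np

labels = np.load("labels.npy")  # hypothetical file name for the shared label file

def decode(label: int) -> dict:
    # bit 3: Ch1 LTE, bit 2: Ch1 Radar, bit 1: Ch2 LTE, bit 0: Ch2 Radar
    return {
        "ch1_lte": bool(label & 0b1000),
        "ch1_radar": bool(label & 0b0100),
        "ch2_lte": bool(label & 0b0010),
        "ch2_radar": bool(label & 0b0001),
    }

print(decode(int(labels[0])))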
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datafiles contain over 3 million simulated noisy FRET spectra which can be used for validating FRET analysis approaches. The same data is formatted for both MATLAB and Python. The MATLAB file can be read in with each of the components as a separate variable. The Python NumPy array can be read in and then converted to a dictionary containing each component using numpy.load('YourFilepath').item().
MATLAB variable / Python dictionary entries:
'Simulated Pixels': Simulated noisy spectra covering a range of SNR and FRET efficiencies, organized as Simulated_Pixels[N, Power, Efficiency, Excitation, Emission], where N indexes repeat simulations used to calculate statistics (1000 simulations for every condition). Overall shape: (1000, 150, 11, 2, 32).
'sRET luxFRET Calibration Spectra': Noiseless calibration spectra organized as sRET luxFRET Calibration Spectra[Power, Donor or Acceptor, Excitation, Emission], with an overall shape of (150, 2, 2, 32). These spectra can also be used to calculate the normalized emission spectra and gamma parameter needed for sensorFRET, but we also included those values separately for convenience.
'Power Vector': The vector relating the indices in the 2nd dimension of 'Simulated Pixels' to the simulated power, ranging from 0.1 to 1000 (arbitrary units) in 150 logarithmic steps to change the SNR and provide normalized residuals in the approximate range of 0.001 to 0.1.
'Efficiency Vector': The vector relating the indices in the 3rd dimension of 'Simulated Pixels' to the simulated FRET efficiency, ranging from 0 to 1 in 11 linear steps.
'Excitation Wavelength Vector': The vector relating the indices of the 4th dimension of 'Simulated Pixels' to the simulated excitation wavelength, either 405 or 458.
'Emission Wavelength Vector': The vector relating the indices of the 5th dimension of 'Simulated Pixels' to the simulated emission wavelengths, ranging from 416 to 718 in 32 linear steps to match the spectral resolution of our experiments.
'Normalized Emission Spectra': An array containing the normalized emission shapes for the Cerulean and Venus fluorophores (shape (2, 32)).
'Gamma': The sensorFRET gamma parameter for the Cerulean/Venus 405/458 pairing, 0.0605 (from experiment).
'Qd': The quantum efficiency of Cerulean, 0.62.
'Qa': The quantum efficiency of Venus, 0.57.
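For the Python version, a minimal loading sketch (recent NumPy releases require allow_pickle=True to unpickle the stored dictionary; check data.keys() for the exact key spellings):
import numpy as np
data = np.load("YourFilepath.npy", allow_pickle=True).item()  # dictionary of the entries listed above
pixels = data["Simulated Pixels"]   # expected shape (1000, 150, 11, 2, 32)
print(pixels.shape, sorted(data.keys()))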
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SDE integration with HelMod is computationally quite expensive since, to minimize the uncertainties, a huge number of events must be integrated from Earth to the heliosphere boundary. Monte Carlo integration allows us to evaluate the normalized probability function (G) that a particle observed at Earth with rigidity R0 entered the heliosphere with rigidity R. The convolution of the normalized probability function with the very local interstellar spectrum results in the modulated differential intensity for the time and solar distance where G was evaluated. In the present dataset, we provide the numerical output of the HelMod-4 model (www.helmod.org) in the form of normalized probability histograms. The attached Python script converts GALPROP output (or a plain text LIS file) into a modulated spectrum for the periods of selected experiments.
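Conceptually, the modulated intensity is a weighted sum of the LIS over the entry rigidities; a hedged sketch of that convolution, assuming G is stored as a 2-D histogram over (observed rigidity R0, entry rigidity R):
import numpy as np

def modulate(G: np.ndarray, lis_on_R_grid: np.ndarray) -> np.ndarray:
    # G[i, j]: normalized probability that a particle observed at Earth with rigidity R0[i]
    # entered the heliosphere with rigidity R[j] (array layout assumed for illustration)
    # returns the modulated differential intensity on the R0 grid
    return G @ lis_on_R_grid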
This dataset was used as part of the publications in the references.
For any information about the HelMod-4 Model, please refer to the official website.
How to install and configure
Install Python (>3.0) packages.
Download the Python OfflineModule and the HelModArchive. The archive is provided in tgz format, thus it first needs to be unpacked with the command tar -xvzf.
The archive structure:
The HelModArchives.tgz contains several directories each one with the name of a space or balloon mission. Each folder should be considered as an HelMod Archive containing the following files:
How to use the module:
The usage of the module requires three elements, among them the LIS (for example a GALPROP output) that is intended to be modulated. The list of experiments available in each archive may be found in the file ExpList_Plot.list or using the command line:
python3 HelMod_Module.py -a
The basic command to get the modulated spectrum is:
python3 HelMod_Module.py -a
Other available options:
-h : help description
-t : use this option to specify that
-p : Choose a different set of parameters. The list of available parameter set names is available in the file ParameterSimulated_DB.list.
--MakePlot : Create a plot in png format.
--SumAllIsotpes : (can be used with GALPROP LIS inputs) evaluate the modulated spectra as the sum of the modulated isotope spectra (note that without this option only the LIS of the isotope specified in
--PrintLIS : Create a file with the LIS in the format of a two-column plain text file.
--SimUnit : force the output unit of the module: use Tkin to select Kinetic Energy per Nucleon [GeV/n], use Rigi to select Rigidity [GV]. If not specified, the output is chosen according to the original format of the experimental dataset.
-o : Use a custom name for the output file.
LIS in text format
Users can provide a txt file for LIS with the following characteristics:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Abstract: The dataset contains measurements of magnetic susceptibility as a function of temperature for shocked magnetite and for a natural magnetite single crystal before and after manual crushing. A Python code for evaluation of low-temperature susceptibility curves is included. The data are supplementary to: Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.
Technical remarks: The data set contains k-T curves of
- the initial magnetite ore from the Sydvaranger mine (Norway)
- the same ore after shock at 3, 5, 10, 20 and 30 GPa under laboratory conditions and after subsequent heating to 973 K
- a natural magnetite single crystal (initial and after manual crushing)
The data set also contains a Python code for evaluation of normalized low-temperature k-T curves. Experimental conditions are described in [1]. The approach for k-T curve evaluation is described in [2].
[1] Kontny, A., Reznik, B., Boubnov, A., Göttlicher, J. and Steininger, R., 2018. Postshock Thermally Induced Transformations in Experimentally Shocked Magnetite, Geochemistry, Geophysics, Geosystems, Vol. 19, 3, pp. 921-931, doi:10.1002/2017GC007331.
[2] Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.
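The evaluation code itself ships with the archive; purely as an illustration of normalizing a measured k-T curve, a hedged sketch (file layout and reference temperature are assumptions, not the published approach):
import numpy as np
# two-column text file: temperature [K], susceptibility k (layout assumed)
T, k = np.loadtxt("kT_curve.txt", unpack=True)   # hypothetical file name
k_norm = k / k[np.argmin(np.abs(T - 300.0))]     # normalize to the value at ~300 K (assumed reference)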