License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
**Overview**
The Bonn EEG Dataset is a widely recognized dataset in the field of biomedical signal processing and machine learning, specifically designed for research in epilepsy detection and EEG signal analysis. It contains electroencephalogram (EEG) recordings from both healthy individuals and patients with epilepsy, making it suitable for tasks such as seizure detection and classification of brain activity states. The dataset is structured into five distinct subsets (labeled A, B, C, D, and E), each comprising 100 single-channel EEG segments, for a total of 500 segments. Each segment represents 23.6 seconds of EEG data sampled at 173.61 Hz, yielding 4,097 data points per segment, stored as ASCII text files.
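The segments load directly with NumPy; a minimal sketch, assuming a mirror that uses the original file naming (e.g. Z001.txt for set A):

```python
import numpy as np

# Each Bonn segment is a plain ASCII file with one sample value per line.
# "Z001.txt" (set A) is an assumed example; naming can vary across mirrors.
segment = np.loadtxt("Z001.txt")

fs = 173.61               # sampling frequency in Hz
print(segment.shape)      # (4097,) data points
print(len(segment) / fs)  # ~23.6 seconds
```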
**Structure and Labels**
**Key Characteristics**
**Applications**
The Bonn EEG Dataset is ideal for machine learning and signal processing tasks, including:
- Developing algorithms for epileptic seizure detection and prediction.
- Exploring feature extraction techniques, such as wavelet transforms, for EEG signal analysis.
- Classifying brain states (healthy vs. epileptic, interictal vs. ictal).
- Supporting research in neuroscience and medical diagnostics, particularly for epilepsy monitoring and treatment.
**Source**
**Citation**
When using this dataset, researchers are required to cite the original publication: Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., & Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6), 061907. DOI: 10.1103/PhysRevE.64.061907.
**Additional Notes**
The dataset is randomized, with no specific information provided about patients or electrode placements, ensuring simplicity and focus on signal characteristics.
The data is not hosted on Kaggle or Hugging Face but is accessible directly from the University of Bonn’s repository or mirrored sources.
Researchers may need to apply preprocessing steps, such as filtering or normalization, to prepare the data for machine learning tasks.
The dataset’s balanced structure and clear labels make it an excellent choice for a one-week machine learning project, particularly for tasks involving traditional algorithms like SVM, Random Forest, or Logistic Regression (see the sketch after these notes).
This dataset provides a robust foundation for learning signal processing, feature extraction, and machine learning techniques while addressing a real-world medical challenge in epilepsy detection.
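As a concrete starting point, here is a minimal sketch of the pipeline suggested in the notes above: bandpass filtering, simple statistical features, and an SVM. The placeholder arrays stand in for segments loaded from the text files, and the label encoding and filter band are assumptions, not part of the original dataset description.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

fs = 173.61  # Bonn sampling frequency in Hz

def bandpass(seg, low=0.5, high=40.0, order=4):
    # Zero-phase Butterworth bandpass; the band edges are assumptions.
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, seg)

def features(seg):
    # Basic time-domain statistics; wavelet features are a common alternative.
    return [seg.mean(), seg.std(), seg.min(), seg.max(),
            np.mean(np.abs(np.diff(seg)))]

# Placeholder data standing in for segments loaded with np.loadtxt;
# the 0 = non-seizure / 1 = seizure encoding is an assumption.
rng = np.random.default_rng(0)
segments = [rng.standard_normal(4097) for _ in range(200)]
labels = rng.integers(0, 2, size=200)

X = np.array([features(bandpass(s)) for s in segments])
y = np.array(labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```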
License: Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
This database, collected at the Children’s Hospital Boston, consists of EEG recordings from pediatric subjects with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure medication in order to characterize their seizures and assess their candidacy for surgical intervention. The recordings are grouped into 23 cases and were collected from 22 subjects (5 males, ages 3–22; and 17 females, ages 1.5–19).
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
PCA
This is a dataset of EEG brainwave data that has been processed with our original strategy of statistical extraction (paper below).
The data was collected from two people (1 male, 1 female) for 3 minutes per state: positive, neutral, and negative. We used a Muse EEG headband which recorded the TP9, AF7, AF8 and TP10 EEG placements via dry electrodes. Six minutes of resting neutral data were also recorded. The stimuli used to evoke the emotions are listed below.
1. Marley and Me - Negative (Twentieth Century Fox) - Death Scene
2. Up - Negative (Walt Disney Pictures) - Opening Death Scene
3. My Girl - Negative (Imagine Entertainment) - Funeral Scene
4. La La Land - Positive (Summit Entertainment) - Opening musical number
5. Slow Life - Positive (BioQuest Studios) - Nature timelapse
6. Funny Dogs - Positive (MashupZone) - Funny dog clips
Our method of statistical extraction resamples the data, since the waves must be described mathematically as functions of time.
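The exact extraction procedure is described in the papers cited below; the following is only a generic sliding-window statistical-feature sketch in the same spirit, with assumed window parameters, not the authors' pipeline:

```python
import numpy as np

def window_stats(signal, fs=256, win_s=1.0, step_s=0.5):
    # Slide a window over a single-channel signal and compute summary statistics.
    # The 256 Hz rate matches the Muse headband; window/step sizes are assumptions.
    win, step = int(win_s * fs), int(step_s * fs)
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std(), w.min(), w.max()])
    return np.asarray(feats)

demo = np.random.default_rng(0).standard_normal(256 * 10)  # 10 s of placeholder data
print(window_stats(demo).shape)  # (n_windows, 4)
```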
If you would like to use the data in research projects, please cite the following:
J. J. Bird, L. J. Manso, E. P. Ribeiro, A. Ekart, and D. R. Faria, “A study on mental state classification using eeg-based brain-machine interface,” in 9th International Conference on Intelligent Systems, IEEE, 2018.
J. J. Bird, A. Ekart, C. D. Buckingham, and D. R. Faria, “Mental emotional sentiment classification with an eeg-based brain-machine interface,” in The International Conference on Digital Image and Signal Processing (DISP’19), Springer, 2019.
This research was part supported by the EIT Health GRaCE-AGE grant number 18429 awarded to C.D. Buckingham.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
These files contain the raw data and processing parameters to go with the paper "Hierarchical structure guides rapid linguistic predictions during naturalistic listening" by Jonathan R. Brennan and John T. Hale. These files include the stimulus (wav files), raw data (MATLAB format for the FieldTrip toolbox), data processing parameters (MATLAB), and variables used to align the stimuli with the EEG data and for the statistical analyses reported in the paper.
License: Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
Recordings, grouped into 23 cases, were collected from 22 subjects (5 males, ages 3–22; and 17 females, ages 1.5–19). (Case chb21 was obtained 1.5 years after case chb01, from the same female subject.) The file SUBJECT-INFO contains the gender and age of each subject. (Case chb24 was added to this collection in December 2010, and is not currently included in SUBJECT-INFO.)
Each case (chb01, chb02, etc.) contains between 9 and 42 continuous .edf files from a single subject. Hardware limitations resulted in gaps between consecutively-numbered .edf files, during which the signals were not recorded; in most cases, the gaps are 10 seconds or less, but occasionally there are much longer gaps. In order to protect the privacy of the subjects, all protected health information (PHI) in the original .edf files has been replaced with surrogate information in the files provided here. Dates in the original .edf files have been replaced by surrogate dates, but the time relationships between the individual files belonging to each case have been preserved. In most cases, the .edf files contain exactly one hour of digitized EEG signals, although those belonging to case chb10 are two hours long, and those belonging to cases chb04, chb06, chb07, chb09, and chb23 are four hours long; occasionally, files in which seizures are recorded are shorter.
All signals were sampled at 256 samples per second with 16-bit resolution. Most files contain 23 EEG signals (24 or 26 in a few cases). The International 10-20 system of EEG electrode positions and nomenclature was used for these recordings. In a few records, other signals are also recorded, such as an ECG signal in the last 36 files belonging to case chb04 and a vagal nerve stimulus (VNS) signal in the last 18 files belonging to case chb09. In some cases, up to 5 “dummy” signals (named "-") were interspersed among the EEG signals to obtain an easy-to-read display format; these dummy signals can be ignored.
The file RECORDS contains a list of all 664 .edf files included in this collection, and the file RECORDS-WITH-SEIZURES lists the 129 of those files that contain one or more seizures. In all, these records include 198 seizures (182 in the original set of 23 cases); the beginning ([) and end (]) of each seizure is annotated in the .seizure annotation files that accompany each of the files listed in RECORDS-WITH-SEIZURES. In addition, the files named chbnn-summary.txt contain information about the montage used for each recording, and the elapsed time in seconds from the beginning of each .edf file to the beginning and end of each seizure contained in it.
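A minimal sketch for working with this layout, using MNE to read one record and a regular expression to pull seizure times from a summary file. The summary-line format encoded in the regex is an assumption based on the description above:

```python
import re
import mne

# Read one hour-long record (path follows the case layout described above).
raw = mne.io.read_raw_edf("chb01/chb01_03.edf", preload=True)
print(raw.info["sfreq"], len(raw.ch_names))  # 256.0 Hz, typically 23 EEG channels

# Pull seizure onset/offset times (in seconds) from the case summary file.
# Assumed line format, e.g. "Seizure Start Time: 2996 seconds".
text = open("chb01/chb01-summary.txt").read()
times = re.findall(r"Seizure(?:\s+\d+)? (Start|End) Time:\s*(\d+) seconds", text)
print(times)
```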
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This Siena Sleep EEG dataset contains multi-channel EEG recordings collected during sleep, specifically curated for epilepsy detection and sleep stage analysis. Electroencephalography (EEG) is one of the most reliable methods for studying brain activity during sleep, and it plays a crucial role in diagnosing neurological disorders such as epilepsy.
The dataset is formatted as a large-scale time-series table where each row represents a sampled time point, and each column corresponds to an EEG electrode channel. An additional diagnosis label column indicates whether the signal segment belongs to a healthy control or an epilepsy patient.
Dataset Structure
Number of Records: 944,640 samples
Number of Features: 20 EEG channels + 1 diagnosis label
File Format: CSV
Memory Size: ~150 MB
Columns
EEG Channels (20):
Fp1, F3, C3, P3, O1, F7, T3, T5, Fc1, Fc5, Cp1, Cp5, F9, Fz, Cz, Pz, Fp2, F4, C4, P4
These correspond to standard 10–20 EEG electrode placements, covering frontal, central, parietal, occipital, and temporal lobes.
diagnosis: 0 → non-epileptic (healthy subject)
1 → epileptic case
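A minimal loading sketch with pandas; the CSV file name is an assumption:

```python
import pandas as pd

# The CSV file name is an assumption; the dataset is described as one ~150 MB table.
df = pd.read_csv("siena_sleep_eeg.csv")

channels = [c for c in df.columns if c != "diagnosis"]
X = df[channels].to_numpy()     # 944,640 samples x 20 channels
y = df["diagnosis"].to_numpy()  # 0 = healthy control, 1 = epileptic
print(X.shape, y.mean())
```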
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The database consists of EEG recordings of 14 patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58). Subjects were monitored with a Video-EEG with a sampling rate of 512 Hz, with electrodes arranged on the basis of the international 10-20 System. Most of the recordings also contain 1 or 2 EKG signals. The diagnosis of epilepsy and the classification of seizures according to the criteria of the International League Against Epilepsy were performed by an expert clinician after a careful review of the clinical and electrophysiological data of each patient.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Dataset
Synthetic EEG data generated by the ‘bai’ model based on real data.
Features/Columns:
No: "Number" Sex: "Gender" Age: "Age of participants" EEG Date: "The date of the EEG" Education: "Education level" IQ: "IQ level of participants" Main Disorder: "General class definition of the disorder" Specific Disorder: "Specific class definition of the disorder"
Total Features/Columns: 1140
Content:
Obsessive Compulsive Disorder, Bipolar Disorder, Schizophrenia… See the full description on the dataset page: https://huggingface.co/datasets/Neurazum/General-Disorders-EEG-Dataset-v1.
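Since the dataset is hosted on Hugging Face, it can presumably be pulled with the `datasets` library; a minimal sketch:

```python
from datasets import load_dataset

# Repository name taken from the dataset page linked above.
ds = load_dataset("Neurazum/General-Disorders-EEG-Dataset-v1")
split = next(iter(ds.values()))
print(len(split.column_names))   # 1140 features/columns per the description
print(split.column_names[:8])    # demographic columns, then EEG feature columns
```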
License/Data Use Agreement: https://github.com/bdsp-core/bdsp-license-and-dua
The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH). The EEG data includes three types:
rEEG: "routine EEGs" recorded in the outpatient setting.
EMU: recordings obtained in the inpatient setting, within the Epilepsy Monitoring Unit (EMU).
ICU/LTM: recordings obtained from acutely and critically ill patients within the intensive care unit (ICU).
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This multi-subject and multi-session EEG dataset for modelling human visual object recognition (MSS) contains:
More details about the dataset are described as follows.
32 participants were recruited from college students in Beijing, of which 4 were female, and 28 were male, with an age range of 21-33 years. 100 sessions were conducted. They were paid and gave written informed consent. The study was conducted under the approval of the ethical committee of the Institute of Automation at the Chinese Academy of Sciences, with the approval number: IA21-2410-020201.
After every 50 sequences, there was a break for the participants to rest. Each rapid serial sequence lasted approximately 7.5 seconds, starting with a 750ms blank screen with a white fixation cross, followed by 20 or 21 images presented at 5 Hz with a 50% duty cycle. The sequence ended with another 750ms blank screen.
After the rapid serial sequence, there was a 2-second interval during which participants were instructed to blink and then report whether a special image appeared in the sequence using a keyboard. During each run, 20 sequences were randomly inserted with additional special images at random positions. The special images are logos for brain-computer interfaces.
Each image was displayed for 1 second and was followed by 11 choice boxes (1 correct class box, 9 random class boxes, and 1 reject box). Participants were required to select the correct class of the displayed image using a mouse to increase their engagement. After the selection, a white fixation cross was displayed for 1 second in the centre of the screen to remind participants to pay attention to the upcoming task.
The stimuli are from two image databases, ImageNet and PASCAL. The final set consists of 10,000 images, with 500 images for each class.
In the derivatives/annotations folder, there is additional information about MSS:
The EEG signals were pre-processed using the MNE package, version 1.3.1, with Python 3.9.16. The data was sampled at a rate of 1,000 Hz with a bandpass filter applied between 0.1 and 100 Hz. A notch filter was used to remove 50 Hz power frequency. Epochs were created for each trial ranging from 0 to 500 ms relative to stimulus onset. No further preprocessing or artefact correction methods were applied in technical validation. However, researchers may want to consider widely used preprocessing steps such as baseline correction or eye movement correction. After the preprocessing, each session resulted in two matrices: RSVP EEG data matrix of shape (8,000 image conditions × 122 EEG channels × 125 EEG time points) and low-speed EEG data matrix of shape (400 image conditions × 122 EEG channels × 125 EEG time points).
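A hedged MNE sketch mirroring the described preprocessing: the file path is hypothetical, and the resampling step is an inference from the reported 125 time points per 0-500 ms epoch:

```python
import mne

# Hypothetical file path; the actual dataset uses BIDS-style subject/session folders.
raw = mne.io.read_raw_fif("sub-01_ses-01_eeg.fif", preload=True)

raw.filter(l_freq=0.1, h_freq=100.0)  # bandpass as described above
raw.notch_filter(freqs=50.0)          # remove 50 Hz power-line interference

events = mne.find_events(raw)         # assumes triggers are stored in a stim channel
epochs = mne.Epochs(raw, events, tmin=0.0, tmax=0.5, baseline=None, preload=True)

# The description reports 125 time points per 0-500 ms epoch, which implies
# resampling to 250 Hz (an inference, not stated explicitly above).
epochs.resample(250)
print(epochs.get_data().shape)  # (n_trials, n_channels, n_times)
```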
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The dataset provides resting-state EEG data (eyes open, partially eyes closed) from 71 participants who underwent two experiments involving normal sleep (NS, session 1) and sleep deprivation (SD, session 2). The dataset also provides information on participants' sleepiness and mood states. (Please note that Session 1 (NS) and Session 2 (SD) do not reflect the temporal order; the order was counterbalanced across participants and is listed in the metadata.)
Data collection began in March 2019 and ended in December 2020. A detailed description of the dataset is being prepared by Chuqin Xiang, Xinrui Fan, Duo Bai, Ke Lv and Xu Lei, and will be submitted to Scientific Data for publication.
* If you have any questions or comments, please contact:
* Xu Lei: xlei@swu.edu.cn
Xiang, C., Fan, X., Bai, D. et al. A resting-state EEG dataset for sleep deprivation. Sci Data 11, 427 (2024). https://doi.org/10.1038/s41597-024-03268-2
License/Data Use Agreement: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
This data set consists of over 240 two-minute EEG recordings obtained from 20 volunteers. Resting-state and auditory stimuli experiments are included in the data. The goal is to develop an EEG-based Biometric system.
The data includes resting-state EEG signals in both cases: eyes open and eyes closed. The auditory stimuli part consists of six experiments: three with in-ear auditory stimuli and three with bone-conduction auditory stimuli. The three stimuli for each case are a native song, a non-native song, and neutral music.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
When you use this dataset, please cite the paper below. More information about this dataset can also be found in that paper.
Xu, X., Wang, B., Xiao, B., Niu, Y., Wang, Y., Wu, X., & Chen, J. (2024). Beware of Overestimated Decoding Performance Arising from Temporal Autocorrelations in Electroencephalogram Signals. arXiv preprint arXiv:2405.17024.
The present work aims to demonstrate that temporal autocorrelation (TA) significantly impacts various BCI tasks even in conditions without neural activity. We used watermelons as phantom heads and found the pitfall of overestimated decoding performance when continuous EEG data with the same class label were split into training and test sets. More details can be found in Motivation.
As watermelons cannot perform any experimental tasks, the recordings can be reorganized into the format of various real EEG datasets without the need to collect EEG data as previous work did (examples in Domain Studied).
Manufacturers: NeuroScan SynAmps2 system (Compumedics Limited, Victoria, Australia)
Configuration: 64-channel Ag/AgCl electrode cap with a 10/20 layout
Watermelons. Ten watermelons served as phantom heads.
Overestimated Decoding Performance in EEG decoding.
The following BCI datasets, covering various BCI tasks, have been reorganized using the Phantom EEG Dataset. The pitfall was found in four of the five tasks.
- CVPR dataset [1] for image decoding task.
- DEAP dataset [2] for emotion recognition task.
- KUL dataset [3] for auditory spatial attention decoding task.
- BCIIV2a dataset [4] for motor imagery task (the pitfall was absent due to the use of a rapid-design paradigm during EEG recording).
- SIENA dataset [5] for epilepsy detection task.
Resting state, but the data can be reorganized for any BCI task.
The Phantom EEG Dataset
Creative Commons Attribution 4.0 International
You can find the code to read the data files (.cnt or .set) in the “code” folder.
To run the code, install the mne and numpy packages via pip:
pip install mne==1.3.1
pip install numpy
Then you can use “BID2WMCVPR.py” to convert the BIDS dataset to the WM-CVPR dataset, or “CNTK2WMCVPR.py” to convert the CNT dataset to the WM-CVPR dataset.
The code to reorganize datasets other than CVPR [1] will be released on GitHub after the review period.
- CNT: the raw data.
Each Subject (S*.cnt) contains the following information:
EEG.data: EEG data (samples X channels)
EEG.srate: Sampling frequency of the saved data
EEG.chanlocs: channel numbers (1 to 68; ‘EKG’, ‘EMG’, ‘VEO’ and ‘HEO’ were not recorded)
- BIDS: an extension to the brain imaging data structure for electroencephalography. BIDS primarily addresses the heterogeneity of data organization by following the FAIR principles [6].
Each Subject (sub-S*/eeg/) contains the following information:
sub-S*_task-RestingState_channels.tsv: channel numbers (1 to 68; ‘EKG’, ‘EMG’, ‘VEO’ and ‘HEO’ were not recorded)
sub-S*_task-RestingState_eeg.json: Some information about the dataset.
sub-S*_task-RestingState_eeg.set: EEG data (samples X channels)
sub-S*_task-RestingState_events.tsv: the event during recording. We organized events using block-design and rapid-event-design. However, it is important to note that this does not need to be considered in any subsequent data reorganization, as watermelons cannot follow any experimental instructions.
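For orientation, a minimal MNE sketch for reading both provided formats (file names follow the patterns above; this is not the provided conversion code):

```python
import mne

# Raw NeuroScan file; "S1.cnt" is a hypothetical name following the S*.cnt pattern.
raw_cnt = mne.io.read_raw_cnt("CNT/S1.cnt", preload=True)

# BIDS-formatted EEGLAB file for the same (hypothetical) subject.
raw_set = mne.io.read_raw_eeglab(
    "BIDS/sub-S1/eeg/sub-S1_task-RestingState_eeg.set", preload=True)

print(raw_cnt.info["sfreq"], len(raw_cnt.ch_names))  # 1000 Hz per the recording notes
```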
- code: more information on Code.
- readme.md: the information about the dataset.
An additional electrode was placed on the lower part of the watermelon as the physiological reference, and the forehead served as the ground site. The inter-electrode impedances were maintained under 20 kOhm. Data were recorded at a sampling rate of 1000 Hz. EEG recordings for each watermelon lasted for more than 1 hour to ensure sufficient data for the decoding task.
Citation will be updated after the review period is completed.
We will provide more information about this dataset (e.g. the units of the captured data) once our work is accepted. This is because our work is currently under review, and we are not allowed to disclose more information according to the relevant requirements.
All metadata will be provided as a backup on Github and will be available after the review period is completed.
Researchers have reported high decoding accuracy (>95%) using non-invasive Electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks like image decoding, emotion recognition, auditory spatial attention detection, epilepsy detection, etc. Since these EEG data were usually collected with well-designed paradigms in labs, the reliability and robustness of the corresponding decoding methods were doubted by some researchers, and they proposed that such decoding accuracy was overestimated due to the inherent temporal autocorrelations (TA) of EEG signals [7]–[9].
However, the coupling between the stimulus-driven neural responses and the EEG temporal autocorrelations makes it difficult to confirm whether this overestimation exists in truth. Some researchers also argue that the effect of TA in EEG data on decoding is negligible and that it becomes a significant problem only under specific experimental designs in which subjects do not have enough resting time [10], [11].
Due to a lack of problem formulation, previous studies [7]–[9] only proposed that block-design should not be used in order to avoid the pitfall. However, the impact of TA can be avoided only when each EEG trial is not further segmented into several samples; otherwise, the overfitting pitfall still occurs. In contrast, when a correct data-splitting strategy is used (e.g., separating training and test data in time), the pitfall can be avoided even with a block design, as the sketch below illustrates.
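The splitting issue can be shown on toy data: samples cut from the same continuous segment share segment-specific ("domain") features, so a random split leaks them into the test set, while a group-wise split by segment does not. A self-contained sketch (illustrative only, not the paper's code):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data: 20 continuous segments, each cut into 50 samples that share a
# segment-specific offset (a stand-in for TA-induced "domain" features).
n_seg, n_per = 20, 50
groups = np.repeat(np.arange(n_seg), n_per)
y = np.repeat(rng.integers(0, 2, n_seg), n_per)                 # one label per segment
X = rng.standard_normal((n_seg * n_per, 10)) + groups[:, None]  # domain offset, no class signal

# Pitfall: a random split mixes samples from the same segment across train/test,
# so the classifier can associate domain features with labels.
Xa, Xb, ya, yb = train_test_split(X, y, random_state=0)
print("random split:", SVC().fit(Xa, ya).score(Xb, yb))  # inflated despite no class signal

# Correct: hold out whole segments (domains), i.e. separate train/test in time.
tr, te = next(GroupShuffleSplit(n_splits=1, random_state=0).split(X, y, groups))
print("group split: ", SVC().fit(X[tr], y[tr]).score(X[te], y[te]))  # ~chance
```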
In our framework, we proposed the concept of "domain" to represent the EEG patterns resulting from TA and then used phantom EEG to remove stimulus-driven neural responses for verification. The results confirmed that the TA, always existing in the EEG data, added unique domain features to a continuous segment of EEG. The specific finding is that when the segment of EEG data with the same class label is split into multiple samples, the classifier will associate the sample's class label with the domain features, interfering with the learning of class-related features. This leads to an overestimation of decoding performance for test samples from the domains seen during training, and results in poor accuracy for test samples from unseen domains (as in real-world applications).
Importantly, our work suggests that the key to reducing the impact of EEG TA on BCI decoding is to decouple class-related features from domain features in the actual EEG dataset. Our proposed unified framework serves as a reminder to BCI researchers of the impact of TA on their specific BCI tasks and is intended to guide them in selecting the appropriate experimental design, splitting strategy and model construction.
We must point out that the "phantom EEG" does not actually contain any "EEG" but records only noise: a watermelon is not a brain and does not generate any neural electrical signals. Therefore, the recorded electrical noise, even when amplified using equipment typically used for EEG, does not constitute EEG data under the definition of EEG. This is why previous researchers called it "phantom EEG". Some researchers may therefore find it questionable to use a watermelon to obtain phantom EEG.
However, the usage of the phantom head allows researchers to evaluate the performance of neural-recording equipment and proposed algorithms without the effects of neural activity variability, artifacts, and potential ethical issues. Phantom heads used in previous studies include digital models [12]–[14], real human skulls [15]–[17], artificial physical phantoms [18]–[24] and watermelons [25]–[40]. Due to their similar conductivity to human tissue, similar size and shape to the human head, and ease of acquisition, watermelons are widely used as "phantom heads".
Most prior works used a watermelon as a phantom head and found that results obtained from human neural signals could not be reproduced with the phantom head, thus proving that those results were indeed caused by neural signals. For example, Mutanen et al. [35] proposed that “the fact that the phantom head stimulation did not evoke similar biphasic artifacts excludes the possibility that residual induced artifacts, with the current TMS-compatible EEG system, could explain these components”.
Our work differs significantly from most previous works: it shows for the first time that phantom EEG exhibits the effect of TA on BCI decoding even when only noise was recorded, indicating that TA is inherent in EEG data. The conclusion we hope to draw is that some current works may not truly use stimulus-driven neural
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset comprises EEG data from 46 different subjects (22 for commercial advertisements and 24 for Kannada music clips), recorded using a 2-channel EEG device.
The dataset folder contains two subfolders:
1. Commercial advertisements
1.1 Channel_1 (Ch_1) and Channel_2 (Ch_2): prefrontal cortex
2. Kannada musical clips
2.1 Channel_1 (Ch_1) and Channel_2 (Ch_2): left brain
Excel file information: in each file, columns represent subjects and rows represent features per subject. There are 12 Excel files in total from the two channels (6 for commercial advertisements and 6 for Kannada musical clips).
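A minimal pandas sketch for this layout; the file name is an assumption:

```python
import pandas as pd

# Hypothetical file name; there are 12 such files (6 per stimulus type per channel).
df = pd.read_excel("commercial_advertisement_Ch_1.xlsx", header=None)

# Columns are subjects and rows are features, so transpose to the usual
# (subjects x features) layout before analysis.
X = df.to_numpy().T
print(X.shape)  # (n_subjects, n_features)
```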
Subjective self-rating scale
Name
Age
Gender
Have you ever had any health issues? YES NO
Have you watched this song/advertisement before? YES NO
Please let us know if this advertisement brings up any specific memories for you. YES NO
Please rate the following queries from 1 to 10.
How funny was the advertisement you watched?
How sad was the advertisement you watched?
How scary was the advertisement you watched?
How relaxing was the music you listened to?
How sad was the music you listened to?
How enjoyable was the music you listened to?
Do you think what you just watched was entertaining enough?
If you have any comments, please write them here.
Here is the website address for each stimulus that we considered:
ad1: https://www.youtube.com/watch?v=ZzG7duipQ7U&ab_channel=perfettiindia
ad2: https://www.youtube.com/watch?v=SfAxUpeVhCg&ab_channel=bo0fhead
ad3: https://www.youtube.com/watch?v=HqGsT6VM8Vg&ab_channel=kiddlestix
song1: https://www.youtube.com/hashtag/kgfchapter2
song2: https://www.youtube.com/watch?v=x43w4lLS9E0&ab_channel=AnandAudio
song3: https://youtube.com/watch?v=Ysf4QRrcLGM&si=EnSIkaIECMiOmarE
For a more comprehensive understanding of the dataset and its background, we kindly ask researchers to refer to our associated manuscript titled:
“Entertainment Based Database for Emotion Recognition from EEG Signals”, a research article accepted at the 3rd International Conference on Applied Intelligence and Informatics (AII2023), held 29–31 October 2023 in Dubai, UAE. (When utilizing this dataset in your research, please consider citing this reference.)
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
‘Univ. of Bonn’ and ‘CHB-MIT Scalp EEG Database’ are publicly available datasets that are among the most sought after by researchers. The Bonn dataset is very small compared to CHB-MIT, but researchers still prefer Bonn because it comes in a simple '.txt' format. The dataset published here is a preprocessed form of CHB-MIT, available in '.csv' format.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
This dataset contains EEG recordings from 18 subjects listening to one of two competing speech audio streams. Continuous speech in trials of ~50 sec. was presented to normal hearing listeners in simulated rooms with different degrees of reverberation. Subjects were asked to attend one of two spatially separated speakers (one male, one female) and ignore the other. Repeated trials with presentation of a single talker were also recorded. The data were recorded in a double-walled soundproof booth at the Technical University of Denmark (DTU) using a 64-channel Biosemi system and digitized at a sampling rate of 512 Hz. Full details can be found in:
and
The data is organized in the format of the publicly available COCOHA Matlab Toolbox. The preproc_script.m demonstrates how to import and align the EEG and audio data. The script also demonstrates some EEG preprocessing steps as used in the Wong et al. paper above. The AUDIO.zip contains wav files with the speech audio used in the experiment. The EEG.zip contains MAT-files with the EEG/EOG data for each subject. The EEG/EOG data are found in data.eeg with the following channels:
The expinfo table contains information about experimental conditions, including which speaker the listener was attending to in different trials. The expinfo table contains the following information:
DATA_preproc.zip contains the preprocessed EEG and audio data as output from preproc_script.m.
The dataset was created within the COCOHA Project: Cognitive Control of a Hearing Aid
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
EEG signals were acquired from 20 healthy right-handed subjects performing a series of fine motor tasks cued by audio commands. The participants were divided equally into two distinct age groups: (i) 10 elderly adults (EA group, aged 55-72, 6 females); (ii) 10 young adults (YA group, aged 19-33, 3 females).

The active phase of the experimental session included sequential execution of 60 fine motor tasks: squeezing a hand into a fist after the first audio command and holding it until the second audio command (30 repetitions per hand; see Fig. 1). The duration of the audio command determined the type of motor action to be executed: 0.25 s for a left hand (LH) movement and 0.75 s for a right hand (RH) movement. The time interval between the two audio signals was selected randomly in the range 4-5 s for each trial. The sequence of motor tasks was randomized, and the pause between tasks was also chosen randomly in the range 6-8 s to exclude possible training or motor-preparation effects caused by sequential execution of the same tasks.

Acquired EEG signals were then processed via preprocessing tools implemented in the MNE Python package. Specifically, raw EEG signals were filtered by a 5th-order Butterworth filter in the range 1-100 Hz and by a 50 Hz notch filter. Further, Independent Component Analysis (ICA) was applied to remove ocular and cardiac artifacts. Artifact-free EEG recordings were then segmented into 60 epochs according to the experimental protocol. Each epoch was 14 s long, including 3 s of baseline and 11 s of motor-related brain activity, and time-locked to the first audio command indicating the start of motor execution. After visual inspection, epochs that still contained artifacts were rejected. Finally, 15 epochs per movement type were stored for each subject.

Individual epochs for each subject are stored in the attached MNE .fif files. The prefix EA or YA in the file name identifies the subject's age group; the postfix LH or RH indicates the type of motor task.
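A minimal sketch for loading one of the provided epoch files with MNE; the exact file stem is an assumption, only the EA/YA prefix and LH/RH postfix follow the stated convention:

```python
import mne

# The file stem is an assumption; only the EA/YA prefix (age group) and
# LH/RH postfix (movement type) follow the stated naming convention.
epochs = mne.read_epochs("EA_sub01_LH-epo.fif")

# Epochs are 14 s long: 3 s baseline plus 11 s of motor-related activity,
# time-locked to the first audio command.
print(epochs.tmin, epochs.tmax)
print(epochs.get_data().shape)  # (n_epochs, n_channels, n_times)
```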
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This dataset comprises EEG recordings from eight ALS patients aged between 45.5 and 74 years. Patients exhibited revised ALS Functional Rating Scale (ALSFRS-R) scores ranging from 0 to 46, with time since symptom onset (TSSO) varying between 12 and 113 months. Notably, no disease progression was reported during the study period, ensuring stability in clinical conditions. The participants were recruited from the Penn State Hershey Medical Center ALS Clinic and had confirmed ALS diagnoses without significant dementia. This rigorous selection criterion ensured the validity and reliability of the dataset for motor imagery analysis in an ALS population.

The EEG data were collected using 19 electrodes placed according to the international 10-20 system (FP1, FP2, F7, F3, FZ, F4, F8, T7, C3, CZ, C4, T8, P7, P3, PZ, P4, P8, O1, O2), with signals referenced to linked earlobes and a ground electrode at FPz. Additionally, three electrooculogram (EOG) electrodes were employed to facilitate artifact removal, maintaining impedance levels below 10 kΩ throughout data acquisition. The data were amplified using two g.USBamp systems (g.tec GmbH) and recorded via the BCI2000 software suite, with supplementary preprocessing in MATLAB. All experimental procedures adhered strictly to Penn State University’s IRB protocol PRAMSO40647EP, ensuring ethical compliance.

Each participant underwent four brain-computer interface (BCI) sessions conducted over a period of 1 to 2 months. Each session consisted of four runs, with 10 trials per class (left hand, right hand, and rest) for a total of 40 trials per session. The sessions began with a calibration run to initialize the system, followed by feedback runs during which participants controlled a cursor's movement through motor imagery, specifically imagined grasping movements. The study design, focused on motor imagery (MI), generated a total of 160 trials per participant over two months.

This dataset holds significance in studying the longitudinal dynamics of motor imagery decoding in ALS patients. To ensure reproducibility of our findings and to promote advancements in the field, we have received explicit permission from Prof. Geronimo of Penn State University to distribute this dataset in processed format for research purposes. The original publication of this collection can be found below.

How to use this dataset: the data are structured in MATLAB as a collection of subject-specific structs, where each subject is represented as a single struct with three fields:
L: trials corresponding to left motor imagery.
R: trials corresponding to right motor imagery.
Re: trials corresponding to the rest state.
Each field contains an array of trials, where each trial is a matrix with rows as timestamps and columns as channels.

Primary collection: Geronimo A, Simmons Z, Schiff SJ. Performance predictors of brain-computer interfaces in patients with amyotrophic lateral sclerosis. Journal of Neural Engineering 2016;13. 10.1088/1741-2560/13/2/026002.

All code for any publications with this data has been made publicly available at:
https://github.com/rishannp/Auto-Adaptive-FBCSP
https://github.com/rishannp/Motor-Imagery---Graph-Attention-Network
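A minimal sketch for reading one subject's struct in Python with SciPy; the file and variable names are assumptions, only the L/R/Re fields follow the description above:

```python
from scipy.io import loadmat

# File and variable names are assumptions; only the L/R/Re fields
# follow the struct layout described above.
mat = loadmat("subject1.mat", squeeze_me=True, struct_as_record=False)
subj = mat["subject"]

for field in ("L", "R", "Re"):  # left MI, right MI, rest
    trials = getattr(subj, field)
    print(field, len(trials), trials[0].shape)  # each trial: timestamps x channels
```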
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This database includes the de-identified EEG data from 62 healthy individuals who participated in a brain-computer interface (BCI) study. All subjects underwent 7-11 sessions of BCI training which involved controlling a computer cursor to move in one-dimensional and two-dimensional spaces using the subject’s “intent”. EEG data were recorded with 62 electrodes. In addition to the EEG data, behavioral data including the online success rate of BCI cursor control are also included.

This dataset was collected under support from the National Institutes of Health via grants AT009263, EB021027, NS096761, MH114233, RF1MH to Dr. Bin He. Correspondence about the dataset: Dr. Bin He, Carnegie Mellon University, Department of Biomedical Engineering, Pittsburgh, PA 15213. E-mail: bhe1@andrew.cmu.edu

This dataset has been used and analyzed to study the learning of BCI control and the effects of mind-body awareness training on this process. The results are reported in: Stieger et al., “Mindfulness Improves Brain Computer Interface Performance by Increasing Control over Neural Activity in the Alpha Band,” Cerebral Cortex, 2020 (https://doi.org/10.1093/cercor/bhaa234). Please cite this paper if you use any data included in this dataset.