100+ datasets found
  1. EEG Dataset for ADHD

    • kaggle.com
    Updated Jan 20, 2025
    Cite
    Danizo (2025). EEG Dataset for ADHD [Dataset]. https://www.kaggle.com/datasets/danizo/eeg-dataset-for-adhd
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Danizo
    Description

    This is the dataset collected by Shahed University and released on IEEE DataPort.

    The columns are: Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2, Class, ID

    The first 19 are EEG channel names.

    Class: ADHD/Control

    ID: Patient ID
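
    For quick exploration, a minimal pandas sketch like the following separates the channel columns from the labels (the CSV file name is hypothetical; check the actual name after downloading from Kaggle):

    import pandas as pd

    # Hypothetical file name; adjust to the CSV shipped with the Kaggle dataset.
    df = pd.read_csv("adhd_eeg.csv")

    channels = ["Fz", "Cz", "Pz", "C3", "T3", "C4", "T4", "Fp1", "Fp2",
                "F3", "F4", "F7", "F8", "P3", "P4", "T5", "T6", "O1", "O2"]
    X = df[channels].to_numpy()   # EEG amplitudes, one row per time point
    y = df["Class"]               # ADHD / Control label
    groups = df["ID"]             # patient ID, useful for subject-wise splits
    print(X.shape, y.value_counts())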

    Participants were 61 children with ADHD and 60 healthy controls (boys and girls, ages 7-12). The ADHD children were diagnosed by an experienced psychiatrist according to DSM-IV criteria and had taken Ritalin for up to 6 months. None of the children in the control group had a history of psychiatric disorders, epilepsy, or any report of high-risk behaviors.

    EEG recording was performed based on the 10-20 standard using 19 channels (Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2) at a 128 Hz sampling frequency. The A1 and A2 electrodes were the references, located on the earlobes.

    Since one of the deficits in ADHD children is visual attention, the EEG recording protocol was based on visual attention tasks. In the task, a set of pictures of cartoon characters was shown to the children and they were asked to count the characters. The number of characters in each image was randomly selected between 5 and 16, and the size of the pictures was large enough to be easily visible and countable by children. To provide a continuous stimulus during the signal recording, each image was displayed immediately after the child’s response, without interruption. Thus, the duration of the EEG recording throughout this cognitive visual task depended on the child’s performance (i.e. response speed).

    Citation Author(s): Ali Motie Nasrabadi; Armin Allahverdy; Mehdi Samavati; Mohammad Reza Mohammadi

    DOI: 10.21227/rzfh-zn36

    License: Creative Commons Attribution

  2. CHB-MIT Scalp EEG Database

    • physionet.org
    Updated Jun 9, 2010
    Cite
    John Guttag (2010). CHB-MIT Scalp EEG Database [Dataset]. http://doi.org/10.13026/C2K01R
    Explore at:
    Dataset updated
    Jun 9, 2010
    Authors
    John Guttag
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This database, collected at the Children’s Hospital Boston, consists of EEG recordings from pediatric subjects with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure medication in order to characterize their seizures and assess their candidacy for surgical intervention. The recordings are grouped into 23 cases and were collected from 22 subjects (5 males, ages 3–22; and 17 females, ages 1.5–19).
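
    As a minimal loading sketch (assuming the standard PhysioNet file layout), one record can be read with MNE:

    import mne

    # Load one CHB-MIT record; the path follows the PhysioNet layout (chbNN/chbNN_MM.edf).
    raw = mne.io.read_raw_edf("chb01/chb01_03.edf", preload=True)
    print(raw.info["sfreq"])      # 256 Hz for this database
    print(raw.ch_names[:5])       # bipolar montage channel names
    data, times = raw[:, :2560]   # first 10 seconds across all channels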

  3. Data from: EEG-Dataset

    • kaggle.com
    zip
    Updated Aug 3, 2025
    Cite
    Quân Nguyễn Bảo (2025). EEG-Dataset [Dataset]. https://www.kaggle.com/datasets/quands/eeg-dataset
    Explore at:
    Available download formats: zip (3155571 bytes)
    Dataset updated
    Aug 3, 2025
    Authors
    Quân Nguyễn Bảo
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    **Overview**

    The Bonn EEG Dataset is a widely recognized dataset in the field of biomedical signal processing and machine learning, specifically designed for research in epilepsy detection and EEG signal analysis. It contains electroencephalogram (EEG) recordings from both healthy individuals and patients with epilepsy, making it suitable for tasks such as seizure detection and classification of brain activity states. The dataset is structured into five distinct subsets (labeled A, B, C, D, and E), each comprising 100 single-channel EEG segments, resulting in a total of 500 segments. Each segment represents 23.6 seconds of EEG data, sampled at a frequency of 173.61 Hz, yielding 4,097 data points per segment, stored in ASCII format as text files.

    **Structure and Labels**

    • Set A: EEG recordings from healthy individuals with eyes open, capturing normal brain activity under visual stimulation.
    • Set B: EEG recordings from healthy individuals with eyes closed, reflecting brain activity in a resting state.
    • Set C: EEG recordings from epilepsy patients, collected from the epileptogenic zone during an interictal (seizure-free) period.
    • Set D: EEG recordings from epilepsy patients, collected from the hippocampal formation of the opposite brain hemisphere during an interictal period.
    • Set E: EEG recordings from epilepsy patients during an ictal (seizure) period, capturing brain activity during an epileptic seizure.

    Each subset contains 100 EEG segments, ensuring a balanced distribution across the five classes, which supports both binary (e.g., healthy vs. epileptic) and multi-class (e.g., A-E classification) tasks.

    **Key Characteristics**

    • Size: 500 EEG segments (100 segments per subset, across five subsets).
    • Data Type: Single-channel EEG signals, stored in text files (ASCII format).
    • Sampling Rate: 173.61 Hz, providing high temporal resolution.
    • Segment Length: 23.6 seconds per segment, equivalent to 4,097 data points.
    • Labels: Clearly defined for each subset (A: healthy, eyes open; B: healthy, eyes closed; C: interictal, epileptogenic zone; D: interictal, opposite hemisphere; E: ictal), enabling precise model evaluation.
    • Preprocessing: The data is not pre-filtered, but a low-pass filter with a 40 Hz cutoff is recommended to remove high-frequency noise and artifacts, as suggested in the original documentation (see the filtering sketch below).
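
    A minimal filtering sketch consistent with these characteristics (the file name follows the Bonn naming scheme for set A, but verify against the downloaded archive):

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 173.61                    # sampling rate from the documentation
    seg = np.loadtxt("Z001.txt")   # one single-channel ASCII segment from set A

    # 4th-order Butterworth low-pass at 40 Hz, as recommended above.
    b, a = butter(4, 40.0 / (fs / 2.0), btype="low")
    seg_filtered = filtfilt(b, a, seg)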

    **Applications**

    The Bonn EEG Dataset is ideal for machine learning and signal processing tasks, including:

    • Developing algorithms for epileptic seizure detection and prediction.
    • Exploring feature extraction techniques, such as wavelet transforms, for EEG signal analysis.
    • Classifying brain states (healthy vs. epileptic, interictal vs. ictal).
    • Supporting research in neuroscience and medical diagnostics, particularly for epilepsy monitoring and treatment.

    **Source**

    • The dataset is publicly available from the University of Bonn and can be downloaded from the following link: University of Bonn EEG Dataset
    • The dataset is provided as five ZIP files, each containing 100 text files corresponding to the EEG segments for subsets A, B, C, D, and E.

    **Citation**

    When using this dataset, researchers are required to cite the original publication: Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., & Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6), 061907. DOI: 10.1103/PhysRevE.64.061907.

    **Additional Notes**

    1. The dataset is randomized, with no specific information provided about patients or electrode placements, ensuring simplicity and focus on signal characteristics.

    2. The data is not hosted on Kaggle or Hugging Face but is accessible directly from the University of Bonn’s repository or mirrored sources.

    3. Researchers may need to apply preprocessing steps, such as filtering or normalization, to optimize the data for machine learning tasks.

    4. The dataset’s balanced structure and clear labels make it an excellent choice for a one-week machine learning project, particularly for tasks involving traditional algorithms like SVM, Random Forest, or Logistic Regression.

    5. This dataset provides a robust foundation for learning signal processing, feature extraction, and machine learning techniques while addressing a real-world medical challenge in epilepsy detection.

  4. EEG Signal Dataset

    • ieee-dataport.org
    Updated Jun 11, 2020
    Cite
    Rahul Kher (2020). EEG Signal Dataset [Dataset]. https://ieee-dataport.org/documents/eeg-signal-dataset
    Explore at:
    Dataset updated
    Jun 11, 2020
    Authors
    Rahul Kher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PCA

  5. EEG Datasets for Naturalistic Listening to "Alice in Wonderland" (Version 1)...

    • deepblue.lib.umich.edu
    Updated Nov 20, 2018
    Cite
    Brennan, Jonathan R. (2018). EEG Datasets for Naturalistic Listening to "Alice in Wonderland" (Version 1) [Dataset]. http://doi.org/10.7302/Z29C6VNH
    Explore at:
    Dataset updated
    Nov 20, 2018
    Dataset provided by
    Deep Blue Data
    Authors
    Brennan, Jonathan R.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These files contain the raw data and processing parameters to go with the paper "Hierarchical structure guides rapid linguistic predictions during naturalistic listening" by Jonathan R. Brennan and John T. Hale. These files include the stimulus (wav files), raw data (Matlab format for the Fieldtrip toolbox), data processing parameters (Matlab), and variables used to align the stimuli with the EEG data and for the statistical analyses reported in the paper.

  6. General-Disorders-EEG-Dataset-v1

    • huggingface.co
    Updated Aug 21, 2025
    Cite
    Neurazum (2025). General-Disorders-EEG-Dataset-v1 [Dataset]. http://doi.org/10.57967/hf/3321
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 21, 2025
    Dataset authored and provided by
    Neurazum
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    Synthetic EEG data generated by the ‘bai’ model based on real data.

      Features/Columns:
    

    No: "Number" Sex: "Gender" Age: "Age of participants" EEG Date: "The date of the EEG" Education: "Education level" IQ: "IQ level of participants" Main Disorder: "General class definition of the disorder" Specific Disorder: "Specific class definition of the disorder"

    Total Features/Columns: 1140

      Content:
    

    Obsessive Compulsive Disorder, Bipolar Disorder, Schizophrenia… See the full description on the dataset page: https://huggingface.co/datasets/Neurazum/General-Disorders-EEG-Dataset-v1.
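
    Since the dataset is hosted on the Hugging Face Hub, it can presumably be loaded with the datasets library (the split and column layout should be verified on the dataset page):

    from datasets import load_dataset

    # Repository ID taken from the dataset page URL above.
    ds = load_dataset("Neurazum/General-Disorders-EEG-Dataset-v1")
    print(ds)  # inspect the available splits and the 1140 features/columns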

  7. EEG Alzheimer's Dataset

    • kaggle.com
    Updated Sep 9, 2025
    Cite
    UCI Machine Learning (2025). EEG Alzheimer's Dataset [Dataset]. https://www.kaggle.com/datasets/ucimachinelearning/eeg-alzheimers-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 9, 2025
    Dataset provided by
    Kaggle
    Authors
    UCI Machine Learning
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains 848,640 records with 17 columns, representing EEG (electroencephalogram) signals recorded from multiple electrode positions on the scalp, along with a status label. The dataset is related to the study of Alzheimer’s Disease (AD).

    Features (16 continuous variables, float64): Each feature corresponds to the electrical activity recorded from standard EEG electrode placements based on the international 10-20 system:

    Fp1, Fp2, F7, F3, Fz, F4, F8

    T3, C3, Cz, C4, T4

    T5, P3, Pz, P4

    These channels measure brain activity in different cortical regions (frontal, temporal, central, and parietal lobes).

    Target variable (1 categorical variable, int64):

    status: Represents the condition or classification of the subject at the time of recording (e.g., patient vs. control, or stage of Alzheimer’s disease).

    Size & Integrity:

    Rows: 848,640 samples

    Columns: 17 (16 EEG features + 1 status label)

    Data types: 16 float features, 1 integer label

    Missing values: None (clean dataset)
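
    A quick pandas sketch to verify the stated size and integrity after download (the CSV file name is hypothetical):

    import pandas as pd

    df = pd.read_csv("eeg_alzheimers.csv")   # hypothetical file name
    assert df.shape == (848640, 17)          # 16 EEG channels + 1 status label
    print(df.dtypes.value_counts())          # expect 16 float64 and 1 int64
    print(df.isna().sum().sum())             # expect 0 (no missing values)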

    This dataset is suitable for machine learning and deep learning applications such as:

    EEG signal classification (AD vs. healthy subjects)

    Brain activity pattern recognition

    Feature extraction and dimensionality reduction (e.g., PCA, wavelet transforms)

    Time-series analysis of EEG recordings

  8. EEG datasets of stroke patients

    • figshare.com
    json
    Updated Sep 14, 2023
    Cite
    Haijie Liu; Xiaodong Lv (2023). EEG datasets of stroke patients [Dataset]. http://doi.org/10.6084/m9.figshare.21679035.v5
    Explore at:
    Available download formats: json
    Dataset updated
    Sep 14, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Haijie Liu; Xiaodong Lv
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set consists of electroencephalography (EEG) data from 50 participants (Subject1 – Subject50) with acute ischemic stroke, aged between 30 and 77 years. The participants included 39 males and 11 females. The time after stroke ranged from 1 to 30 days. 22 participants had right hemisphere hemiplegia and 28 had left hemisphere hemiplegia. All participants were originally right-handed. Each participant sat in front of a computer screen with an arm resting on a pillow on their lap or on a table and carried out the instructions given on the screen. At the start of each trial, a picture with a text description cueing either the left or right hand was presented for 2 s. Participants were asked to focus on the instructed hand motor imagery while, at the same time, a video of the ipsilateral hand movement was displayed on the computer screen for 4 s, followed by a 2 s break.

  9. Ultra high-density EEG recording of interictal migraine and controls:...

    • kilthub.cmu.edu
    txt
    Updated Jul 21, 2020
    Cite
    Alireza Chaman Zar; Sarah Haigh; Pulkit Grover; Marlene Behrmann (2020). Ultra high-density EEG recording of interictal migraine and controls: sensory and rest [Dataset]. http://doi.org/10.1184/R1/12636731
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 21, 2020
    Dataset provided by
    Carnegie Mellon University
    Authors
    Alireza Chaman Zar; Sarah Haigh; Pulkit Grover; Marlene Behrmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used a high-density electroencephalography (HD-EEG) system, with 128 customized electrode locations, to record from 17 individuals with migraine (12 female) in the interictal period, and 18 age- and gender-matched healthy control subjects, during visual (vertical grating pattern) and auditory (modulated tone) stimulation which varied in temporal frequency (4 and 6 Hz), and during rest. This dataset includes the raw EEG data related to the paper: Chamanzar, Haigh, Grover, and Behrmann (2020), Abnormalities in cortical pattern of coherence in migraine detected using ultra high-density EEG. The link to our paper will be made available as soon as it is published online.

  10. EEG Dataset

    • ieee-dataport.org
    Updated Aug 10, 2025
    Cite
    Keerthi Kumar K J (2025). EEG Dataset [Dataset]. https://ieee-dataport.org/documents/eeg-dataset
    Explore at:
    Dataset updated
    Aug 10, 2025
    Authors
    Keerthi Kumar K J
    Description

    This project demonstrates a Brain-Computer Interface (BCI) simulation using real EEG signals to classify binary decisions (Yes/No). It is designed as an accessible prototype for researchers and students to understand and explore cognitive signal processing—without needing expensive hardware.

  11. Data from: A multi-subject and multi-session EEG dataset for modelling human...

    • openneuro.org
    Updated Jun 7, 2025
    Cite
    Shuning Xue; Bu Jin; Jie Jiang; Longteng Guo; Jin Zhou; Changyong Wang; Jing Liu (2025). A multi-subject and multi-session EEG dataset for modelling human visual object recognition [Dataset]. http://doi.org/10.18112/openneuro.ds005589.v1.0.3
    Explore at:
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Shuning Xue; Bu Jin; Jie Jiang; Longteng Guo; Jin Zhou; Changyong Wang; Jing Liu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    This multi-subject and multi-session EEG dataset for modelling human visual object recognition (MSS) contains:

    1. 122-channel EEG data collected on 32 participants during natural visual stimulation;
    2. 100 sessions in total, 1.5 hours each;
    3. each session consists of 4 RSVP runs and 4 low-speed presentation runs;
    4. each participant completed between 1 and 5 sessions on different days, around one week apart.

    More details about the dataset are described as follows.

    Participants

    32 participants were recruited from college students in Beijing, of which 4 were female, and 28 were male, with an age range of 21-33 years. 100 sessions were conducted. They were paid and gave written informed consent. The study was conducted under the approval of the ethical committee of the Institute of Automation at the Chinese Academy of Sciences, with the approval number: IA21-2410-020201.

    Experimental Procedures

    1. RSVP experiment: During the RSVP experiment, the participants were shown images at a rate of 5 Hz, and each run consisted of 2,000 trials. There were 20 image categories, with 100 images in each category, making up the 2,000 stimuli. The 100 images in each category were further divided into five image sequences, resulting in 100 image sequences per run. Each sequence was composed of 20 images from the same class, and the 100 sequences were presented in a pseudo-random order.

    After every 50 sequences, there was a break for the participants to rest. Each rapid serial sequence lasted approximately 7.5 seconds, starting with a 750ms blank screen with a white fixation cross, followed by 20 or 21 images presented at 5 Hz with a 50% duty cycle. The sequence ended with another 750ms blank screen.

    After the rapid serial sequence, there was a 2-second interval during which participants were instructed to blink and then report whether a special image appeared in the sequence using a keyboard. During each run, 20 sequences were randomly inserted with additional special images at random positions. The special images are logos for brain-computer interfaces.

    2. Low-speed experiment: During the low-speed experiment, each run consisted of 100 trials, with 1 second per image for a slower paradigm. The 100 stimuli were presented in a pseudo-random order and included 20 image categories, each containing 5 images. A break was given to the participants after every 20 images for them to rest.

    Each image was displayed for 1 second and was followed by 11 choice boxes (1 correct class box, 9 random class boxes, and 1 reject box). Participants were required to select the correct class of the displayed image using a mouse to increase their engagement. After the selection, a white fixation cross was displayed for 1 second in the centre of the screen to remind participants to pay attention to the upcoming task.

    Stimuli

    The stimuli are from two image databases, ImageNet and PASCAL. The final set consists of 10,000 images, with 500 images for each class.

    Annotations

    In the derivatives/annotations folder, there is additional information about MSS:

    1. Videos of two paradigms.
    2. Dataset_info: Main features of MSS.
    3. Experiment_schedule: Schedule of each session.
    4. Stimuli_source: Source categories of ImageNet and PASCAL.
    5. Subject_info: Age and sex of participants.
    6. Task_event: The meaning of eventID.

    Preprocessing

    The EEG signals were pre-processed using the MNE package, version 1.3.1, with Python 3.9.16. The data was sampled at a rate of 1,000 Hz with a bandpass filter applied between 0.1 and 100 Hz. A notch filter was used to remove 50 Hz power frequency. Epochs were created for each trial ranging from 0 to 500 ms relative to stimulus onset. No further preprocessing or artefact correction methods were applied in technical validation. However, researchers may want to consider widely used preprocessing steps such as baseline correction or eye movement correction. After the preprocessing, each session resulted in two matrices: RSVP EEG data matrix of shape (8,000 image conditions × 122 EEG channels × 125 EEG time points) and low-speed EEG data matrix of shape (400 image conditions × 122 EEG channels × 125 EEG time points).
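
    A rough MNE sketch of the described steps on one recording (the file path and reader are illustrative; the 125 time points per 500 ms epoch suggest downsampling to 250 Hz, which is an inference rather than a documented step):

    import mne

    # Illustrative path; mne.io.read_raw dispatches on the file extension.
    raw = mne.io.read_raw("sub-01_task-rsvp_eeg.vhdr", preload=True)

    raw.filter(l_freq=0.1, h_freq=100.0)   # band-pass as described
    raw.notch_filter(freqs=50.0)           # remove 50 Hz power-line noise

    events, event_id = mne.events_from_annotations(raw)
    epochs = mne.Epochs(raw, events, tmin=0.0, tmax=0.5,
                        baseline=None, preload=True)  # 0-500 ms, no baseline correction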

  12. Harvard Electroencephalography Database

    • bdsp.io
    • registry.opendata.aws
    Updated Feb 10, 2025
    Cite
    Sahar Zafar; Tobias Loddenkemper; Jong Woo Lee; Andrew Cole; Daniel Goldenholz; Jurriaan Peters; Alice Lam; Edilberto Amorim; Catherine Chu; Sydney Cash; Valdery Moura Junior; Aditya Gupta; Manohar Ghanta; Marta Fernandes; Haoqi Sun; Jin Jing; M Brandon Westover (2025). Harvard Electroencephalography Database [Dataset]. http://doi.org/10.60508/k85b-fc87
    Explore at:
    Dataset updated
    Feb 10, 2025
    Authors
    Sahar Zafar; Tobias Loddenkemper; Jong Woo Lee; Andrew Cole; Daniel Goldenholz; Jurriaan Peters; Alice Lam; Edilberto Amorim; Catherine Chu; Sydney Cash; Valdery Moura Junior; Aditya Gupta; Manohar Ghanta; Marta Fernandes; Haoqi Sun; Jin Jing; M Brandon Westover
    License

    https://github.com/bdsp-core/bdsp-license-and-dua

    Description

    The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH). The EEG data includes three types:

    rEEG: "routine EEGs" recorded in the outpatient setting.
    EMU: recordings obtained in the inpatient setting, within the Epilepsy Monitoring Unit (EMU).
    ICU/LTM: recordings obtained from acutely and critically ill patients within the intensive care unit (ICU).
    
  13. Auditory evoked potential EEG-Biometric dataset

    • physionet.org
    Updated Dec 1, 2021
    Cite
    Nibras Abo Alzahab; Angelo Di Iorio; Luca Apollonio; Muaaz Alshalak; Alessandro Gravina; Luca Antognoli; Marco Baldi; Lorenzo Scalise; Bilal Alchalabi (2021). Auditory evoked potential EEG-Biometric dataset [Dataset]. http://doi.org/10.13026/ps31-fc50
    Explore at:
    Dataset updated
    Dec 1, 2021
    Authors
    Nibras Abo Alzahab; Angelo Di Iorio; Luca Apollonio; Muaaz Alshalak; Alessandro Gravina; Luca Antognoli; Marco Baldi; Lorenzo Scalise; Bilal Alchalabi
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    This data set consists of over 240 two-minute EEG recordings obtained from 20 volunteers. Resting-state and auditory stimuli experiments are included in the data. The goal is to develop an EEG-based Biometric system.

    The data includes resting-state EEG signals in both cases: eyes open and eyes closed. The auditory stimuli part consists of six experiments: three with in-ear auditory stimuli and three with bone-conducting auditory stimuli. The three stimuli for each case are a native song, a non-native song, and neutral music.

  14. Preprocessed CHB-MIT Scalp EEG Database

    • ieee-dataport.org
    Updated Dec 24, 2024
    Cite
    Mrs Deepa .B (2024). Preprocessed CHB-MIT Scalp EEG Database [Dataset]. https://ieee-dataport.org/open-access/preprocessed-chb-mit-scalp-eeg-database
    Explore at:
    Dataset updated
    Dec 24, 2024
    Authors
    Mrs Deepa .B
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ‘Univ. of Bonn’ and ‘CHB-MIT Scalp EEG Database’ are publicly available datasets and are among the most sought after by researchers. The Bonn dataset is very small compared to CHB-MIT, yet researchers often prefer Bonn because it comes in a simple '.txt' format. The dataset published here is a preprocessed form of CHB-MIT, available in '.csv' format.

  15. The Phantom EEG Dataset

    • zenodo.org
    bin, tar
    Updated Oct 14, 2024
    Cite
    Anonymous; Anonymous (2024). The Phantom EEG Dataset [Dataset]. http://doi.org/10.5281/zenodo.13341214
    Explore at:
    Available download formats: bin, tar
    Dataset updated
    Oct 14, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When you use this dataset, please cite the paper below. More information about this dataset can also be found in that paper.

    Xu, X., Wang, B., Xiao, B., Niu, Y., Wang, Y., Wu, X., & Chen, J. (2024). Beware of Overestimated Decoding Performance Arising from Temporal Autocorrelations in Electroencephalogram Signals. arXiv preprint arXiv:2405.17024.

    1 Metadata

    Brief introduction

    The present work aims to demonstrate that temporal autocorrelations (TA) significantly impact various BCI tasks even in conditions without neural activity. We used watermelons as phantom heads and found the pitfall of overestimated decoding performance when continuous EEG data with the same class label were split into training and test sets. More details can be found in Motivation.

    As watermelons cannot perform any experimental tasks, the recordings can be reorganized into the format of various actual EEG datasets without the need to collect EEG data as previous work did (examples in Domain Studied).

    Measurement devices

    Manufacturer: NeuroScan SynAmps2 system (Compumedics Limited, Victoria, Australia)

    Configuration: 64-channel Ag/AgCl electrode cap with a 10/20 layout

    Species

    Watermelons. Ten watermelons served as phantom heads.

    Domain Studied

    Overestimated Decoding Performance in EEG decoding.

    The following BCI datasets for various BCI tasks have been reorganized using the Phantom EEG Dataset. The pitfall was found in four of the five tasks.

    - CVPR dataset [1] for image decoding task.

    - DEAP dataset [2] for emotion recognition task.

    - KUL dataset [3] for auditory spatial attention decoding task.

    - BCIIV2a dataset [4] for motor imagery task (the pitfalls were absent due to the use of rapid-design paradigm during EEG recording).

    - SIENA dataset [5] for epilepsy detection task.

    Tasks Completed

    Resting state, but the data can be reorganized for any BCI task.

    Dataset Name

    The Phantom EEG Dataset

    Dataset license

    Creative Commons Attribution 4.0 International

    Code

    You can get the code to read the data files (.cnt or .set) in the “code” folder.

    To run the code, you should install the mne and numpy packages, e.g. via pip:

    pip install mne==1.3.1

    pip install numpy

    Then, you can use “BID2WMCVPR.py” to convert the BIDS dataset to the WM-CVPR dataset, or “CNTK2WMCVPR.py” to convert the CNT dataset to the WM-CVPR dataset.
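
    For reference, a .set file from the BIDS folder can be read with MNE (the subject ID in the path is illustrative):

    import mne

    # Read one resting-state recording in EEGLAB .set format.
    raw = mne.io.read_raw_eeglab("sub-S1/eeg/sub-S1_task-RestingState_eeg.set",
                                 preload=True)
    data = raw.get_data()     # note: MNE returns (channels, samples)
    print(raw.info["sfreq"])  # 1000 Hz per the Recordings section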

    The code to reorganize datasets other than CVPR [1] will be released on GitHub after the review period.

    Data information

    - CNT: the raw data.

    Each Subject (S*.cnt) contains the following information:

    EEG.data: EEG data (samples X channels)

    EEG.srate: Sampling frequency of the saved data

    EEG.chanlocs: channel numbers (1 to 68; ‘EKG’, ‘EMG’, 'VEO', 'HEO' were not recorded)

    - BIDS: an extension to the brain imaging data structure for electroencephalography. BIDS primarily addresses the heterogeneity of data organization by following the FAIR principles [6].

    Each Subject (sub-S*/eeg/) contains the following information:

    sub-S*_task-RestingState_channels.tsv: channel numbers (1 to 68, ‘EKG’ ‘EMG’ 'VEO' 'HEO' were not recorded)

    sub-S*_task-RestingState_eeg.json: Some information about the dataset.

    sub-S*_task-RestingState_eeg.set: EEG data (samples X channels)

    sub-S*_task-RestingState_events.tsv: the event during recording. We organized events using block-design and rapid-event-design. However, it is important to note that this does not need to be considered in any subsequent data reorganization, as watermelons cannot follow any experimental instructions.

    - code: more information on Code.

    - readme.md: the information about the dataset.

    Recordings

    An additional electrode was placed on the lower part of the watermelon as the physiological reference, and the forehead served as the ground site. The inter-electrode impedances were maintained under 20 kOhm. Data were recorded at a sampling rate of 1000 Hz. EEG recordings for each watermelon lasted for more than 1 hour to ensure sufficient data for the decoding task.

    Citation and more information

    Citation will be updated after the review period is completed.

    We will provide more information about this dataset (e.g. the units of the captured data) once our work is accepted. This is because our work is currently under review, and we are not allowed to disclose more information according to the relevant requirements.

    All metadata will be provided as a backup on Github and will be available after the review period is completed.

    2 Motivation

    Researchers have reported high decoding accuracy (>95%) using non-invasive Electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks like image decoding, emotion recognition, auditory spatial attention detection, epilepsy detection, etc. Since these EEG data were usually collected with well-designed paradigms in labs, the reliability and robustness of the corresponding decoding methods were doubted by some researchers, and they proposed that such decoding accuracy was overestimated due to the inherent temporal autocorrelations (TA) of EEG signals [7]–[9].

    However, the coupling between the stimulus-driven neural responses and the EEG temporal autocorrelations makes it difficult to confirm whether this overestimation exists in truth. Some researchers also argue that the effect of TA in EEG data on decoding is negligible and that it becomes a significant problem only under specific experimental designs in which subjects do not have enough resting time [10], [11].

    Due to a lack of problem formulation, previous studies [7]–[9] only proposed that block-design should not be used in order to avoid the pitfall. However, the impact of TA can be avoided only when a trial of EEG is not further segmented into several samples; otherwise, the overfitting or pitfall still occurs. In contrast, when a correct data splitting strategy is used (e.g. separating training and test data in time), the pitfall can be avoided even when block-design is used, as the sketch below illustrates.
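
    A toy sketch of the two splitting strategies (synthetic data stands in for feature windows cut, in temporal order, from continuous recordings; the numbers are arbitrary):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 64))   # 1000 windows x 64 features, in temporal order
    y = rng.integers(0, 2, size=1000)     # class labels

    # Pitfall: a shuffled split puts temporally adjacent (autocorrelated) windows
    # from the same continuous segment into both training and test sets.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              shuffle=True, random_state=0)

    # Safer: separate training and test data in time.
    cut = int(0.8 * len(X))
    X_tr, X_te, y_tr, y_te = X[:cut], X[cut:], y[:cut], y[cut:]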

    In our framework, we proposed the concept of "domain" to represent the EEG patterns resulting from TA and then used phantom EEG to remove stimulus-driven neural responses for verification. The results confirmed that the TA, always existing in the EEG data, added unique domain features to a continuous segment of EEG. The specific finding is that when the segment of EEG data with the same class label is split into multiple samples, the classifier will associate the sample's class label with the domain features, interfering with the learning of class-related features. This leads to an overestimation of decoding performance for test samples from the domains seen during training, and results in poor accuracy for test samples from unseen domains (as in real-world applications).

    Importantly, our work suggests that the key to reducing the impact of EEG TA on BCI decoding is to decouple class-related features from domain features in the actual EEG dataset. Our proposed unified framework serves as a reminder to BCI researchers of the impact of TA on their specific BCI tasks and is intended to guide them in selecting the appropriate experimental design, splitting strategy and model construction.

    3 The rationale for using watermelons as phantom heads

    We must point out that the "phantom EEG" indeed does not contain any "EEG" but records only noise: a watermelon is not a brain and does not generate any electrical signals. Therefore, the recorded electrical noise, even when amplified using equipment typically used for EEG, does not constitute EEG data under the definition of EEG. This is why previous researchers called it "phantom EEG". Some researchers may therefore think it questionable to use watermelons to obtain phantom EEG.

    However, the usage of the phantom head allows researchers to evaluate the performance of neural-recording equipment and proposed algorithms without the effects of neural activity variability, artifacts, and potential ethical issues. Phantom heads used in previous studies include digital models [12]–[14], real human skulls [15]–[17], artificial physical phantoms [18]–[24] and watermelons [25]–[40]. Due to their similar conductivity to human tissue, similar size and shape to the human head, and ease of acquisition, watermelons are widely used as "phantom heads".

    Most works tried to use watermelons as phantom heads and found that results obtained from the neural signals of human subjects could not be reproduced with the phantom head, thus proving that the achieved results were indeed caused by neural signals. For example, Mutanen et al. [35] proposed that “the fact that the phantom head stimulation did not evoke similar biphasic artifacts excludes the possibility that residual induced artifacts, with the current TMS-compatible EEG system, could explain these components”.

    Our work differs significantly from most previous works. It is first shown in our work that the phantom EEG exhibits the effect of TA on BCI decoding even when only noise is recorded, indicating the inherent existence of TA in the EEG data. The conclusion we hope to draw is that some current works may not truly use stimulus-driven neural

  16. Data from: A Resting-state EEG Dataset for Sleep Deprivation

    • openneuro.org
    Updated Apr 27, 2025
    Cite
    Chuqin Xiang; Xinrui Fan; Duo Bai; Ke Lv; Xu Lei (2025). A Resting-state EEG Dataset for Sleep Deprivation [Dataset]. http://doi.org/10.18112/openneuro.ds004902.v1.0.8
    Explore at:
    Dataset updated
    Apr 27, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Chuqin Xiang; Xinrui Fan; Duo Bai; Ke Lv; Xu Lei
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    General information

    The dataset provides resting-state EEG data (eyes open, partially eyes closed) from 71 participants who underwent two experiments involving normal sleep (NS, session 1) and sleep deprivation (SD, session 2). The dataset also provides information on participants' sleepiness and mood states. (Please note that Session 1 (NS) and Session 2 (SD) do not reflect the time order; the order was counterbalanced across participants and is listed in the metadata.)

    Dataset

    Presentation

    The data collection was initiated in March 2019 and terminated in December 2020. A detailed description of the dataset is currently being prepared by Chuqin Xiang, Xinrui Fan, Duo Bai, Ke Lv and Xu Lei, and will be submitted to Scientific Data for publication.

    EEG acquisition

    • EEG system (Brain Products GmbH, Steingrabenstr., Germany; 61 electrodes)
    • Sampling frequency: 500 Hz
    • Impedances were kept below 5 kΩ

    Contact

    If you have any questions or comments, please contact:
    Xu Lei: xlei@swu.edu.cn

    Article

    Xiang, C., Fan, X., Bai, D. et al. A resting-state EEG dataset for sleep deprivation. Sci Data 11, 427 (2024). https://doi.org/10.1038/s41597-024-03268-2

  17. EEG and audio dataset for auditory attention decoding

    • zenodo.org
    bin, zip
    Updated Jan 31, 2020
    Cite
    Søren A. Fuglsang; Søren A. Fuglsang; Daniel D.E. Wong; Daniel D.E. Wong; Jens Hjortkjær; Jens Hjortkjær (2020). EEG and audio dataset for auditory attention decoding [Dataset]. http://doi.org/10.5281/zenodo.1199011
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jan 31, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Søren A. Fuglsang; Søren A. Fuglsang; Daniel D.E. Wong; Daniel D.E. Wong; Jens Hjortkjær; Jens Hjortkjær
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset contains EEG recordings from 18 subjects listening to one of two competing speech audio streams. Continuous speech in trials of ~50 sec. was presented to normal hearing listeners in simulated rooms with different degrees of reverberation. Subjects were asked to attend one of two spatially separated speakers (one male, one female) and ignore the other. Repeated trials with presentation of a single talker were also recorded. The data were recorded in a double-walled soundproof booth at the Technical University of Denmark (DTU) using a 64-channel Biosemi system and digitized at a sampling rate of 512 Hz. Full details can be found in:

    • Søren A. Fuglsang, Torsten Dau & Jens Hjortkjær (2017): Noise-robust cortical tracking of attended speech in real-life environments. NeuroImage, 156, 435-444

    and

    • Daniel D.E. Wong, Søren A. Fuglsang, Jens Hjortkjær, Enea Ceolini, Malcolm Slaney & Alain de Cheveigné: A Comparison of Temporal Response Function Estimation Methods for Auditory Attention Decoding. Frontiers in Neuroscience, https://doi.org/10.3389/fnins.2018.00531

    The data is organized in the format of the publicly available COCOHA Matlab Toolbox. The preproc_script.m demonstrates how to import and align the EEG and audio data. The script also demonstrates some EEG preprocessing steps as used in the Wong et al. paper above. The AUDIO.zip contains wav files with the speech audio used in the experiment. The EEG.zip contains MAT-files with the EEG/EOG data for each subject. The EEG/EOG data are found in data.eeg with the following channels:

    • channels 1-64: scalp EEG electrodes
    • channel 65: right mastoid electrode
    • channel 66: left mastoid electrode
    • channel 67: vertical EOG below right eye
    • channel 68: horizontal EOG right eye
    • channel 69: vertical EOG above right eye
    • channel 70: vertical EOG below left eye
    • channel 71: horizontal EOG left eye
    • channel 72: vertical EOG above left eye

    The expinfo table contains information about experimental conditions, including which speaker the listener was attending to in different trials. The expinfo table contains the following information:

    • attend_mf: attended speaker (1=male, 2=female)
    • attend_lr: spatial position of the attended speaker (1=left, 2=right)
    • acoustic_condition: type of acoustic room (1= anechoic, 2= mild reverberation, 3= high reverberation, see Fuglsang et al. for details)
    • n_speakers: number of speakers presented (1 or 2)
    • wavfile_male: name of presented audio wav-file for the male speaker
    • wavfile_female: name of presented audio wav-file for the female speaker (if any)
    • trigger: trigger event value for each trial also found in data.event.eeg.value

    DATA_preproc.zip contains the preprocessed EEG and audio data as output from preproc_script.m.
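
    A sketch of loading one subject's MAT-file in Python rather than Matlab (the file name is illustrative, and the exact variable names and matrix orientation should be checked against the COCOHA format):

    from scipy.io import loadmat

    mat = loadmat("EEG/S1.mat", squeeze_me=True, struct_as_record=False)
    eeg = mat["data"].eeg   # EEG/EOG matrix per the channel table above (name assumed)
    # Assuming rows are samples and columns are channels (0-based indexing):
    scalp = eeg[:, 0:64]    # channels 1-64: scalp EEG
    eog = eeg[:, 66:72]     # channels 67-72: EOG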

    The dataset was created within the COCOHA Project: Cognitive Control of a Hearing Aid

  18. EEG dataset for the analysis of age-related changes in motor-related...

    • figshare.com
    png
    Updated Nov 19, 2020
    Cite
    Nikita Frolov; Elena Pitsik; Vadim V. Grubov; Anton R. Kiselev; Vladimir Maksimenko; Alexander E. Hramov (2020). EEG dataset for the analysis of age-related changes in motor-related cortical activity during a series of fine motor tasks performance [Dataset]. http://doi.org/10.6084/m9.figshare.12301181.v2
    Explore at:
    Available download formats: png
    Dataset updated
    Nov 19, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nikita Frolov; Elena Pitsik; Vadim V. Grubov; Anton R. Kiselev; Vladimir Maksimenko; Alexander E. Hramov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EEG signals were acquired from 20 healthy right-handed subjects performing a series of fine motor tasks cued by audio commands. The participants were divided equally into two distinct age groups: (i) 10 elderly adults (EA group, aged 55-72, 6 females); (ii) 10 young adults (YA group, aged 19-33, 3 females).

    The active phase of the experimental session included sequential execution of 60 fine motor tasks - squeezing a hand into a fist after the first audio command and holding it until the second audio command (30 repetitions per hand) (see Fig.1). The duration of the audio command determined the type of motor action to be executed: 0.25 s for left hand (LH) movement and 0.75 s for right hand (RH) movement. The time interval between the two audio signals was selected randomly in the range 4-5 s for each trial. The sequence of motor tasks was randomized, and the pause between tasks was also chosen randomly in the range 6-8 s to exclude possible training or motor-preparation effects caused by sequential execution of the same tasks.

    Acquired EEG signals were then processed via preprocessing tools implemented in the MNE Python package. Specifically, raw EEG signals were filtered by a Butterworth 5th-order filter in the range 1-100 Hz and by a 50 Hz notch filter. Further, Independent Component Analysis (ICA) was applied to remove ocular and cardiac artifacts. Artifact-free EEG recordings were then segmented into 60 epochs according to the experimental protocol. Each epoch was 14 s long, including 3 s of baseline and 11 s of motor-related brain activity, and time-locked to the first audio command indicating the start of motor execution. After visual inspection, epochs that still contained artifacts were rejected. Finally, 15 epochs per movement type were stored for each subject.

    Individual epochs for each subject are stored in the attached MNE .fif files. The prefix EA or YA in the file name identifies the age group the subject belongs to. The postfix LH or RH in the file name indicates the type of motor task.
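
    Given the stated naming convention, one subject's epochs can be read back with MNE (the exact file name is illustrative):

    import mne

    # EA/YA prefix = age group; LH/RH postfix = movement type.
    epochs = mne.read_epochs("EA_subject01_LH-epo.fif")
    print(epochs)              # expect 15 epochs of 14 s each
    data = epochs.get_data()   # (n_epochs, n_channels, n_times)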

  19. EEG Data for "Electrophysiological signatures of brain aging in autism...

    • orda.shef.ac.uk
    bin
    Updated May 30, 2023
    Cite
    Elizabeth Milne (2023). EEG Data for "Electrophysiological signatures of brain aging in autism spectrum disorder" [Dataset]. http://doi.org/10.15131/shef.data.16840351.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Elizabeth Milne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data is linked to the publication "Electrophysiological signatures of brain aging in autism spectrum disorder" by Dickinson, Jeste and Milne, in which it is referenced as Dataset 1.

    EEG data were acquired via a BioSemi ActiveTwo EEG system. The original recordings have been converted to .set and .fdt files via EEGLAB as uploaded here. There is a .fdt and a .set file for each recording; the .fdt file contains the data, and the .set file contains information about the parameters of the recording (see https://eeglab.org/tutorials/ for further information). The files can be opened within EEGLAB software.

    The data were acquired from 28 individuals with a diagnosis of an autism spectrum condition and 28 neurotypical controls aged between 18 and 68 years. The paradigm that generated the data was a 2.5 minute (150 seconds) period of eyes-closed resting.

    Ethical approval for data collection and data sharing was given by the Health Research Authority [IRAS ID = 212171]. Only data from participants who provided signed consent for data sharing were included in this work and uploaded here.

  20. Seizure Epilepcy CHB MIT EEG dataset pediatric

    • kaggle.com
    zip
    Updated Jul 1, 2023
    Cite
    Abhishek Parikh (2023). Seizure Epilepcy CHB MIT EEG dataset pediatric [Dataset]. https://www.kaggle.com/datasets/abhishekinnvonix/seizure-epilepcy-chb-mit-eeg-dataset-pediatric
    Explore at:
    Available download formats: zip (25296815967 bytes)
    Dataset updated
    Jul 1, 2023
    Authors
    Abhishek Parikh
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Recordings, grouped into 23 cases, were collected from 22 subjects (5 males, ages 3–22; and 17 females, ages 1.5–19). (Case chb21 was obtained 1.5 years after case chb01, from the same female subject.) The file SUBJECT-INFO contains the gender and age of each subject. (Case chb24 was added to this collection in December 2010, and is not currently included in SUBJECT-INFO.)

    Each case (chb01, chb02, etc.) contains between 9 and 42 continuous .edf files from a single subject. Hardware limitations resulted in gaps between consecutively-numbered .edf files, during which the signals were not recorded; in most cases, the gaps are 10 seconds or less, but occasionally there are much longer gaps. In order to protect the privacy of the subjects, all protected health information (PHI) in the original .edf files has been replaced with surrogate information in the files provided here. Dates in the original .edf files have been replaced by surrogate dates, but the time relationships between the individual files belonging to each case have been preserved. In most cases, the .edf files contain exactly one hour of digitized EEG signals, although those belonging to case chb10 are two hours long, and those belonging to cases chb04, chb06, chb07, chb09, and chb23 are four hours long; occasionally, files in which seizures are recorded are shorter.

    All signals were sampled at 256 samples per second with 16-bit resolution. Most files contain 23 EEG signals (24 or 26 in a few cases). The International 10-20 system of EEG electrode positions and nomenclature was used for these recordings. In a few records, other signals are also recorded, such as an ECG signal in the last 36 files belonging to case chb04 and a vagal nerve stimulus (VNS) signal in the last 18 files belonging to case chb09. In some cases, up to 5 “dummy” signals (named "-") were interspersed among the EEG signals to obtain an easy-to-read display format; these dummy signals can be ignored.

    The file RECORDS contains a list of all 664 .edf files included in this collection, and the file RECORDS-WITH-SEIZURES lists the 129 of those files that contain one or more seizures. In all, these records include 198 seizures (182 in the original set of 23 cases); the beginning ([) and end (]) of each seizure is annotated in the .seizure annotation files that accompany each of the files listed in RECORDS-WITH-SEIZURES. In addition, the files named chbnn-summary.txt contain information about the montage used for each recording, and the elapsed time in seconds from the beginning of each .edf file to the beginning and end of each seizure contained in it.
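
    A small sketch for pulling seizure onset/offset times out of a chbnn-summary.txt file (the line labels below follow the published summary format, but verify against the files themselves):

    import re

    text = open("chb01/chb01-summary.txt").read()
    # Some summaries number the seizures ("Seizure 1 Start Time"), hence the optional group.
    starts = re.findall(r"Seizure(?: \d+)? Start Time:\s*(\d+)\s*seconds", text)
    ends = re.findall(r"Seizure(?: \d+)? End Time:\s*(\d+)\s*seconds", text)
    print([(int(s), int(e)) for s, e in zip(starts, ends)])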
