Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
**Overview:
The Bonn EEG Dataset is a widely recognized dataset in the field of biomedical signal processing and machine learning, specifically designed for research in epilepsy detection and EEG signal analysis. It contains electroencephalogram (EEG) recordings from both healthy individuals and patients with epilepsy, making it suitable for tasks such as seizure detection and classification of brain activity states. The dataset is structured into five distinct subsets (labeled A, B, C, D, and E), each comprising 100 single-channel EEG segments, resulting in a total of 500 segments. Each segment represents 23.6 seconds of EEG data, sampled at a frequency of 173.61 Hz, yielding 4,096 data points per segment, stored in ASCII format as text files.
****Structure and Label:
**Key Characteristics
**Applications
The Bonn EEG Dataset is ideal for machine learning and signal processing tasks, including: - Developing algorithms for epileptic seizure detection and prediction. - Exploring feature extraction techniques, such as wavelet transforms, for EEG signal analysis. - Classifying brain states (healthy vs. epileptic, interictal vs. ictal). - Supporting research in neuroscience and medical diagnostics, particularly for epilepsy monitoring and treatment.
**Source
**Citation
When using this dataset, researchers are required to cite the original publication: Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., & Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6), 061907. DOI: 10.1103/PhysRevE.64.061907.
**Additional Notes
The dataset is randomized, with no specific information provided about patients or electrode placements, ensuring simplicity and focus on signal characteristics.
The data is not hosted on Kaggle or Hugging Face but is accessible directly from the University of Bonn’s repository or mirrored sources.
Researchers may need to apply preprocessing steps, such as filtering or normalization, to optimize the data for machine learning tasks.
The dataset’s balanced structure and clear labels make it an excellent choice for a one-week machine learning project, particularly for tasks involving traditional algorithms like SVM, Random Forest, or Logistic Regression.
This dataset provides a robust foundation for learning signal processing, feature extraction, and machine learning techniques while addressing a real-world medical challenge in epilepsy detection.
Facebook
TwitterThis is the Dataset Collected by Shahed Univeristy Released in IEEE.
the Columns are: Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2, Class, ID
the first 19 are channel names.
Class: ADHD/Control
ID: Patient ID
Participants were 61 children with ADHD and 60 healthy controls (boys and girls, ages 7-12). The ADHD children were diagnosed by an experienced psychiatrist to DSM-IV criteria, and have taken Ritalin for up to 6 months. None of the children in the control group had a history of psychiatric disorders, epilepsy, or any report of high-risk behaviors.
EEG recording was performed based on 10-20 standard by 19 channels (Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2) at 128 Hz sampling frequency. The A1 and A2 electrodes were the references located on earlobes.
Since one of the deficits in ADHD children is visual attention, the EEG recording protocol was based on visual attention tasks. In the task, a set of pictures of cartoon characters was shown to the children and they were asked to count the characters. The number of characters in each image was randomly selected between 5 and 16, and the size of the pictures was large enough to be easily visible and countable by children. To have a continuous stimulus during the signal recording, each image was displayed immediately and uninterrupted after the child’s response. Thus, the duration of EEG recording throughout this cognitive visual task was dependent on the child’s performance (i.e. response speed).
Citation Author(s): Ali Motie Nasrabadi Armin Allahverdy Mehdi Samavati Mohammad Reza Mohammadi
DOI: 10.21227/rzfh-zn36
License: Creative Commons Attribution
Facebook
TwitterOpen Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This database, collected at the Children’s Hospital Boston, consists of EEG recordings from pediatric subjects with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure medication in order to characterize their seizures and assess their candidacy for surgical intervention. The recordings are grouped into 23 cases and were collected from 22 subjects (5 males, ages 3–22; and 17 females, ages 1.5–19).
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PCA
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this work, an experimental methodology for the acquisition of EEG signals from volunteer subjects was developed. The volunteers are colleagues and research fellows from ESPOL and patients of the Hospital Luis Vernaza for participating as test subjects. This dataset consists of over 8680 four-second EEG recordings obtained from 60 volunteers.
Equipment: We use the OpenBCI Cyton + Daisy (www.openbci.com) Biosensing Board for EEG signal recording. The OpenBCI equipment has an active bandpass filter in the 5 to 50Hz range, additionally, a notch filter at 60Hz. This non-invasive device operates within a sampling frequency of 125Hz and has 16 dry electrodes with two ground references, distributed in the international 10-10 system. All 16 EEG electrodes were recorded in monopolar configuration, in which the potential of each electrode is compared with a neutral electrode located in both lobes of the ears.
Data Description: Each recording was recorded in a CSV file format, the values of each electrode are in microvolts (uV). In total, each subject generates 124 CSV files in each experiment (run). Some subjects perform two experiments, one executing the motor tasks and the other imagining doing them. The tasks are described below: - Recording a Baseline with Eyes Open (BEO) without any task command: only once at the beginning of each run. - Closing Left Hand (CLH): five times per run. - Closing Right Hand (CRH): five times per run. - Dorsal flexion of Left Foot (DLF): five times per run. - Plantar flexion of Left Foot (PLF): five times per run. - Dorsal flexion of Right Foot (DRF): five times per run. - Plantar flexion of Right Foot (PRF): five times per run. - Resting in between tasks (Rest): after each task, in total 31 files.
CSV file encoding: - Subject ID: Assigned ID to each test subject in order to hide their identity. e.g. Sx, such that x can be any number from 1 to 60. - Repetition number: The participants may perform more than one repetition of the experiment. ExaOnly one subject volunteered to perform up to 4 repetitions. e.g. Rx, such that x can be any repetition number between 1 and 4. - Motor or Motor Imagery Activity: For each repetition, participants are asked to perform first the motor tasks (M) and then the motor imagery tasks (I). & Mx and Ix, where x is the Label of the task performed. - Label: Identifier of the performed task, where 1 is for BEO, 2 for CLH, 3 for CRH, 4 for DLF, 5 for PLF, 6 for DRF, 7 for PRF and finally 8 for Rest. e.g. M2 represents the CLH Motor task. - Task repetition number: Ordinal number of the task repetition. Tasks are presented randomly up to 5 times per run. e.g. S24R1I6_5 is from subject 24, repetition 1, DRF Imagery task. Finally, the number five at the end represents the fifth task repetition in the record.
Additionally, this dataset includes the file "Test_Subject_Annotations.csv", with the demographic information of each of the 60 volunteers, respecting the confidentiality of each individual.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PCA
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This Siena Sleep EEG dataset contains multi-channel EEG recordings collected during sleep, specifically curated for epilepsy detection and sleep stage analysis. Electroencephalography (EEG) is one of the most reliable methods for studying brain activity during sleep, and it plays a crucial role in diagnosing neurological disorders such as epilepsy.
The dataset is formatted as a large-scale time-series table where each row represents a sampled time point, and each column corresponds to an EEG electrode channel. An additional diagnosis label column indicates whether the signal segment belongs to a healthy control or an epilepsy patient.
Dataset Structure
Number of Records: 944,640 samples
Number of Features: 20 EEG channels + 1 diagnosis label
File Format: CSV
Memory Size: ~150 MB
Columns
EEG Channels (20):
Fp1, F3, C3, P3, O1, F7, T3, T5, Fc1, Fc5, Cp1, Cp5, F9, Fz, Cz, Pz, Pf2, F4, C4, P4
These correspond to standard 10–20 EEG electrode placements, covering frontal, central, parietal, occipital, and temporal lobes.
diagnosis: 0 → Non-epileptic (Healthy subject)
1 → Sleep Stage Epileptic case
Facebook
TwitterTHINGS-EEG
This dataset is a processed version of THINGS-EEG, derived from the paper Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior (CVPR 2025). In this version, the EEG data is stored in float16 format, reducing the storage size by half. The original official dataset can be accessed from the OSF repository. Original official dataset:
A large and rich EEG dataset for modeling human visual object recognition [THINGS-EEG]
Citation… See the full description on the dataset page: https://huggingface.co/datasets/Haitao999/things-eeg.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction: This dataset consists of the MEEG (sMRI+MEG+EEG) portion of the multi-subject, multi-modal face processing dataset (ds000117). This dataset was originally acquired and shared by Daniel Wakeman and Richard Henson (https://pubmed.ncbi.nlm.nih.gov/25977808/). The MEG and EEG data were simultaneously recorded; the sMRI scans were preserved to support M/EEG source localization. Following event log augmentation, reorganization, and HED (v8.0.0) annotation, the EEG data have been repackaged in EEGLAB format.
Overview of the experiment: Eighteen participants completed two recording sessions spaced three months apart – one session recorded fMRI and the other simultaneously recorded MEG and EEG data. During each session, participants performed the same simple perceptual task, responding to presented photographs of famous, unfamiliar, and scrambled faces by pressing one of two keyboard keys to indicate a subjective yes or no decision as to the relative spatial symmetry of the viewed face. Famous faces were feature-matched to unfamiliar faces; half the faces were female. The two sessions (MEEG, fMRI) had different organizations of event timing and presentation because of technological requirements of the respective imaging modalities. Each individual face was presented twice during the session. For half of the presented faces, the second presentation followed immediately after the first. For the other half, the second presentation was delayed by 5-15 face presentations.
Preprocessing: Multi-subject, multi-modal (sMRI+EEG) neuroimaging dataset on face processing. Original data described at https://www.nature.com/articles/sdata20151 This is repackaged version of the EEG data in EEGLAB format. The data has gone through minimal preprocessing including (see wh_extracteeg_BIDS.m): - Ignoring fMRI and MEG data (sMRI preserved for EEG source localization) - Extracting EEG channels out of the MEG/EEG fif data - Adding fiducials - Renaming EOG and EKG channels - Extracting events from event channel - Removing spurious events 5, 6, 7, 13, 14, 15, 17, 18 and 19 - Removing spurious event 24 for subject 3 run 4 - Renaming events taking into account button assigned to each subject - Correcting event latencies (events have a shift of 34 ms) - Resampling data to 250 Hz (this is a step that is done because this dataset is used as tutorial for EEGLAB and need to be lightweight) - Merging run 1 to 6 - Removing event fields urevent and duration - Filling up empty fields for events boundary and stim_file. - Saving as EEGLAB .set format
Original and related datasets This data is a mapping of the original openfmri dataset ds000117 on OpenfMRI, which is no longer available (although a copy is available in the sourcedata folder of the ds003645 repository). The ds000117 dataset on OpenNeuro contains only 16 subjects. The original OpenfMRI dataset is described at the bottom of this README file https://openneuro.org/datasets/ds000117/versions/1.0.4/file-display/README along with the correspondance with the 16 subjects in ds000117. Note that sub-001 data on OpenfMRI was corrupted so it is not included here.
The openneuro dataset ds003645 is similar to this one but also contains MEG data and HED events. Also, it does not have the different runs merged.
Import warning Make sure to import the channel locations from the BIDS electrodes.tsv files. The EEGLAB .set files also contain channel locations, although they differ for subjects 8 and 14 because the .set version is wrong and rotated by 90 degrees. When using the EEGLAB EEG BIDS plugin, the default behavior is to import channel locations from BIDS.
Data curators: Ramon Martinez, Dung Truong, Scott Makeig, Arnaud Delorme (UCSD, La Jolla, CA, USA)
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
There is a growing imperative to understand the neurophysiological impact of our rapidly changing and diverse technological, social, chemical, and physical environments. To untangle the multidimensional and interacting effects requires data at scale across diverse populations, taking measurement out of a controlled lab environment and into the field. Electroencephalography (EEG), which has correlates with various environmental factors as well as cognitive and mental health outcomes, has the advantage of both portability and cost-effectiveness for this purpose. However, with numerous field researchers spread across diverse locations, data quality issues and researcher idle time due to insufficient participants can quickly become unmanageable and expensive problems. In programs we have established in India and Tanzania, we demonstrate that with appropriate training, structured teams, and daily automated analysis and feedback on data quality, nonspecialists can reliably collect EEG data alongside various survey and assessments with consistently high throughput and quality. Over a 30 week period, research teams were able to maintain an average of 25.6 participants per week, collecting data from a diverse sample of 7,933 participants ranging from Hadzabe hunter-gatherers to office workers. Furthermore, data quality, computed on the first 5,831 records using two common methods, PREP and FASTER, was comparable to benchmark datasets from controlled lab conditions. Altogether this resulted in a cost per participant of under $50, a fraction of the cost typical of such data collection, opening up the possibility for large-scale programs particularly in low- and middle-income countries.
A subset of large-scale EEG recordings from India and Tanzania are uploaded here along with metadata like age, mental health quotient (MHQ) score, income and sex. This BIDS dataset was generated using MNE-BIDS from EDF source files.
Vianney JM, Swaminathan S, Newson JJ, Parameshwaran D, Subramaniyam NP, Roy SS, Machunda R, Sapuli A, Pramanik S, Kumar JV, Tiwari P. EEG Data Quality in Large-Scale Field Studies in India and Tanzania. Eneuro. 2025 Jul 1;12(7).
Newson JJ, Pastukh V, Thiagarajan TC. Assessment of population well-being with the Mental Health Quotient: validation study. JMIR Mental Health. 2022 Apr 20;9(4):e34105.
Appelhoff, S., Sanderson, M., Brooks, T., Vliet, M., Quentin, R., Holdgraf, C., Chaumon, M., Mikulan, E., Tavabi, K., Höchenberger, R., Welke, D., Brunner, C., Rockhill, A., Larson, E., Gramfort, A. and Jas, M. (2019). MNE-BIDS: Organizing electrophysiological data into the BIDS format and facilitating their analysis. Journal of Open Source Software 4: (1896).https://doi.org/10.21105/joss.01896
Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103.https://doi.org/10.1038/s41597-019-0104-8
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article presents an EEG dataset collected using the EMOTIV EEG 5-Channel Sensor kit during four different types of stimulation: Complex mathematical problem solving, Trier mental challenge test, Stroop colour word test, and Horror video stimulation, Listening to relaxing music. The dataset consists of EEG recordings from 22 subjects for Complex mathematical problem solving, 24 for Trier mental challenge test, 24 for Stroop colour word test, 22 for horror video stimulation, and 20 for relaxed state recordings. The data was collected in order to investigate the neural correlates of stress and to develop models for stress detection based on EEG data. The dataset presented in this article can be used for various applications, including stress management, healthcare, and workplace safety. The dataset provides a valuable resource for researchers and developers working on stress detection using EEG data, while the stress detection method provides a useful tool for evaluating the effectiveness of different stress detection models. Overall, this article contributes to the growing body of research on stress detection and management using EEG data and provides a useful resource for researchers and practitioners working in this field.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We used a high-density electroencephalography (HD-EEG) system, with 128 customized electrode locations, to record from 17 individuals with migraine (12 female) in the interictal period, and 18 age- and gender-matched healthy control subjects, during visual (vertical grating pattern) and auditory (modulated tone) stimulation which varied in temporal frequency (4 and 6Hz), and during rest. This dataset includes the EEG raw data related to the paper entitled Chamanzar, Haigh, Grover, and Behrmann (2020), Abnormalities in cortical pattern of coherence in migraine detected using ultra high-density EEG. The link to our paper will be made available as soon as it is published online.
Facebook
TwitterAttribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains EEG recordings from 18 subjects listening to one of two competing speech audio streams. Continuous speech in trials of ~50 sec. was presented to normal hearing listeners in simulated rooms with different degrees of reverberation. Subjects were asked to attend one of two spatially separated speakers (one male, one female) and ignore the other. Repeated trials with presentation of a single talker were also recorded. The data were recorded in a double-walled soundproof booth at the Technical University of Denmark (DTU) using a 64-channel Biosemi system and digitized at a sampling rate of 512 Hz. Full details can be found in:
and
The data is organized in format of the publicly available COCOHA Matlab Toolbox. The preproc_script.m demonstrates how to import and align the EEG and audio data. The script also demonstrates some EEG preprocessing steps as used the Wong et al. paper above. The AUDIO.zip contains wav-files with the speech audio used in the experiment. The EEG.zip contains MAT-files with the EEG/EOG data for each subject. The EEG/EOG data are found in data.eeg with the following channels:
The expinfo table contains information about experimental conditions, including what what speaker the listener was attending to in different trials. The expinfo table contains the following information:
DATA_preproc.zip contains the preprocessed EEG and audio data as output from preproc_script.m.
The dataset was created within the COCOHA Project: Cognitive Control of a Hearing Aid
Facebook
Twitterhttps://github.com/bdsp-core/bdsp-license-and-duahttps://github.com/bdsp-core/bdsp-license-and-dua
The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH). The EEG data includes three types:
rEEG: "routine EEGs" recorded in the outpatient setting.
EMU: recordings obtained in the inpatient setting, within the Epilepsy Monitoring Unit (EMU).
ICU/LTM: recordings obtained from acutely and critically ill patients within the intensive care unit (ICU).
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 848,640 records with 17 columns, representing EEG (Electroencephalogram) signals recorded from multiple electrode positions on the scalp, along with a status label. The dataset is be related to the study of Alzheimer’s Disease (AD).
Features (16 continuous variables, float64): Each feature corresponds to the electrical activity recorded from standard EEG electrode placements based on the international 10-20 system:
Fp1, Fp2, F7, F3, Fz, F4, F8
T3, C3, Cz, C4, T4
T5, P3, Pz, P4
These channels measure brain activity in different cortical regions (frontal, temporal, central, and parietal lobes).
Target variable (1 categorical variable, int64):
status: Represents the condition or classification of the subject at the time of recording (e.g., patient vs. control, or stage of Alzheimer’s disease).
Size & Integrity:
Rows: 848,640 samples
Columns: 17 (16 EEG features + 1 status label)
Data types: 16 float features, 1 integer label
Missing values: None (clean dataset)
This dataset is suitable for machine learning and deep learning applications such as:
EEG signal classification (AD vs. healthy subjects)
Brain activity pattern recognition
Feature extraction and dimensionality reduction (e.g., PCA, wavelet transforms)
Time-series analysis of EEG recordings
Facebook
Twitterhttps://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
This data set consists of over 240 two-minute EEG recordings obtained from 20 volunteers. Resting-state and auditory stimuli experiments are included in the data. The goal is to develop an EEG-based Biometric system.
The data includes resting-state EEG signals in both cases: eyes open and eyes closed. The auditory stimuli part consists of six experiments; Three with in-ear auditory stimuli and another three with bone-conducting auditory stimuli. The three stimuli for each case are a native song, a non-native song, and neutral music.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The neural basis of object recognition and semantic knowledge have been the focus of a large body of research but given the high dimensionality of object space, it is challenging to develop an overarching theory on how brain organises object knowledge. To help understand how the brain allows us to recognise, categorise, and represent objects and object categories, there is a growing interest in using large-scale image databases for neuroimaging experiments. Traditional image databases are based on manually selected object concepts and often single images per concept. In contrast, ‘big data’ stimulus sets typically consist of images that can vary significantly in quality and may be biased in content. To address this issue, recent work developed THINGS: a large stimulus set of 1,854 object concepts and 26,107 associated images (https://things-initiative.org/). In the current paper, we present THINGS-EEG, a dataset containing human electroencephalography responses from 50 subjects to all concepts and 22,248 images in the THINGS stimulus set. The THINGS-EEG dataset provides neuroimaging recordings to a systematic collection of objects and concepts and can therefore support a wide array of research to understand visual object processing in the human brain.
This repository contains the code that was used to perform the analyses described in this paper:
Grootswagers, T., Zhou, I., Robinson, A.K. et al. Human EEG recordings for 1,854 concepts presented in rapid serial visual presentation streams. Sci Data 9, 3 (2022). https://doi.org/10.1038/s41597-021-01102-7
THINGS images and concept descriptions obtained from: https://osf.io/jum2f (see also: https://things-initiative.org/)
The raw data, preprocessed data, and grand-average RDMs are publicly available on Openneuro: https://openneuro.org/datasets/ds003825
RDMs for single subjects are publicly available on figshare: https://doi.org/10.6084/m9.figshare.14721282 (note: OSF sometimes incorrectly lists this as private)
see the README in the code folder for instructions on how to reproduce the figures in the paper.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database includes the de-identified EEG data from 62 healthy individuals who participated in a brain-computer interface (BCI) study. All subjects underwent 7-11 sessions of BCI training which involves controlling a computer cursor to move in one-dimensional and two-dimensional spaces using subject’s “intent”. EEG data were recorded with 62 electrodes. In addition to the EEG data, behavioral data including the online success rate of BCI cursor control are also included.This dataset was collected under support from the National Institutes of Health via grants AT009263, EB021027, NS096761, MH114233, RF1MH to Dr. Bin He. Correspondence about the dataset: Dr. Bin He, Carnegie Mellon University, Department of Biomedical Engineering, Pittsburgh, PA 15213. E-mail: bhe1@andrew.cmu.edu This dataset has been used and analyzed to study the learning of BCI control and the effects of mind-body awareness training on this process. The results are reported in: Stieger et al, “Mindfulness Improves Brain Computer Interface Performance by Increasing Control over Neural Activity in the Alpha Band,” Cerebral Cortex, 2020 (https://doi.org/10.1093/cercor/bhaa234). Please cite this paper if you use any data included in this dataset.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the EEG resting state-closed eyes recordings from 88 subjects in total. Participants: 36 of them were diagnosed with Alzheimer's disease (AD group), 23 were diagnosed with Frontotemporal Dementia (FTD group) and 29 were healthy subjects (CN group). Cognitive and neuropsychological state was evaluated by the international Mini-Mental State Examination (MMSE). MMSE score ranges from 0 to 30, with lower MMSE indicating more severe cognitive decline. The duration of the disease was measured in months and the median value was 25 with IQR range (Q1-Q3) being 24 - 28.5 months. Concerning the AD groups, no dementia-related comorbidities have been reported. The average MMSE for the AD group was 17.75 (sd=4.5), for the FTD group was 22.17 (sd=8.22) and for the CN group was 30. The mean age of the AD group was 66.4 (sd=7.9), for the FTD group was 63.6 (sd=8.2), and for the CN group was 67.9 (sd=5.4).
Recordings: Recordings were aquired from the 2nd Department of Neurology of AHEPA General Hispital of Thessaloniki by an experienced team of neurologists. For recording, a Nihon Kohden EEG 2100 clinical device was used, with 19 scalp electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2) according to the 10-20 international system and 2 reference electrodes (A1 and A2) placed on the mastoids for impendance check, according to the manual of the device. Each recording was performed according to the clinical protocol with participants being in a sitting position having their eyes closed. Before the initialization of each recording, the skin impedance value was ensured to be below 5k?. The sampling rate was 500 Hz with 10uV/mm resolution. The recording montages were anterior-posterior bipolar and referential montage using Cz as the common reference. The referential montage was included in this dataset. The recordings were received under the range of the following parameters of the amplifier: Sensitivity: 10uV/mm, time constant: 0.3s, and high frequency filter at 70 Hz. Each recording lasted approximately 13.5 minutes for AD group (min=5.1, max=21.3), 12 minutes for FTD group (min=7.9, max=16.9) and 13.8 for CN group (min=12.5, max=16.5). In total, 485.5 minutes of AD, 276.5 minutes of FTD and 402 minutes of CN recordings were collected and are included in the dataset.
Preprocessing: The EEG recordings were exported in .eeg format and are transformed to BIDS accepted .set format for the inclusion in the dataset. Automatic annotations of the Nihon Kohden EEG device marking artifacts (muscle activity, blinking, swallowing) have not been included for language compatibility purposes (If this is an issue, please use the preprocessed dataset in Folder: derivatives). The unprocessed EEG recordings are included in folders named: sub-0XX. Folders named sub-0XX in the subfolder derivatives contain the preprocessed and denoised EEG recordings. The preprocessing pipeline of the EEG signals is as follows. First, a Butterworth band-pass filter 0.5-45 Hz was applied and the signals were re-referenced to A1-A2. Then, the Artifact Subspace Reconstruction routine (ASR) which is an EEG artifact correction method included in the EEGLab Matlab software was applied to the signals, removing bad data periods which exceeded the max acceptable 0.5 second window standard deviation of 17, which is considered a conservative window. Next, the Independent Component Analysis (ICA) method (RunICA algorithm) was performed, transforming the 19 EEG signals to 19 ICA components. ICA components that were classified as “eye artifacts” or “jaw artifacts” by the automatic classification routine “ICLabel” in the EEGLAB platform were automatically rejected. It should be noted that, even though the recording was performed in a resting state, eyes-closed condition, eye artifacts of eye movement were still found at some EEG recordings.
A complete analysis of this dataset can be found in the published Data Descriptor paper "A Dataset of Scalp EEG Recordings of Alzheimer’s Disease, Frontotemporal Dementia and Healthy Subjects from Routine EEG", https://doi.org/10.3390/data8060095 *****Im not the original creator of this dataset it was published on https://openneuro.org/datasets/ds004504/versions/1.0.6 i just ported it here for ease of use *****
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EEG signals were acquired from 20 healthy right-handed subjects performing a series of fine motor tasks cued by the audio command. The participants were divided equally into two distinct age groups: (i) 10 elderly adults (EA group, aged 55-72, 6 females); (ii) 10 young adults (YA group, aged 19-33, 3 females).The active phase of the experimental session included sequential execution of 60 fine motor tasks - squeezing a hand into a fist after the first audio command and holding it until the second audio command (30 repetitions per hand) (see Fig.1). Duration of the audio command determined type of the motor action to be executed: 0.25s for left hand (LH) movement and 0.75s for right rand (RH) movement. The time interval between two audio signals was selected randomly in the range 4-5s for each trial. The sequence of motor tasks was randomized and the pause between tasks was also chosen randomly in the range 6-8s to exclude possible training or motor-preparation effects caused by the sequential execution of the same tasks.Acquired EEG signals were then processed via preprocessing tools implemented in MNE Python package. Specifically, raw EEG signals were filtered by a Butterworth 5th order filter in the range 1-100 Hz, and by 50Hz Notch filter. Further, Independent Component Analysis (ICA) was applied to remove ocular and cardiac artifacts. Artifact-free EEG recordings were then segmented into 60 epochs according to the experimental protocol. Each epoch was 14s long, including 3s of baseline and 11s of motor-related brain activity, and time-locked to the first audio command indicating the start of motor execution. After visual inspection epochs that still contained artifacts were rejected. Finally, 15 epochs per movement type were stored for each subject.Individual epochs for each subject are stored in the attached MNE .fif files. Prefix EA or YA in the name of the file identifies the age group, which subject belongs to. Postfix LH or RH in the name of the file indicates the type of motor tasks.EEG signals were acquired from 20 healthy right-handed subjects performing a series of fine motor tasks cued by the audio command. The participants were divided equally into two distinct age groups: (i) 10 elderly adults (EA group, aged 55-72, 6 females); (ii) 10 young adults (YA group, aged 19-33, 3 females).
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
**Overview:
The Bonn EEG Dataset is a widely recognized dataset in the field of biomedical signal processing and machine learning, specifically designed for research in epilepsy detection and EEG signal analysis. It contains electroencephalogram (EEG) recordings from both healthy individuals and patients with epilepsy, making it suitable for tasks such as seizure detection and classification of brain activity states. The dataset is structured into five distinct subsets (labeled A, B, C, D, and E), each comprising 100 single-channel EEG segments, resulting in a total of 500 segments. Each segment represents 23.6 seconds of EEG data, sampled at a frequency of 173.61 Hz, yielding 4,096 data points per segment, stored in ASCII format as text files.
****Structure and Label:
**Key Characteristics
**Applications
The Bonn EEG Dataset is ideal for machine learning and signal processing tasks, including: - Developing algorithms for epileptic seizure detection and prediction. - Exploring feature extraction techniques, such as wavelet transforms, for EEG signal analysis. - Classifying brain states (healthy vs. epileptic, interictal vs. ictal). - Supporting research in neuroscience and medical diagnostics, particularly for epilepsy monitoring and treatment.
**Source
**Citation
When using this dataset, researchers are required to cite the original publication: Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., & Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6), 061907. DOI: 10.1103/PhysRevE.64.061907.
**Additional Notes
The dataset is randomized, with no specific information provided about patients or electrode placements, ensuring simplicity and focus on signal characteristics.
The data is not hosted on Kaggle or Hugging Face but is accessible directly from the University of Bonn’s repository or mirrored sources.
Researchers may need to apply preprocessing steps, such as filtering or normalization, to optimize the data for machine learning tasks.
The dataset’s balanced structure and clear labels make it an excellent choice for a one-week machine learning project, particularly for tasks involving traditional algorithms like SVM, Random Forest, or Logistic Regression.
This dataset provides a robust foundation for learning signal processing, feature extraction, and machine learning techniques while addressing a real-world medical challenge in epilepsy detection.