Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EEG signals of various subjects in text files are uploaded. They can be useful for various EEG signal processing algorithms: filtering, linear prediction, abnormality detection, PCA, ICA, etc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain the raw data and processing parameters to go with the paper "Hierarchical structure guides rapid linguistic predictions during naturalistic listening" by Jonathan R. Brennan and John T. Hale. They include the stimulus (wav files), raw data (Matlab format for the Fieldtrip toolbox), data processing parameters (Matlab), and variables used to align the stimuli with the EEG data and for the statistical analyses reported in the paper.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains EEG recordings from 18 subjects listening to one of two competing speech audio streams. Continuous speech in trials of ~50 s was presented to normal-hearing listeners in simulated rooms with different degrees of reverberation. Subjects were asked to attend to one of two spatially separated speakers (one male, one female) and ignore the other. Repeated trials with presentation of a single talker were also recorded. The data were recorded in a double-walled soundproof booth at the Technical University of Denmark (DTU) using a 64-channel Biosemi system and digitized at a sampling rate of 512 Hz. Full details can be found in:
The data is organized in the format of the publicly available COCOHA Matlab Toolbox. The preproc_script.m script demonstrates how to import and align the EEG and audio data, and also demonstrates some of the EEG preprocessing steps used in the Wong et al. paper above. AUDIO.zip contains wav files with the speech audio used in the experiment. EEG.zip contains MAT-files with the EEG/EOG data for each subject. The EEG/EOG data are found in data.eeg with the following channels:
The expinfo table contains information about the experimental conditions, including which speaker the listener was attending to in different trials. The expinfo table contains the following information:
DATA_preproc.zip contains the preprocessed EEG and audio data as output from preproc_script.m.
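For a quick look at these files in Matlab, one subject's MAT-file can be inspected directly. This is a minimal sketch only: the file name S1.mat is a hypothetical placeholder, and preproc_script.m remains the authoritative reference for loading and alignment.

% Sketch: inspect one subject's EEG/EOG recording from EEG.zip.
S = load('S1.mat');      % 'S1.mat' is a hypothetical placeholder file name
eeg = S.data.eeg;        % EEG/EOG samples, one channel per column/row
size(eeg)                % check the dimensions against the channel list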
The dataset was created within the COCOHA Project: Cognitive Control of a Hearing Aid
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Participants were 61 children with ADHD and 60 healthy controls (boys and girls, ages 7-12). The ADHD children were diagnosed by an experienced psychiatrist according to DSM-IV criteria and had taken Ritalin for up to 6 months. None of the children in the control group had a history of psychiatric disorders, epilepsy, or any report of high-risk behaviors. EEG recording was performed based on the 10-20 standard with 19 channels (Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2) at a 128 Hz sampling frequency. The A1 and A2 reference electrodes were located on the earlobes. Since one of the deficits in ADHD children is in visual attention, the EEG recording protocol was based on visual attention tasks. In the task, a set of pictures of cartoon characters was shown to the children, and they were asked to count the characters. The number of characters in each image was randomly selected between 5 and 16, and the pictures were large enough to be easily visible and countable by the children. To provide a continuous stimulus during the recording, each image was displayed immediately and without interruption after the child's response. Thus, the duration of the EEG recording throughout this cognitive visual task depended on the child's performance (i.e., response speed). If there are any questions, please contact nasrabadi@shahed.ac.ir
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article presents an EEG dataset collected using the EMOTIV EEG 5-Channel Sensor kit during four types of stress stimulation: complex mathematical problem solving, the Trier mental challenge test, the Stroop colour-word test, and horror video stimulation, plus recordings of a relaxed state while listening to relaxing music. The dataset consists of EEG recordings from 22 subjects for complex mathematical problem solving, 24 for the Trier mental challenge test, 24 for the Stroop colour-word test, 22 for horror video stimulation, and 20 for relaxed-state recordings. The data were collected to investigate the neural correlates of stress and to develop models for stress detection based on EEG data. The dataset can be used for various applications, including stress management, healthcare, and workplace safety. It provides a valuable resource for researchers and developers working on stress detection using EEG data, while the stress detection method provides a useful tool for evaluating the effectiveness of different stress detection models. Overall, this article contributes to the growing body of research on stress detection and management using EEG data and provides a useful resource for researchers and practitioners working in this field.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset I: This is the original EEG data of twelve healthy subjects for driver fatigue detection. To protect personal privacy, a digital number represents each participant. The .cnt files were created by a 40-channel Neuroscan amplifier and include the EEG data recorded in two states during driving.

Dataset II: This project adopted an event-related lane-departure paradigm in a virtual-reality (VR) dynamic driving simulator to quantitatively measure brain EEG dynamics along with the fluctuation of task performance throughout the experiment. All subjects were required to have a driving license. None of the participants had a history of psychological disorders. All participants were instructed to sustain their attention while performing the task during the experiment, and the 32-channel EEG signals and the vehicle position were recorded simultaneously. Prior to the experiment, all participants completed a consent form stating their clear understanding of the experimental protocol, which had been approved by the Institutional Review Board of Taipei Veterans General Hospital, Taiwan.

Experiment: All subjects participated in the sustained-attention driving experiment for 1.5 hours in the afternoon (13:00-14:00) after lunch, and all of them were asked to keep their attention focused on driving during the entire period. There was no break or resting session. At the beginning of the experiment (without any recordings), a five-minute pre-test was performed to ensure that every subject understood the instructions and did not suffer from simulator-induced nausea. To investigate the effect of kinesthesia on brain activity in the sustained-attention driving task, each subject was asked to participate in at least two driving sessions on different days. Each session lasted about 90 min. One was a driving session with a fixed-base simulator providing no kinesthetic feedback, so the subject had to monitor the vehicle deviation visually from the virtual scene. The other driving session involved a motion-based simulator with a six-degree-of-freedom Stewart platform to simulate the dynamic response of the vehicle to the deviation event or steering. The visual and kinesthetic inputs together aroused the subject to attend to the deviation event and take action to correct the driving trajectory.

Data Requirement: A wired EEG cap with 32 Ag/AgCl electrodes, including 30 EEG electrodes and two reference electrodes (opposite lateral mastoids), was used to record the electrical activity of the brain from the scalp during the driving task. The EEG electrodes were placed according to a modified international 10-20 system. The contact impedance between all electrodes and the skin was kept
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set consists of electroencephalography (EEG) data from 50 participants (Subject1 – Subject50) with acute ischemic stroke, aged between 30 and 77 years. The participants included 39 males and 11 females. The time after stroke ranged from 1 day to 30 days. 22 participants had right-hemisphere hemiplegia and 28 participants had left-hemisphere hemiplegia. All participants were originally right-handed. Each participant sat in front of a computer screen with an arm resting on a pillow on their lap or on a table and carried out the instructions given on the screen. At the start of each trial, a picture with a text description indicating the left or right hand was presented for 2 s. We asked the participants to focus their mind on the instructed hand motor imagery; at the same time, a video of the ipsilateral hand movement was displayed on the computer screen for 4 s. This was followed by a 2 s break.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recent advances in computational power availability and cloud computing have prompted extensive research in epileptic seizure detection and prediction. The EEG (electroencephalogram) datasets from the Dept. of Epileptology, Univ. of Bonn, and the CHB-MIT Scalp EEG Database are publicly available datasets that are the most sought after among researchers. The Bonn dataset is very small compared to CHB-MIT, but researchers still prefer Bonn because it comes in a simple '.txt' format. The dataset published here is a preprocessed form of CHB-MIT, available in '.csv' format. Machine learning and deep learning models are easily implemented with the aid of the '.csv' format, as sketched below.
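Because the files are plain '.csv', they load directly into Matlab. The following is a minimal sketch only; the file name is a hypothetical placeholder, and the assumption that the class label sits in the last column must be verified against the dataset documentation.

% Sketch: load one preprocessed CSV and split features from labels.
M = readmatrix('chb01_preproc.csv');   % hypothetical placeholder file name
X = M(:, 1:end-1);                     % EEG feature columns
y = M(:, end);                         % seizure label (assumed last column)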
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset comprised 14 patients with paranoid schizophrenia and 14 healthy controls. Data were acquired at a sampling frequency of 250 Hz using the standard 10-20 EEG montage with 19 EEG channels: Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2. The reference electrode was placed between electrodes Fz and Cz.
The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH).
SAM 40: Dataset of 40 subject EEG recordings to monitor the induced-stress while performing Stroop color-word test, arithmetic task, and mirror image recognition task
This dataset presents a collection of electroencephalogram (EEG) data recorded from 40 subjects (female: 14, male: 26, mean age: 21.5 years). The dataset was recorded while the subjects performed various tasks: the Stroop color-word test, solving arithmetic questions, identification of symmetric mirror images, and a state of relaxation. The experiment was primarily conducted to monitor the short-term stress elicited in an individual while performing these cognitive tasks. The individual tasks were carried out for 25 s and were repeated over three trials. The EEG was recorded using a 32-channel Emotiv Epoc Flex gel kit. The EEG data were then segmented into non-overlapping epochs of 25 s according to the task performed. The EEG data were further processed to remove baseline drifts by subtracting the average trend obtained using the Savitzky-Golay filter (sketched below). Furthermore, artifacts were removed from the EEG data by applying wavelet thresholding. The dataset can aid and support research activities in the field of brain-computer interfaces and can also be used to identify patterns in EEG data elicited by stress.
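The baseline-removal step can be sketched in a few lines of Matlab with sgolayfilt from the Signal Processing Toolbox; note that the sampling rate, polynomial order, and frame length below are illustrative assumptions rather than the authors' settings.

% Sketch: remove baseline drift by subtracting a Savitzky-Golay trend.
fs = 128;                             % assumed sampling rate (illustrative)
x = randn(1, 25*fs);                  % stand-in for one 25 s EEG epoch
framelen = 2*fs + 1;                  % ~2 s smoothing window (must be odd)
trend = sgolayfilt(x, 3, framelen);   % slowly varying baseline trend
x_detrended = x - trend;              % epoch with the drift removed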
This is the dataset collected by Shahed University and released by IEEE.
The columns are: Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2, Class, ID.
The first 19 are channel names.
Class: ADHD/Control
ID: Patient ID
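Given this column layout, the table can be split into signals and labels as in the following sketch ('adhd_data.csv' is a hypothetical placeholder file name):

% Sketch: separate the 19 channel columns from the Class and ID columns.
T = readtable('adhd_data.csv');   % hypothetical placeholder file name
channels = T{:, 1:19};            % Fz ... O2, one column per electrode
labels   = T.Class;               % ADHD/Control
ids      = T.ID;                  % patient ID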
Participants were 61 children with ADHD and 60 healthy controls (boys and girls, ages 7-12). The ADHD children were diagnosed by an experienced psychiatrist according to DSM-IV criteria and had taken Ritalin for up to 6 months. None of the children in the control group had a history of psychiatric disorders, epilepsy, or any report of high-risk behaviors.
EEG recording was performed based on the 10-20 standard with 19 channels (Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2) at a 128 Hz sampling frequency. The A1 and A2 reference electrodes were located on the earlobes.
Since one of the deficits in ADHD children is in visual attention, the EEG recording protocol was based on visual attention tasks. In the task, a set of pictures of cartoon characters was shown to the children, and they were asked to count the characters. The number of characters in each image was randomly selected between 5 and 16, and the pictures were large enough to be easily visible and countable by the children. To provide a continuous stimulus during the recording, each image was displayed immediately and without interruption after the child's response. Thus, the duration of the EEG recording throughout this cognitive visual task depended on the child's performance (i.e., response speed).
Citation Author(s): Ali Motie Nasrabadi, Armin Allahverdy, Mehdi Samavati, Mohammad Reza Mohammadi
DOI: 10.21227/rzfh-zn36
License: Creative Commons Attribution
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The EegDot data set (EEG data evoked by Different Odor Types, established by Tianjin University), collected using Cerebus neural signal acquisition equipment, involved thirteen odor-stimulating materials, five of which (smelling like rose (A), caramel (B), rotten odor (C), canned peach (D), and excrement (E)) were selected from the T&T olfactometer (from the Daiichi Yakuhin Sangyo Co., Ltd., Japan) and the remaining eight from essential oils (i.e., mint (F), tea tree (G), coffee (H), rosemary (I), jasmine (J), lemon (K), vanilla (L), and lavender (M)). The EegDoc data set (EEG data evoked by Different Odor Concentrations, established by Tianjin University), collected using the same Cerebus equipment, involved 2 types of odors (smelling like rose and rotten), each with 5 concentrations. The five concentrations of the rose odor are expressed as A10^-3.0 (A30), A10^-3.5 (A35), A10^-4.0 (A40), A10^-4.5 (A45), and A10^-5.0 (A50), and the five concentrations of the rotten odor are expressed as C10^-4.0 (C40), C10^-4.5 (C45), C10^-5.0 (C50), C10^-5.5 (C55), and C10^-6.0 (C60).
EEGEyeNet is a dataset and benchmark with the goal of advancing research at the intersection of brain activity and eye movements. It consists of simultaneous Electroencephalography (EEG) and Eye-tracking (ET) recordings from 356 different subjects, collected across three different experimental paradigms.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset
Synthetic EEG data generated by the ‘bai’ model based on real data.
Features/Columns:
No: "Number" Sex: "Gender" Age: "Age of participants" EEG Date: "The date of the EEG" Education: "Education level" IQ: "IQ level of participants" Main Disorder: "General class definition of the disorder" Specific Disorder: "Specific class definition of the disorder"
Total Features/Columns: 1140
Content:
Obsessive Compulsive Disorder, Bipolar Disorder… See the full description on the dataset page: https://huggingface.co/datasets/Neurazum/General-Disorders-EEG-Dataset-v1.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data presents a collection of EEG recordings of seven participants with Intellectual and Developmental Disorder (IDD) and seven Typically Developing Controls (TDC). The data were recorded while the participants observed a resting state and a soothing music stimulus. The data were collected using a high-resolution multi-channel dry-electrode system from EMOTIV called EPOC+, a 14-channel device with two reference channels and a sampling frequency of 128 Hz. The data were collected in a noise-isolated room. The participants were informed of the experimental procedure and related risks, and were asked to keep their eyes closed throughout the experiment. The data are provided in two formats: (1) raw EEG data and (2) pre-processed and cleaned EEG data for both groups of participants. These data can be used to explore the functional brain connectivity of the IDD group. In addition, behavioral information like IQ, SQ, music apprehension, and facial expressions (emotion) for IDD participants is provided in the file "QualitativeData.xlsx".
Data Usage: The data is arranged as follows:
1. Raw Data: Data/RawData/RawData_TDC/Music and Rest; Data/RawData/RawData_IDD/Music and Rest
2. Clean Data: Data/CleanData/CleanData_TDC/Music and Rest; Data/CleanData/CleanData_IDD/Music and Rest
The dataset comes along with a fully automated EEG pre-processing pipeline, which can be used for batch-processing of raw EEG files to obtain clean, pre-processed EEG files. Key features of this pipeline are: (1) bandpass filtering, (2) line-noise removal, (3) channel selection, (4) Independent Component Analysis (ICA), (5) automatic artifact rejection. All the required files are present in the Pipeline folder.
If you use this dataset and/or the fully automated pre-processing pipeline for your research work, kindly cite these two articles linked to this dataset.
(1) Sareen, E., Singh, L., Varkey, B., Achary, K., & Gupta, A. (2020). EEG dataset of individuals with intellectual and developmental disorder and healthy controls under rest and music stimuli. Data in Brief, 105488, ISSN 2352-3409. DOI: https://doi.org/10.1016/j.dib.2020.105488. (2) Sareen, E., Gupta, A., Verma, R., Achary, G. K., & Varkey, B. (2019). Studying functional brain networks from dry electrode EEG set during music and resting states in neurodevelopment disorder. bioRxiv 759738 [Preprint]. Available from: https://www.biorxiv.org/content/10.1101/759738v1
Overview
This website allows downloading Matlab functions that generate simulated EEG data according to two theories of Event Related Potentials (ERP): the classical theory and the phase-resetting theory. According to the classical view, peaks in ERP waveforms reflect phasic bursts of activity in one or more brain regions that are triggered by experimental events of interest. Specifically, it is assumed that an ERP-like waveform is evoked by each event, but that on any given trial this ERP "signal" is buried in ongoing EEG "noise". According to the phase-resetting theory, the experimental events reset the phase of ongoing oscillations. In particular, we have implemented the method of data generation by phase-resetting proposed by Makinen et al. (2005; Neuroimage, 24:961-968).
The functions available on this website generate data in the format of EEGLAB, a popular tool for analysis of EEG data. This website also provides a tutorial on how to use the functions to generate the data.
The functions were used to generate data analysed in the following papers:
Nick Yeung, Rafal Bogacz, Clay B. Holroyd and Jonathan D. Cohen. (2004) Detection of synchronized oscillations in the electroencephalogram: An evaluation of methods. Psychophysiology, 41: 822-832.
Nick Yeung, Rafal Bogacz, Clay B. Holroyd, Sander Nieuwenhuis and Jonathan D. Cohen. (2007) Theta phase-resetting and the error-related negativity. Psychophysiology, 44: 39-49.
Installation
After downloading and decompressing the ZIP archive, follow the tutorial on how to use the files given below. The simulated data may then be analysed using EEGLAB.
After installing EEGLAB, please make sure to follow the instructions on how to make EEGLAB visible to Matlab (how to add the path), which can be found on the EEGLAB "Download and Install" page or in the file "1ST_README.txt" in the EEGLAB distribution.
Generating data according to the classical theory
Generating a single trial of EEG
As stated in the overview, the simulated data is generated by adding signal and noise components. These two components can be generated by two functions, peak and noise, respectively. Let us first discuss the generation of noise and then of the signal. Noise is generated such that its power spectrum matches the power spectrum of human EEG. To see the details of the parameters of function noise, one can type in Matlab (as usual):
help noise
In essence this function has 3 parameters: the 1st describing the length of a single trial of the signal in number of samples, the 2nd describing the number of trials, and the 3rd describing the sampling frequency. Hence, to generate one trial of 0.8 s of noise with a sampling frequency of 250 Hz, one can type in Matlab:
mynoise = noise (200, 1, 250);
The value of the first parameter, describing the number of samples, was computed by multiplying the duration of the noise by the sampling frequency, i.e. 0.8 * 250 = 200. The function generates a vector containing the samples. It can now be visualised by typing:
plot (mynoise);
The resulting image may look like:
Function peak has a very similar format, but it has additional parameters, including: a 4th parameter describing the frequency of the peak, and a 5th describing the position of the centre of the peak. For example, to generate a peak with frequency 5 Hz centred on the 115th sample, and display it, one can type:
mypeak = peak (200, 1, 250, 5, 115); plot (mypeak);
The resulting image may look like:
Now, once we have generated both signal and noise, we can combine them. If we want to make the peak negative, we can multiply it by -1 before addition, and we can also scale the amplitudes of noise and signal by multiplying the vectors representing them before addition. For example, if we type:
mysignal = -5 * mypeak + 3 * mynoise; plot (mysignal);
the resulting image will be:
Comparing the above figure with the figure showing pure noise, one can observe that they differ around samples 110-120 due to the superposition of the negative peak.
Generating complete EEG data
Function simulatedEEG generates the complete set of data (973 trials and 31 electrodes) used in the paper "Detection of synchronized oscillations in the electroencephalogram: An evaluation of methods". See the code of this function for details; below we give an overview of the main operations required to generate the complete data.
To generate multiple trials of the signal, the number of trials needs to be specified in the second parameter of functions peak and noise. The resulting data structure will be a vector with concatenated signals. When generating multiple trials, a 6th parameter may be specified in function peak describing the temporal jitter of the peak across trials. To generate data from multiple electrodes, one should generate data for each electrode separately and construct a matrix with a number of rows equal to the number of electrodes, in which each row corresponds to the signal from one electrode. Also, one needs to remember that the peaks have different amplitudes in different electrodes, hence they should be scaled by the coefficients from a dipole model, as in the sketch below.
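A minimal sketch of this procedure for a reduced case is given below; the electrode count, trial count, jitter, and dipole coefficients here are hypothetical placeholders (the full configuration used in the paper lives in simulatedEEG):

% Sketch: multi-trial, multi-electrode data from functions peak and noise.
% The coefficients below are illustrative stand-ins for a dipole model.
frames = 200; trials = 100; srate = 250;
coef = [1.0; -0.5];                    % hypothetical per-electrode scaling
jitter = 10;                           % temporal jitter of the peak (samples)
data = zeros(length(coef), frames*trials);
for e = 1:length(coef)
    s = peak(frames, trials, srate, 5, 115, jitter);  % 5 Hz peak near sample 115
    n = noise(frames, trials, srate);                 % EEG-like background noise
    data(e,:) = coef(e) * s + n;       % row e: concatenated trials for electrode e
end

Such a matrix, with one row per electrode, has the shape expected by the EEGLAB import described below.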
To generate a sample complete set of data, type:
mydata = simulatedEEG;
Analysing simulated data
Once the data have been created (e.g. using the command above), they can be loaded into EEGLAB. To run EEGLAB, simply type eeglab in Matlab. To load the data, from the menu "File" choose "Import data" and then "From ASCII/float file or Matlab array". In the window which opens you need to fill in the following fields:
In "Data file/array" type the name of the Matlab variable with the data (e.g. "mydata", if you used mydata = simulatedEEG;).
In "Time points per epoch" type the number of samples per trial (e.g. 200, if you used simulatedEEG).
In "Data sampling rate" type the sampling rate (e.g. 250, if you used simulatedEEG).
Next to "Channel location file" click on "Browse" and find a file containing locations of electrodes (e.g. if you used simulatedEEG, the corresponding locations of electrodes are stored in file "nickloc31.locs").
and then click OK twice. Now you are ready to run the analyses of the data available from the menu "Plot"; for example, try "Channel spectra and maps".
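The same import can also be scripted rather than clicked through the menus. The following is a sketch only; it assumes EEGLAB's pop_importdata function and the option names used by the import dialog, so check help pop_importdata in your EEGLAB version:

% Sketch: scripted equivalent of the menu-based import described above.
eeglab;                                    % start EEGLAB and initialize its paths
mydata = simulatedEEG;                     % generate the sample data set
EEG = pop_importdata('dataformat', 'array', 'data', 'mydata', ...
    'srate', 250, 'pnts', 200, 'chanlocs', 'nickloc31.locs');
eeglab redraw;                             % refresh the GUI with the new dataset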
One of the functions which can be downloaded from this website, figures, generates sample figures from the paper "Detection of synchronized oscillations in the electroencephalogram: An evaluation of methods". However, to execute this function, one first needs to add the subdirectory "functions" of the EEGLAB installation to the path. For example, if your EEGLAB is installed in the directory /home/staff/rafal/linux/research/eeg/eeglab4.515, then before executing function figures you need to type in Matlab:
addpath('/home/staff/rafal/linux/research/eeg/eeglab4.515/functions');
Generating data according to the phase-resetting theory
As stated in the Overview, the phase-resetting theory assumes that the experimental events reset the phase of ongoing oscillations. Function phasereset allows one to generate a sinusoid whose phase is reset. The first three parameters of this function are the same as for peak and noise. The next two parameters describe the minimum and maximum frequency of the oscillation; on each trial the oscillation frequency is chosen as a random number from this range. The sixth parameter describes the frame in which the reset should occur. The initial phase of the oscillation is chosen randomly. Thus, for example, to generate and plot a sinusoid of frequency 5 Hz being reset at the 115th sample, we can type:
mysin = phasereset (200, 1, 250, 5, 5, 115); plot (mysin);
The resulting image may look like:
Makinen et al. generated their simulated data by summing 4 such sinusoids with frequencies chosen randomly from the range 4-16 Hz. Such data is generated by function Makinen, which has the same parameters as phasereset except for the parameters describing the frequency range. Hence typing:
mysin = Makinen (200, 1, 250, 115); plot (mysin);
may result in an image like:
Function Makinen1a generates 30 trials of the above type, displays them together with the resulting ERP and the variance in the EEG amplitude, and thus replicates Figure 1a of the paper by Makinen et al. (2005).
Have fun!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A set of electroencephalogram (EEG) signals obtained from NeuroSky. Thirty participants (15 female, 15 male) joined the data collection. The EEG data were recorded through 6 protocols and 11 tasks. The six protocols are baseline (2 tasks), emotional state (4 tasks), memorize task, executive task, recall task, and baseline extension (2 tasks). We collected 12 minutes of data for each participant and separated the data into the different tasks. The obtained data is used for analyzing patterns of internet addiction. Further pre-processing, feature extraction, and classification will be applied to the data.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
"ChineseEEG" (Chinese Linguistic Corpora EEG Dataset) contains high-density EEG data and simultaneous eye-tracking data recorded from 10 participants, each silently reading Chinese text for about 11 hours. This dataset further comprises pre-processed EEG sensor-level data generated under different parameter settings, offering researchers a diverse range of selections. Additionally, we provide embeddings of the Chinese text materials encoded from BERT-base-chinese model, which is a pre-trained NLP specifically used for Chinese, aiding researchers in exploring the alignment between text embeddings from NLP models and brain information representations.
In total, data from 10 participants were used (18-24 years old, mean age 20.68 years, 5 males). No participants reported a neurological or psychiatric history. All participants were right-handed and had normal or corrected-to-normal vision.
The experimental materials consist of two novels in Chinese, both in the genre of children's literature. The first is The Little Prince and the second is Garnett Dream. For The Little Prince, the preface was used as material for the practice reading phase. The main body of the novel was then used for seven sessions in the formal reading phase. The first six sessions each included 4 chapters of the novel, while the seventh session included the last two chapters. For Garnett Dream, the first 18 chapters were used for 18 sessions in the formal reading stage, with each session including a complete chapter.
To properly present the text on the screen during the experiments, the content of each session was segmented into a series of units, with each unit containing no more than 10 Chinese characters. These segmented contents were saved in Excel (.xlsx) format for subsequent use. During the experiment, three adjacent units from each session's content were displayed on the screen in three separate lines, with the middle line highlighted for the participant to read. In summary, a total of 115,233 characters (24,324 in The Little Prince and 90,909 in Garnett Dream), of which 2,985 were unique, were used as experimental stimuli in the ChineseEEG dataset.
The original and segmented novels are saved in the derivatives/novels folder. The segmented_novel folder inside novels contains two types of Excel files: one type has names ending with "display", while the other does not carry this suffix. The former stores units that have been segmented; the latter includes units that have been reassembled according to the experimental presentation format. The files ending with "display" are used by the relevant code to achieve effective stimulus presentation in the experiment.
The code for generating these two types of files, as well as the code for experimental presentation, can be found in the GitHub repository: https://github.com/ncclabsustech/Chinese_reading_task_eeg_processing.
Participants were tasked with reading a novel and were required to keep their heads still and keep their gaze on the highlighted (red) Chinese characters moving across the screen, reading at a pace set by the program. They were required to read an entire novel in multiple runs within a single session. Each run is divided into two phases: the eye-tracker calibration phase and the reading phase.
The eye-tracker calibration phase is at the beginning of each run, requiring participants to keep their gaze at a fixation point, which sequentially appeared at the four corners and the center of the screen.
In the reading phase, the screen initially displayed the serial number of the current chapter. Subsequently, the text appeared with three lines per page, ensuring each line contained no more than ten Chinese characters (excluding punctuation). On each page, the middle line was highlighted as the focal point, while the upper and lower lines were displayed with reduced intensity as the background. Each character in the middle line was sequentially highlighted with red color for 0.35 s, and participants were required to read the novel content following the highlighted cues.
For detailed information about the experiment settings and procedures, please refer to our paper at https://doi.org/10.1101/2024.02.08.579481.
To precisely co-register EEG segments with individual characters during the experiment, we marked the EEG data with triggers.
The raw EEG data has a sampling rate of 1 kHz, while the filtered and pre-processed data have a sampling rate of 256 Hz.
The dataset is organized following the EEG-BIDS specification using the MNE-BIDS package. The dataset contains some regular BIDS files, 10 participants' data folders, and a derivatives folder. The stand-alone files offer an overview of the dataset: i) dataset_description.json is a JSON file depicting the information of the dataset, such as the name, dataset type and authors; ii) participants.tsv contains participants' information, such as age, sex, and handedness; iii) participants.json describes the column attributes in participants.tsv; iv) README.md contains a detailed introduction of the dataset.

Each participant's folder contains two folders named ses-LittlePrince and ses-GarnettDream, which store the data of this participant reading the two novels, respectively. Each of the two folders contains a folder eeg and one file sub-xx_scans.tsv. The tsv file contains information about the scanning time of each file. The eeg folder contains the source raw EEG data of several runs, channels, and marker events files. Each run includes an eeg.json file, which encompasses detailed information for that run, such as the sampling rate and the number of channels. Events are stored in events.tsv with onset and event ID. The EEG data is converted from raw metafile format (.mff file) to BrainVision format (.vhdr, .vmrk and .eeg files) since EEG-BIDS is not officially compatible with the .mff format.

The derivatives folder contains six folders: eyetracking_data, filtered_0.5_80, filtered_0.5_30, preproc, novels, and text_embeddings. The eyetracking_data folder contains all the eye-tracking data. Each eye-tracking recording is formatted as a .zip file, with eye-movement trajectories and other parameters like sampling rate saved in different files. The filtered_0.5_80 and filtered_0.5_30 folders contain data that has been processed up to the pre-processing step of 0.5-80 Hz and 0.5-30 Hz band-pass filtering, respectively. These data are suitable for researchers who have specific requirements and want to perform customized processing of the subsequent pre-processing steps like ICA and re-referencing. The preproc folder contains minimally pre-processed EEG data processed using the whole pre-processing pipeline. It includes four additional types of files compared to the participants' raw data folders in the root directory: i) bad_channels.json contains bad channels marked during the bad-channel rejection phase; ii) ica_components.npy stores the values of all independent components in the ICA phase; iii) ica_components.json includes the independent components excluded in ICA (the ICA random seed is fixed, allowing for reproducible results); iv) ica_components_topography.png is a picture of the topographic maps of all independent components, where the excluded components are labeled in grey. The novels folder contains the original and segmented text stimuli materials. The original novels are saved in .txt format and the segmented novels corresponding to each experimental run are saved in Excel (.xlsx) files. The text_embeddings folder contains embeddings of the two novels. The embeddings corresponding to each experimental run are stored in NumPy (.npy) files.
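Schematically, the layout described above can be pictured as follows (a simplified sketch listing only the folder and file names given in the text, for one subject):

dataset_description.json
participants.tsv
participants.json
README.md
sub-xx/
    ses-LittlePrince/
        sub-xx_scans.tsv
        eeg/  (per-run BrainVision .vhdr/.vmrk/.eeg files, eeg.json, channels and events.tsv)
    ses-GarnettDream/
        ...
derivatives/
    eyetracking_data/
    filtered_0.5_80/
    filtered_0.5_30/
    preproc/
    novels/
    text_embeddings/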
For an overview of the structure, please refer to our paper at https://doi.org/10.1101/2024.02.08.579481.
For the pre-processed data in the derivatives folder, we only performed minimal pre-processing to retain as much useful information as possible. The pre-processing steps include data segmentation, downsampling, filtering, bad-channel interpolation, ICA, and averaging.
During the data segmentation phase, we only retained data from the formal reading phase of the experiment. Based on the event markers recorded during data collection, we segmented the data, removing sections irrelevant to the formal experiment such as calibration and preface reading. To minimize the impact of subsequent filtering steps on the beginning and end of the signal, an additional 10 seconds of data was retained before the start of the formal reading phase. Subsequently, the signal was downsampled to 256 Hz. Following this, a 50 Hz notch filter was applied to remove powerline noise from the signal. Next, we applied a band-pass overlap-add FIR filter to the signal to eliminate low-frequency direct-current components and high-frequency noise. Two versions of filtered data are offered: the first has a filter band of 0.5-80 Hz and the second of 0.5-30 Hz; researchers can choose the appropriate version based on their specific needs. After filtering, we performed an interpolation of bad channels. Independent Component Analysis (ICA) was then applied to the data, utilizing the infomax algorithm. The number of independent components was set to 20,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EEG signals were acquired from 20 healthy right-handed subjects performing a series of fine motor tasks cued by audio commands. The participants were divided equally into two distinct age groups: (i) 10 elderly adults (EA group, aged 55-72, 6 females); (ii) 10 young adults (YA group, aged 19-33, 3 females).

The active phase of the experimental session included sequential execution of 60 fine motor tasks: squeezing a hand into a fist after the first audio command and holding it until the second audio command (30 repetitions per hand) (see Fig. 1). The duration of the audio command determined the type of motor action to be executed: 0.25 s for a left-hand (LH) movement and 0.75 s for a right-hand (RH) movement. The time interval between the two audio signals was selected randomly in the range 4-5 s for each trial. The sequence of motor tasks was randomized, and the pause between tasks was also chosen randomly in the range 6-8 s to exclude possible training or motor-preparation effects caused by the sequential execution of the same tasks.

Acquired EEG signals were then processed via preprocessing tools implemented in the MNE Python package. Specifically, raw EEG signals were filtered by a 5th-order Butterworth filter in the range 1-100 Hz and by a 50 Hz notch filter. Further, Independent Component Analysis (ICA) was applied to remove ocular and cardiac artifacts. Artifact-free EEG recordings were then segmented into 60 epochs according to the experimental protocol. Each epoch was 14 s long, including 3 s of baseline and 11 s of motor-related brain activity, and time-locked to the first audio command indicating the start of motor execution. After visual inspection, epochs that still contained artifacts were rejected. Finally, 15 epochs per movement type were stored for each subject.

Individual epochs for each subject are stored in the attached MNE .fif files. The prefix EA or YA in the file name identifies the age group the subject belongs to. The postfix LH or RH in the file name indicates the type of motor task.