93 datasets found
  1. Cora

    • ieee-dataport.org
    • paperswithcode.com
    • +1 more
    Updated Mar 11, 2024
    Cite
    Sepideh Neshatfar (2024). Cora [Dataset]. https://ieee-dataport.org/documents/cora
    Dataset updated
    Mar 11, 2024
    Authors
    Sepideh Neshatfar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
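
    As a quick illustration (not part of this record), one common way to load Cora is through the torch_geometric Planetoid loader; note that the Planetoid distribution may differ slightly from this IEEE DataPort copy:

    from torch_geometric.datasets import Planetoid

    # downloads and caches Cora under ./data
    dataset = Planetoid(root="data", name="Cora")
    graph = dataset[0]
    print(graph.num_nodes)          # 2708 publications
    print(graph.x.shape)            # [2708, 1433] binary word vectors
    print(int(graph.y.max()) + 1)   # 7 classes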

  2. Social network datasets and citation network datasets

    • ieee-dataport.org
    Updated Apr 30, 2025
    Cite
    CHEN Xiaocong (2025). Social network datasets and citation network datasets [Dataset]. https://ieee-dataport.org/documents/social-network-datasets-and-citation-network-datasets
    Dataset updated
    Apr 30, 2025
    Authors
    CHEN Xiaocong
    Description

    online communities

  3. IEEEPPG Dataset

    • zenodo.org
    Updated Mar 24, 2021
    Cite
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb (2021). IEEEPPG Dataset [Dataset]. http://doi.org/10.5281/zenodo.3902710
    Available download formats: bin
    Dataset updated
    Mar 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the Monash, UEA & UCR time series regression repository. http://tseregression.org/

    The goal of this dataset is to estimate heart rate using PPG sensors. It contains 3,096 five-dimensional time series obtained from the IEEE Signal Processing Cup 2015: Heart Rate Monitoring During Physical Exercise Using Wrist-Type Photoplethysmographic (PPG) Signals. Two-channel PPG signals, three-axis acceleration signals, and one-channel ECG signals were simultaneously recorded from subjects aged 18 to 35. For each subject, the PPG signals were recorded from the wrist by two pulse oximeters with green LEDs (wavelength: 515 nm), placed 2 cm apart (centre to centre). The acceleration signal was also recorded from the wrist by a three-axis accelerometer. Both the pulse oximeters and the accelerometer were embedded in a comfortably worn wristband. The ECG signal was recorded simultaneously from the chest using wet ECG sensors. All signals were sampled at 125 Hz and sent to a nearby computer via Bluetooth.

    Please refer to https://sites.google.com/site/researchbyzhang/ieeespcup2015 for more details.
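
    As an illustrative sketch (not part of the dataset), the 125 Hz sampling rate above is enough to set up a plausible heart-rate band-pass filter before rate estimation; the signal here is synthetic:

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 125.0                              # sampling rate from the description
    t = np.arange(0, 8, 1 / fs)
    ppg = np.sin(2 * np.pi * 1.5 * t) + 0.3 * np.random.randn(t.size)  # ~90 bpm plus noise

    # 0.5-4 Hz pass band covers roughly 30-240 bpm
    b, a = butter(4, [0.5, 4.0], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, ppg)

    # crude rate estimate from the dominant spectral peak in the pass band
    freqs = np.fft.rfftfreq(filtered.size, 1 / fs)
    spectrum = np.abs(np.fft.rfft(filtered))
    mask = (freqs >= 0.5) & (freqs <= 4.0)
    print(60 * freqs[mask][spectrum[mask].argmax()], "bpm")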

    Copyright
    All datasets are copyrighted, but you may freely use them for the Signal Processing Cup or your own academic research, as long as you suitably cite the data in your work.

    Citation request
    Z. Zhang, Z. Pi, B. Liu, TROIKA: A general framework for heart rate monitoring using wrist-type photoplethysmographic signals during intensive physical exercise, IEEE Transactions on Biomedical Engineering, vol. 62, no. 2, pp. 522-531, February 2015, DOI: 10.1109/TBME.2014.2359372

  4. Academic Visualisation Publications Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Academic Visualisation Publications Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/6609c2a9-39d6-4e32-8f5e-5591efed9c5e
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Knowledge Bundles
    Description

    This dataset provides detailed information on IEEE Visualization (IEEE VIS) publications from 1990 to 2023, encompassing papers from conferences such as SciVis, InfoVis, VAST, and Vis. It also includes IEEE TVCG and IEEE CG&A articles that were presented at IEEE VIS. The collection offers crucial details such as paper titles, authors, Digital Object Identifiers (DOIs), abstracts, and direct links to the papers. Additionally, it lists citations to other previous VIS papers, making it an invaluable resource for analysing trends, understanding citation networks, and exploring the evolution of visualisation research. The dataset is designed to provide analytical tools for uncovering relevant patterns and to assist with navigating content within extensive text collections.

    Columns

    • Conference: The short title of the conference in which the paper appeared, such as SciVis, InfoVis, VAST, or Vis.
    • Year: The year the paper was presented at the conference. This may differ from the actual publication year.
    • Title: The title of the paper.
    • DOI: The Digital Object Identifier for the paper. If a valid DOI was not available, a syntactically valid but fictitious DOI, beginning with "10.0000", was entered, serving as a unique identifier for each paper.
    • Link: A direct link to the paper within the IEEE digital library, derived from the DOI.
    • FirstPage: The first page number of the paper in the printed proceedings. This information has not been thoroughly checked.
    • LastPage: The last page number of the paper in the printed proceedings. This information has not been thoroughly checked and may occasionally include a second page number, particularly for older years where separate colour pages for pictures were included.
    • PaperType: Indicates the type of paper: 'C' for conference paper, 'J' for journal paper, or 'M' for miscellaneous. To filter for scientific research articles, 'C' and 'J' types are relevant. Papers marked with 'M' are not typically considered archival publications.
    • Abstract: A summary of the paper's content.
    • AuthorNames-Deduped: A list of co-authors, with names separated by semicolons. Authors are listed in the order they appear on the paper, using de-duped full author names provided by DBLP.
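
    A minimal pandas sketch of working with the columns above, assuming the spreadsheet has been exported to a CSV with a hypothetical name:

    import pandas as pd

    df = pd.read_csv("ieee_vis_papers.csv")  # hypothetical export filename

    # keep only archival research articles ('C' and 'J' per PaperType)
    papers = df[df["PaperType"].isin(["C", "J"])]

    # split the semicolon-separated de-duped author names into lists
    papers = papers.assign(Authors=papers["AuthorNames-Deduped"].str.split(";"))

    # papers per conference per year
    print(papers.groupby(["Conference", "Year"]).size())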

    Distribution

    The dataset is typically provided in a Google spreadsheet format, which can be downloaded as a CSV file or other formats. It contains information on 3,753 unique DOIs and covers a substantial period from 1990 to 2023. Based on paper type, approximately 50% are Journal (J) papers, 42% are Conference (C) papers, and 8% fall into other categories. Regarding conference representation, Vis papers constitute 48% of the collection, InfoVis papers 24%, and other conferences 28%. The data is organised across multiple tabs: the main tab contains IEEE VIS papers, a second tab details journal papers presented at VIS, and a third tab provides an overview of paper submission numbers and acceptance rates across the years.

    Usage

    This dataset is ideally suited for various analytical and research purposes, including:

    • Analysing trends and patterns within the IEEE Visualization publication landscape.
    • Developing and evaluating text mining techniques for academic literature.
    • Exploring citation relationships and the impact of research within the visualisation community.
    • Filtering for specific types of scientific research articles (conference or journal papers).
    • Investigating conference submission volumes and acceptance rates over time.
    • Aiding with content navigation and discovery within large collections of academic texts.

    Coverage

    The dataset spans a significant time range from 1990 to 2023, offering a historical perspective on research in visualisation. While it does not include specific geographic demographics, the scope of IEEE VIS publications is considered global. The dataset specifically covers publications from IEEE Visualization (VIS) conferences, as well as IEEE TVCG and IEEE CG&A articles presented at VIS. It is structured to allow for clear distinction between different paper categories and provides a summary of yearly publication statistics.

    License

    CC-BY-NC

    Who Can Use It

    • Academic researchers and scholars focused on bibliometrics, publication trends, and the history of visualisation.
    • Data scientists and machine learning practitioners interested in natural language processing (NLP) and text analysis.
    • Students undertaking academic projects or literature reviews related to computer graphics and visualisation.
    • Conference organisers seeking insights into academic publishing patterns and acceptance statistics.
    • Developers creating tools for academic search, data exploration, or content analysis.

    Dataset Name Suggestions

    • IEEE VIS Publication Data
    • Visual
  5. WiFi RSS & RTT dataset with different LOS conditions for indoor positioning

    • data.niaid.nih.gov
    Updated Jun 11, 2024
    Cite
    Luo, Zhiyuan (2024). WiFi RSS & RTT dataset with different LOS conditions for indoor positioning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11558791
    Dataset updated
    Jun 11, 2024
    Dataset provided by
    Feng, Xu
    Luo, Zhiyuan
    Nguyen, Khuong An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the second batch of WiFi RSS & RTT datasets with different LOS conditions that we have published. Please see https://doi.org/10.5281/zenodo.11558192 for the first release.

    We provide three real-world datasets for indoor positioning model selection purposes. We divided the area of interest into discrete grids and labelled each grid with the correct ground truth coordinates and the LoS APs visible from it. The datasets contain WiFi RTT and RSS signal measures and are split so that training points and testing points do not overlap. Please find the datasets in the 'data' folder. The datasets contain both WiFi RSS and RTT signal measures with ground truth coordinate labels and LOS condition labels.

    Lecture theatre: This is an entirely LOS scenario with 5 APs. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).

    Corridor: This is an entirely NLOS scenario with 4 APs. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).

    Office: This is a mixed LOS-NLOS scenario with 5 APs. At least one AP was NLOS for each RP. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).

    Collection methodology

    The APs used were Google WiFi Router AC-1304 units; the smartphone used to collect the data was a Google Pixel 3 running Android 9.

    The ground truth coordinates were collected using fixed tile size on the floor and manual post-it note markers.

    Only RTT-enabled APs were included in the dataset.

    The features of the dataset

    The features of the lecture theatre dataset are as follows:

    • Testbed area: 15 × 14.5 m²
    • Grid size: 0.6 × 0.6 m²
    • Number of APs: 5
    • Number of reference points: 120
    • Samples per reference point: 60
    • Number of all data samples: 7,200
    • Number of training samples: 5,400
    • Number of testing samples: 1,800
    • Signal measures: WiFi RTT, WiFi RSS
    • Note: Entirely LOS

    The features of the corridor dataset are as follows:

    • Testbed area: 35 × 6 m²
    • Grid size: 0.6 × 0.6 m²
    • Number of APs: 4
    • Number of reference points: 114
    • Samples per reference point: 60
    • Number of all data samples: 6,840
    • Number of training samples: 5,130
    • Number of testing samples: 1,710
    • Signal measures: WiFi RTT, WiFi RSS
    • Note: Entirely NLOS

    The features of the office dataset are as follows:

    • Testbed area: 18 × 5.5 m²
    • Grid size: 0.6 × 0.6 m²
    • Number of APs: 5
    • Number of reference points: 108
    • Samples per reference point: 60
    • Number of all data samples: 6,480
    • Number of training samples: 4,860
    • Number of testing samples: 1,620
    • Signal measures: WiFi RTT, WiFi RSS
    • Note: Mixed LOS-NLOS. At least one AP was NLOS for each RP.

    Dataset explanation

    The columns of the dataset are as follows:

    • Column 'X': the X coordinate of the sample.
    • Column 'Y': the Y coordinate of the sample.
    • Columns 'AP1 RTT(mm)', 'AP2 RTT(mm)', ..., 'AP5 RTT(mm)': the RTT measure from the corresponding AP at a reference point.
    • Columns 'AP1 RSS(dBm)', 'AP2 RSS(dBm)', ..., 'AP5 RSS(dBm)': the RSS measure from the corresponding AP at a reference point.
    • Column 'LOS APs': indicates which APs have a LOS to this reference point.

    Please note:

    The RSS value -200 dBm indicates that the AP is too far away from the current reference point and no signals could be heard from it.

    The RTT value 100,000 mm indicates that no signal is received from the specific AP.
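
    A sketch of loading one of the CSVs and masking the two sentinel values just described; the filename is hypothetical, so adjust it to the files in the 'data' folder:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("data/lecture_theatre_train.csv")  # hypothetical filename

    rtt_cols = [c for c in df.columns if "RTT" in c]
    rss_cols = [c for c in df.columns if "RSS" in c]

    # -200 dBm means the AP was out of range; 100,000 mm means no RTT reply
    df[rss_cols] = df[rss_cols].replace(-200, np.nan)
    df[rtt_cols] = df[rtt_cols].replace(100_000, np.nan)

    print(df[rtt_cols + rss_cols].describe())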

    Citation request

    When using this dataset, please cite the following three items:

    Feng, X., Nguyen, K. A., & Zhiyuan, L. (2024). WiFi RSS & RTT dataset with different LOS conditions for indoor positioning [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11558792

    @article{feng2024wifi,
      title={A WiFi RSS-RTT indoor positioning system using dynamic model switching algorithm},
      author={Feng, Xu and Nguyen, Khuong An and Luo, Zhiyuan},
      journal={IEEE Journal of Indoor and Seamless Positioning and Navigation},
      year={2024},
      publisher={IEEE}
    }

    @inproceedings{feng2023dynamic,
      title={A dynamic model switching algorithm for WiFi fingerprinting indoor positioning},
      author={Feng, Xu and Nguyen, Khuong An and Luo, Zhiyuan},
      booktitle={2023 13th International Conference on Indoor Positioning and Indoor Navigation (IPIN)},
      pages={1--6},
      year={2023},
      organization={IEEE}
    }

  6. EEG Dataset for ADHD

    • kaggle.com
    Updated Jan 20, 2025
    Cite
    Danizo (2025). EEG Dataset for ADHD [Dataset]. https://www.kaggle.com/datasets/danizo/eeg-dataset-for-adhd/data
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Danizo
    Description

    This dataset was collected by Shahed University and released on IEEE DataPort.

    The columns are: Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2, Class, ID

    The first 19 are EEG channel names.

    Class: ADHD/Control

    ID: Patient ID

    Participants were 61 children with ADHD and 60 healthy controls (boys and girls, ages 7-12). The ADHD children were diagnosed by an experienced psychiatrist according to DSM-IV criteria and had taken Ritalin for up to 6 months. None of the children in the control group had a history of psychiatric disorders, epilepsy, or any report of high-risk behaviors.

    EEG recording was performed based on the 10-20 standard with 19 channels (Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2) at a 128 Hz sampling frequency. The A1 and A2 electrodes were the references, located on the earlobes.

    Since one of the deficits in ADHD children is visual attention, the EEG recording protocol was based on visual attention tasks. In the task, a set of pictures of cartoon characters was shown to the children and they were asked to count the characters. The number of characters in each image was randomly selected between 5 and 16, and the size of the pictures was large enough to be easily visible and countable by children. To have a continuous stimulus during the signal recording, each image was displayed immediately and uninterrupted after the child’s response. Thus, the duration of EEG recording throughout this cognitive visual task was dependent on the child’s performance (i.e. response speed).
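
    A sketch of splitting the columns listed above into EEG channels and labels, assuming the Kaggle CSV has been saved under a hypothetical name:

    import pandas as pd

    df = pd.read_csv("eeg_adhd.csv")  # hypothetical filename

    channels = ["Fz", "Cz", "Pz", "C3", "T3", "C4", "T4", "Fp1", "Fp2",
                "F3", "F4", "F7", "F8", "P3", "P4", "T5", "T6", "O1", "O2"]

    X = df[channels].to_numpy()   # EEG samples recorded at 128 Hz
    y = df["Class"]               # ADHD / Control
    groups = df["ID"]             # patient ID, e.g. for grouped cross-validation

    print(X.shape, y.value_counts().to_dict())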

    Citation Author(s): Ali Motie Nasrabadi; Armin Allahverdy; Mehdi Samavati; Mohammad Reza Mohammadi

    DOI: 10.21227/rzfh-zn36

    License: Creative Commons Attribution

  7. COTIDIANA Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Nov 8, 2024
    Cite
    Pedro Matias; Ricardo Araújo; Ricardo Graça; Ana Rita Henriques; David Belo; Maria Valada; Nasim Nakhost Lotfi; Elsa Frazão Mateus; Helga Radner; Ana M. Rodrigues; Paul Studenic; Francisco Nunes (2024). COTIDIANA Dataset [Dataset]. http://doi.org/10.5281/zenodo.13628911
    Available download formats: zip
    Dataset updated
    Nov 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pedro Matias; Ricardo Araújo; Ricardo Graça; Ana Rita Henriques; David Belo; Maria Valada; Nasim Nakhost Lotfi; Elsa Frazão Mateus; Helga Radner; Ana M. Rodrigues; Paul Studenic; Francisco Nunes
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    About

    The COTIDIANA Dataset is a holistic, multimodal, and multidimensional dataset that captures three dimensions in which patients are frequently impacted by Rheumatic and Musculoskeletal Diseases (RMDs), namely, (a) mobility and physical activity, due to joint stiffness, fatigue, or pain; (b) finger dexterity, due to finger joint stiffness or pain; or (c) mental health (anxiety/depression level), due to the functional impairments or pain.

    We release this dataset to facilitate research in rheumatology, while contributing to the characterisation of RMD patients using smartphone-based sensor and log data.

    We gathered smartphone and self-reported data from 31 patients with RMDs and 28 age-matched controls, including (i) inertial sensors, (ii) keyboard metrics, (iii) communication logs, and (iv) reference tests/scales. We provide both raw and (pre-)processed dataset versions, to enable researchers or developers to use their own methods or benefit from the computed variables. Additional materials containing (a) illustrations, (b) visualization charts, and (c) variable descriptions can be consulted through this link.

    Citing

    When using this dataset, please cite P. Matias, R. Araújo, R. Graça, A. R. Henriques, D. Belo, M. Valada, N. N. Lotfi, E. Frazão Mateus, H. Radner, A. M. Rodrigues, P. Studenic, F. Nunes (2024) COTIDIANA Dataset – Smartphone-Collected Data on the Mobility, Finger Dexterity, and Mental Health of People With Rheumatic and Musculoskeletal Diseases, in IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 11, pp. 6538-6547, DOI: 10.1109/JBHI.2024.3456069.

    Data structure

    The data is organised by participant and includes:

    • Inertial Sensor Data, retrieved from accelerometer, gyroscope, and magnetometer sensors collected during three distinct walking exercises (Timed Up and Go, Daily Living Activity, and Simple Walk);

    • Keyboard Dynamic Metrics, collecting 38 raw variables related to keyboard typing performance while writing 10 sentences (e.g., number of errors, words per minute);

    • Communication Logs, e.g., with weekly averages of number of calls and SMS sent or received;

    • Validated Clinical Questionnaires, such as general Health (EQ-5D-5L), Multidimensional Health Assessment Questionnaire (MDHAQ), Hospital Anxiety and Depression Scale (HADS);

    • Validated Functional Tests, including time to perform the Timed Up and Go (TUG) and Moberg Pick-Up Test (fine motor skills);
    • Characterization Questionnaire, containing sociodemographic and clinical information.

    cotidiana_dataset
    ├── info
    │   ├── codebook.xlsx
    │   └── missings_report.csv
    ├── processed
    │   ├── com_calls
    │   │   └── features.csv
    │   ├── com_sms
    │   │   └── features.csv
    │   ├── full
    │   │   └── cotidiana_dataset.csv
    │   ├── hd_kst
    │   │   └── features.csv
    │   ├── hd_mpu
    │   │   └── features.csv
    │   ├── mob_dla
    │   │   └── features.csv
    │   ├── mob_sw
    │   │   └── features.csv
    │   ├── mob_tug
    │   │   └── features.csv
    │   └── quest
    │       └── features.csv
    └── raw
        ├── com_calls
        │   └── p[0-58]
        │       └── calls_log.csv
        ├── com_sms
        │   └── p[0-58]
        │       └── sms_log.csv
        ├── hd_kst
        │   └── p[0-58]
        │       ├── imu
        │       │   ├── Accelerometer_s[0-9].csv
        │       │   ├── Gyroscope_s[0-9].csv
        │       │   └── Magnetometer_s[0-9].csv
        │       └── keyboard
        │           └── kb_metrics.csv
        ├── hd_mpu
        │   └── p[0-58]
        │       └── mpu_time.csv
        ├── mob_dla
        │   └── p[0-58]
        │       ├── bag
        │       │   ├── Accelerometer.csv
        │       │   ├── Gyroscope.csv
        │       │   ├── Magnetometer.csv
        │       │   └── Annotation.csv
        │       └── pocket
        │           ├── Accelerometer.csv
        │           ├── Gyroscope.csv
        │           ├── Magnetometer.csv
        │           └── Annotation.csv
        ├── mob_sw
        │   └── p[0-58]
        │       ├── ann
        │       │   └── walk_ann.csv
        │       ├── bag
        │       │   ├── Accelerometer.csv
        │       │   ├── Gyroscope.csv
        │       │   ├── Magnetometer.csv
        │       │   └── Annotation.csv
        │       └── pocket
        │           ├── Accelerometer.csv
        │           ├── Gyroscope.csv
        │           ├── Magnetometer.csv
        │           └── Annotation.csv
        ├── mob_tug
        │   └── p[0-58]
        │       ├── bag
        │       │   ├── Accelerometer.csv
        │       │   ├── Gyroscope.csv
        │       │   ├── Magnetometer.csv
        │       │   └── Annotation.csv
        │       └── pocket
        │           ├── Accelerometer.csv
        │           ├── Gyroscope.csv
        │           ├── Magnetometer.csv
        │           └── Annotation.csv
        └── quest
            └── features.csv
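
    A short sketch of reading the processed tables from the layout above; the paths follow the tree, while the specific columns are an assumption to check against codebook.xlsx:

    import pandas as pd

    # fully merged processed table
    full = pd.read_csv("cotidiana_dataset/processed/full/cotidiana_dataset.csv")

    # per-modality feature tables, e.g. Timed Up and Go
    tug = pd.read_csv("cotidiana_dataset/processed/mob_tug/features.csv")

    print(full.shape)
    print(tug.head())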

  8. Data from: Shape Completion with Prediction of Uncertain Regions

    • data.niaid.nih.gov
    Updated Jan 10, 2024
    Cite
    Humt, Matthias (2024). Shape Completion with Prediction of Uncertain Regions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10284229
    Dataset updated
    Jan 10, 2024
    Dataset authored and provided by
    Humt, Matthias
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview

    This is the dataset for the publication Shape Completion with Prediction of Uncertain Regions [IEEE][arXiv].

    The dataset contains rendered depth images, watertight meshes and occupancy as well as uncertain region labels for the mugs category (03797390) of the ShapeNetCore.v1 dataset.

    It further contains optimized, watertight meshes and occupancy as well as uncertain region labels for the mugs found in the HB, LM, TYOL and YCBV datasets from the BOP challenge.

    After downloading the shapenet.tar.gz and bop.tar.gz files as well as the original datasets, simply unpack and move the content to the corresponding directories.

    Code

    Python code for loading of this dataset is provided at the official GitHub repository.

    MD5 checksums

    bop.tar.gz: a0a54631939fef360482aafcccca77a4

    shapenet.tar.gz: 4c96dde625dc7662ad6eacf6f129c55c
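
    A minimal sketch for verifying the downloaded archives against the MD5 checksums listed above before unpacking:

    import hashlib

    def md5sum(path, chunk_size=1 << 20):
        """Stream the file so large archives do not need to fit in memory."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    expected = {
        "bop.tar.gz": "a0a54631939fef360482aafcccca77a4",
        "shapenet.tar.gz": "4c96dde625dc7662ad6eacf6f129c55c",
    }
    for name, digest in expected.items():
        assert md5sum(name) == digest, f"checksum mismatch for {name}"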

    Citation

    If you find the provided dataset useful in your research, please use the following BibTeX entry to cite the corresponding research paper:

    @inproceedings{humt2023uncertain,
      author={Humt, Matthias and Winkelbauer, Dominik and Hillenbrand, Ulrich},
      booktitle={2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      title={Shape Completion with Prediction of Uncertain Regions},
      year={2023},
      pages={1215-1221},
      doi={10.1109/IROS55552.2023.10342487}
    }

  9. Reference datasets for in-flight emergency situations

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 10, 2020
    Cite
    Vincent Lenders (2020). Reference datasets for in-flight emergency situations [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3937482
    Dataset updated
    Jul 10, 2020
    Dataset provided by
    Allan Tart
    Xavier Olive
    Martin Strohmeier
    Matthias Schäfer
    Vincent Lenders
    Ivan Martinovic
    Axel Tanner
    Metin Feridun
    Description

    Motivation

    The data in this dataset is derived and cleaned from the full OpenSky dataset in order to illustrate in-flight emergency situations triggering the 7700 transponder code. It spans flights seen by the network's more than 2500 members between 1 January 2018 and 29 January 2020.

    The dataset complements the following publication:

    Xavier Olive, Axel Tanner, Martin Strohmeier, Matthias Schäfer, Metin Feridun, Allan Tart, Ivan Martinovic and Vincent Lenders. "OpenSky Report 2020: Analysing in-flight emergencies using big data". In 2020 IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), October 2020

    License

    See LICENSE.txt

    Disclaimer

    The data in these files is provided as is. Despite our best efforts at filtering out potential issues, some information could be erroneous.

    Most aircraft information comes from the OpenSky aircraft database and has been completed with manual research from various sources on the Internet. Most information about flight plans has been automatically fetched and processed using open APIs; some manual processing was required to cross-check, correct erroneous information, and fill in missing information.

    Description of the dataset

    Two files are provided in the dataset:

    one compressed parquet file with trajectory information;

    one metadata CSV file with the following features:

    flight_id: a unique identifier for each trajectory;

    callsign: ICAO flight callsign information;

    number: IATA flight number, when available;

    icao24, registration, typecode: information about the aircraft;

    origin: the origin airport for the aircraft, when available;

    landing: the airport where the aircraft actually landed, when available;

    destination: the intended destination airport, when available;

    diverted: the diversion airport, if applicable, when available;

    tweet_problem, tweet_result, tweet_fueldump: information extracted from Twitter accounts, about the nature of the issue, the consequence of the emergency and whether the aircraft is known to have dumped fuel;

    avh_id, avh_problem, avh_result, avh_fueldump: information extracted from The Aviation Herald, about the nature of the issue, the consequence of the emergency and whether the aircraft is known to have dumped fuel. The complete URL for each event is https://avherald.com/h?article={avh_id}&opt=1 (replace avh_id by the actual value)
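
    A sketch of combining the two files described above, joining the trajectory parquet with the metadata CSV on flight_id; the filenames are hypothetical, so use the actual names from the download:

    import pandas as pd

    trajectories = pd.read_parquet("trajectories.parquet")  # hypothetical filename
    metadata = pd.read_csv("metadata.csv")                  # hypothetical filename

    # attach one row of metadata to every trajectory point
    merged = trajectories.merge(metadata, on="flight_id", how="left")

    # e.g., all emergencies that ended in a diversion
    diverted = metadata[metadata["diverted"].notna()]
    print(diverted[["callsign", "origin", "destination", "diverted"]].head())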

    Examples

    Additional analyses and visualisations of the data are available at the following page:

    Credit

    If you use this dataset, please cite the original OpenSky paper:

    Xavier Olive, Axel Tanner, Martin Strohmeier, Matthias Schäfer, Metin Feridun, Allan Tart, Ivan Martinovic and Vincent Lenders. "OpenSky Report 2020: Analysing in-flight emergencies using big data". In 2020 IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), October 2020

    Matthias Schäfer, Martin Strohmeier, Vincent Lenders, Ivan Martinovic and Matthias Wilhelm. "Bringing Up OpenSky: A Large-scale ADS-B Sensor Network for Research". In Proceedings of the 13th IEEE/ACM International Symposium on Information Processing in Sensor Networks (IPSN), pages 83-94, April 2014.

    and the traffic library used to derive the data:

    Xavier Olive. "traffic, a toolbox for processing and analysing air traffic data." Journal of Open Source Software 4(39), July 2019.

  10. A Replication Dataset for Fundamental Frequency Estimation

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    • +1 more
    Updated Oct 19, 2023
    Cite
    (2023). A Replication Dataset for Fundamental Frequency Estimation [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7808
    Available download formats: json
    Dataset updated
    Oct 19, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Part of the dissertation Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods. © 2020, Bastian Bechtold. All rights reserved.

    Estimating the fundamental frequency of speech remains an active area of research, with varied applications in speech recognition, speaker identification, and speech compression. A vast number of algorithms for estimating this quantity have been proposed over the years, and a number of speech and noise corpora have been developed for evaluating their performance. The present dataset contains estimated fundamental frequency tracks of 25 algorithms, across six speech corpora and two noise corpora, at nine signal-to-noise ratios between -20 and 20 dB SNR, as well as an additional evaluation of synthetic harmonic tone complexes in white noise. The dataset also contains pre-calculated performance measures, both novel and traditional, in reference to each speech corpus' ground truth, the algorithms' own clean-speech estimate, and our own consensus truth. It can thus serve as the basis for a comparison study, to replicate existing studies from a larger dataset, or as a reference for developing new fundamental frequency estimation algorithms. All source code and data are available to download, and entirely reproducible, albeit requiring about one year of processor time.

    Included Code and Data

    ground truth data.zip is a JBOF dataset of fundamental frequency estimates and ground truths of all speech files in the following corpora:

    • CMU-ARCTIC (consensus truth) [1]
    • FDA (corpus truth and consensus truth) [2]
    • KEELE (corpus truth and consensus truth) [3]
    • MOCHA-TIMIT (consensus truth) [4]
    • PTDB-TUG (corpus truth and consensus truth) [5]
    • TIMIT (consensus truth) [6]

    noisy speech data.zip is a JBOF dataset of fundamental frequency estimates of speech files mixed with noise from the following corpora:

    • NOISEX [7]
    • QUT-NOISE [8]

    synthetic speech data.zip is a JBOF dataset of fundamental frequency estimates of synthetic harmonic tone complexes in white noise.

    noisy_speech.pkl and synthetic_speech.pkl are pickled Pandas dataframes of performance metrics derived from the above data for the following list of fundamental frequency estimation algorithms:

    • AUTOC [9]
    • AMDF [10]
    • BANA [11]
    • CEP [12]
    • CREPE [13]
    • DIO [14]
    • DNN [15]
    • KALDI [16]
    • MAPSMBSC [17]
    • NLS [18]
    • PEFAC [19]
    • PRAAT [20]
    • RAPT [21]
    • SACC [22]
    • SAFE [23]
    • SHR [24]
    • SIFT [25]
    • SRH [26]
    • STRAIGHT [27]
    • SWIPE [28]
    • YAAPT [29]
    • YIN [30]

    noisy speech evaluation.py and synthetic speech evaluation.py are Python programs to calculate the above Pandas dataframes from the above JBOF datasets. They calculate the following performance measures:

    • Gross Pitch Error (GPE), the percentage of pitches where the estimated pitch deviates from the true pitch by more than 20%.
    • Fine Pitch Error (FPE), the mean error of grossly correct estimates.
    • High/Low Octave Pitch Error (OPE), the percentage of pitches that are GPEs and happen to be at an integer multiple of the true pitch.
    • Gross Remaining Error (GRE), the percentage of pitches that are GPEs but not OPEs.
    • Fine Remaining Bias (FRB), the median error of GREs.
    • True Positive Rate (TPR), the percentage of true positive voicing estimates.
    • False Positive Rate (FPR), the percentage of false positive voicing estimates.
    • False Negative Rate (FNR), the percentage of false negative voicing estimates.
    • F₁, the harmonic mean of precision and recall of the voicing decision.
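
    A numpy sketch of the first two measures exactly as defined above: GPE as the share of frames off by more than 20%, and FPE as the mean error of the remaining, grossly correct frames (frames with a true pitch of zero are treated as unvoiced):

    import numpy as np

    def gpe_fpe(estimated_hz, true_hz):
        estimated_hz = np.asarray(estimated_hz, dtype=float)
        true_hz = np.asarray(true_hz, dtype=float)
        voiced = true_hz > 0
        rel_err = np.abs(estimated_hz[voiced] - true_hz[voiced]) / true_hz[voiced]
        gross = rel_err > 0.20
        gpe = 100.0 * gross.mean()
        fpe = rel_err[~gross].mean() if (~gross).any() else float("nan")
        return gpe, fpe

    print(gpe_fpe([100, 205, 150], [100, 100, 140]))  # one gross (octave) error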

    Pipfile is a pipenv-compatible pipfile for installing all prerequisites necessary for running the above Python programs.

    The Python programs take about an hour to run on a fast 2019 computer, and require at least 32 GB of memory.

    References:

    1. John Kominek and Alan W Black. CMU ARCTIC database for speech synthesis, 2003.
    2. Paul C Bagshaw, Steven Hiller, and Mervyn A Jack. Enhanced Pitch Tracking and the Processing of F0 Contours for Computer Aided Intonation Teaching. In EUROSPEECH, 1993.
    3. F Plante, Georg F Meyer, and William A Ainsworth. A Pitch Extraction Reference Database. In Fourth European Conference on Speech Communication and Technology, pages 837-840, Madrid, Spain, 1995.
    4. Alan Wrench. MOCHA MultiCHannel Articulatory database: English, November 1999.
    5. Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.
    6. John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, and Victor Zue. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 1993.
    7. Andrew Varga and Herman J.M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3):247-251, July 1993.
    8. David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.
    9. Man Mohan Sondhi. New methods of pitch extraction. IEEE Transactions on Audio and Electroacoustics, 16(2):262-266, 1968.
    10. Myron J. Ross, Harry L. Shaffer, Asaf Cohen, Richard Freudberg, and Harold J. Manley. Average magnitude difference function pitch extractor. IEEE Transactions on Acoustics, Speech and Signal Processing, 22(5):353-362, 1974.
    11. Na Yang, He Ba, Weiyang Cai, Ilker Demirkol, and Wendi Heinzelman. BaNa: A Noise Resilient Fundamental Frequency Detection Algorithm for Speech and Music. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):1833-1848, December 2014.
    12. Michael Noll. Cepstrum Pitch Determination. The Journal of the Acoustical Society of America, 41(2):293-309, 1967.
    13. Jong Wook Kim, Justin Salamon, Peter Li, and Juan Pablo Bello. CREPE: A Convolutional Representation for Pitch Estimation. arXiv:1802.06182 [cs, eess, stat], February 2018.
    14. Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications. IEICE Transactions on Information and Systems, E99.D(7):1877-1884, 2016.
    15. Kun Han and DeLiang Wang. Neural Network Based Pitch Tracking in Very Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):2158-2168, December 2014.
    16. Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal, and Sanjeev Khudanpur. A pitch extraction algorithm tuned for automatic speech recognition. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2494-2498. IEEE, 2014.
    17. Lee Ngee Tan and Abeer Alwan. Multi-band summary correlogram-based pitch detection for noisy speech. Speech Communication, 55(7-8):841-856, September 2013.
    18. Jesper Kjær Nielsen, Tobias Lindstrøm Jensen, Jesper Rindom Jensen, Mads Græsbøll Christensen, and Søren Holdt Jensen. Fast fundamental frequency estimation: Making a statistically efficient estimator computationally efficient. Signal Processing, 135:188-197, June 2017.
    19. Sira Gonzalez and Mike Brookes. PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2):518-530, February 2014.
    20. Paul Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In Proceedings of the Institute of Phonetic Sciences, volume 17, pages 97-110. Amsterdam, 1993.
    21. David Talkin. A robust algorithm for pitch tracking (RAPT). Speech Coding and Synthesis, 495:518, 1995.
    22. Byung Suk Lee and Daniel PW Ellis. Noise robust pitch tracking by subband autocorrelation classification. In Interspeech, pages 707-710, 2012.
    23. Wei Chu and Abeer Alwan. SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech. In INTERSPEECH, pages 2590-2593, 2010.
    24. Xuejing Sun. Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 1, pages I-333. IEEE, 2002.
    25. Markel. The SIFT algorithm for fundamental frequency estimation. IEEE Transactions on Audio and Electroacoustics, 20(5):367-377, December 1972.
    26. Thomas Drugman and Abeer Alwan. Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics. In Interspeech, pages 1973-1976, 2011.
    27. Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nisimura, Toshio Irino, and Hideki Banno. TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3933-3936. IEEE, 2008.
    28. Arturo Camacho. SWIPE: A sawtooth waveform inspired pitch estimator for speech and music. PhD thesis, University of Florida, 2007.
    29. Kavita Kasi and Stephen A. Zahorian. Yet Another Algorithm for Pitch Tracking. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages I-361 to I-364, Orlando, FL, USA, May 2002. IEEE.
    30. Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917, 2002.

  11. IMDb Movie Reviews Dataset

    • ieee-dataport.org
    Updated Aug 2, 2022
    Cite
    Aditya Pal (2022). IMDb Movie Reviews Dataset [Dataset]. https://ieee-dataport.org/open-access/imdb-movie-reviews-dataset
    Dataset updated
    Aug 2, 2022
    Authors
    Aditya Pal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R

  12. ASAYAR: A Dataset for Arabic-Latin Text Detection

    • kaggle.com
    Updated Feb 4, 2022
    Cite
    Mohammed AKALLOUCH (2022). ASAYAR: A Dataset for Arabic-Latin Text Detection [Dataset]. https://www.kaggle.com/datasets/akallouch/asayar
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 4, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohammed AKALLOUCH
    Description

    ASAYAR

    This is a description for the paper:
    ASAYAR: A Dataset for Arabic-Latin Scene Text Localization in Highway Traffic Panels
    Mohammed Akallouch; Kaoutar Sefrioui Boujemaa; Afaf Bouhoute; Khalid Fardousse; Ismail Berrada

    Overview

    ASAYAR is the first public dataset dedicated to Latin (French) and Arabic scene text detection in highway panels. It comprises more than 1,800 well-annotated images. The dataset was collected from Moroccan highways.

    Annotation format

    In the dataset, each instance's location is annotated by a rectangular bounding box, denoted as {XMIN, YMIN, XMAX, YMAX}. Each object has a class name denoted as CLASS. The global image information is defined as follows: FOLDER, PATH, NAME, and SIZE.

    Dataset structure

    Train or Test/
    ├── ASAYAR_SIGN/
    │  ├── Annotations/
    │  │  ├── image_1.xml
    │  │  └── ...
    │  └── Images
    │    ├── image_1.png
    │    └── ...
    │    
    ├── ASAYAR_TXT/
    │  ├── Annotations/
    │  │   ├── Line-Level/
    │  │   │   ├── image_1.xml
    │  │   │   └── ...
    │  │   └── Word-Level/
    │  │       ├── image_1.xml
    │  │       └── ...
    │  └── Images/
    │     ├── image_1.png
    │     └── ...
    └── ASAYAR_SYM/
      ├── Annotations/
      │  ├── image_1.xml
      │  └── ...
      └── Images/
        ├── image_1.png
        └── ...
    

    Import data

    We provide a Jupyter Notebook with an example to import images and their annotations.

    Convert to text format

    To convert annotations from Pascal VOC to txt format (xmin,ymin,xmax,ymax,class), use convert2txt.py.
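
    convert2txt.py ships with the dataset; the following is an independent sketch of the same Pascal VOC to txt conversion it performs, using only the standard library:

    import xml.etree.ElementTree as ET

    def voc_to_txt(xml_path):
        """Return one 'xmin,ymin,xmax,ymax,class' line per annotated object."""
        root = ET.parse(xml_path).getroot()
        lines = []
        for obj in root.iter("object"):
            cls = obj.findtext("name")
            box = obj.find("bndbox")
            coords = [box.findtext(k) for k in ("xmin", "ymin", "xmax", "ymax")]
            lines.append(",".join(coords + [cls]))
        return "\n".join(lines)

    print(voc_to_txt("ASAYAR_TXT/Annotations/Line-Level/image_1.xml"))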

    Examples of Annotated Images

    https://vcar.github.io/ASAYAR/images/image_895.png

    Website

    The data website: ASAYAR

    Citation

    Our paper introducing the dataset and the evaluation methods was published in IEEE Transactions on Intelligent Transportation Systems (2020) and is available here. If you make use of the ASAYAR dataset, please cite the following paper:

    @ARTICLE{9233923,
       author={M. {Akallouch} and K. S. {Boujemaa} and A. {Bouhoute} and K. {Fardousse} and I. {Berrada}},
       journal={IEEE Transactions on Intelligent Transportation Systems}, 
       title={ASAYAR: A Dataset for Arabic-Latin Scene Text Localization in Highway Traffic Panels}, 
       year={2020},
       pages={1-11},
       doi={10.1109/TITS.2020.3029451}} 
    
    
  13. Data from: Reference Measurements of Error Vector Magnitude

    • catalog.data.gov
    • data.nist.gov
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). Reference Measurements of Error Vector Magnitude [Dataset]. https://catalog.data.gov/dataset/reference-measurements-of-error-vector-magnitude
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The experiment here was to demonstrate that we can reliably measure the Reference Waveforms designed in the IEEE P1765 proposed standard and calculate EVM along with the associated uncertainties. The measurements were performed using NIST's calibrated sampling oscilloscope and were traceable to the primary standards.

    We have uploaded the following two datasets. (1) Table 3 contains the EVM values (in %) for Reference Waveforms 1-7 after performing the uncertainty analyses. The Monte Carlo means are also compared with the ideal values from the calculations in the IEEE P1765 standard. (2) Figure 3 shows the complete EVM distribution upon performing uncertainty analysis for Reference Waveform 3 as an example. Each of the entries in Table 3 is associated with an EVM distribution similar to that shown in Fig. 3.
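
    For context, a sketch of the usual RMS EVM definition (a general formula, not NIST's exact P1765 procedure or its uncertainty analysis): the RMS error vector normalised by the RMS reference, in percent:

    import numpy as np

    def evm_percent(measured, reference):
        measured = np.asarray(measured, dtype=complex)
        reference = np.asarray(reference, dtype=complex)
        err_power = np.mean(np.abs(measured - reference) ** 2)
        ref_power = np.mean(np.abs(reference) ** 2)
        return 100.0 * np.sqrt(err_power / ref_power)

    # tiny QPSK example with additive noise
    ref = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
    meas = ref + 0.02 * (np.random.randn(4) + 1j * np.random.randn(4))
    print(evm_percent(meas, ref))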

  14. i

    CWRU bearing dataset and Gearbox dataset of IEEE PHM Challenge Competition in 2009

    • ieee-dataport.org
    Updated Nov 20, 2019
    Cite
    Zhenxiang Li (2019). CWRU bearing dataset and Gearbox dataset of IEEE PHM Challenge Competition in 2009 [Dataset]. https://ieee-dataport.org/documents/cwru-bearing-dataset-and-gearbox-dataset-ieee-phm-challenge-competition-2009
    Dataset updated
    Nov 20, 2019
    Authors
    Zhenxiang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The bearing dataset was acquired by the electrical engineering laboratory of Case Western Reserve University and published on the Bearing Data Center website. The gearbox dataset is from the IEEE PHM Challenge Competition in 2009.

  15. Usk Coffe Segmentasi Dataset

    • universe.roboflow.com
    Updated Nov 24, 2023
    Cite
    Yolo Annotated Dataset (2023). Usk Coffe Segmentasi Dataset [Dataset]. https://universe.roboflow.com/yolo-annotated-dataset/usk-coffe-segmentasi
    Available download formats: zip
    Dataset updated
    Nov 24, 2023
    Dataset authored and provided by
    Yolo Annotated Dataset
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Variables measured
    Coffe Babo BD4S Polygons
    Description

    Dataset Annotation is a critical stage in the development of object detection systems, especially when working with datasets like USK-Coffee, which focuses on green Arabica coffee beans. Annotation is carried out to provide crucial information to the object detection model, enabling it to recognize and classify coffee beans based on their varieties. The following is a description of the USK-Coffee dataset annotation:

    Annotation Types:

    1. Bounding Box: Each coffee bean in the image will be annotated with a bounding box surrounding it. This includes peaberry, longberry, premium, and defective beans.
    2. Class Labels: Each bounding box will be assigned a class label corresponding to the type of coffee bean it represents: defect, longberry, peaberry, or premium.

    Annotation Process:

    1. Manual Annotation: Annotation will be carried out manually by competent annotators, namely Imam Sayuti and Patimah Lubis. They will mark the position of each coffee bean in the image and assign the appropriate class label.
    2. Quality Control: The annotation process will involve quality control by Imam Sayuti, Patimah Lubis, and Kahlil Muchtar (the project's supervising lecturer) to ensure that each coffee bean is correctly identified and labeled.

    Dataset File Framework:

    USK-Coffee
    |-- test
    |   |-- defect (400 Images)
    |   |-- longberry (400 Images)
    |   |-- peaberry (400 Images)
    |   |-- premium (400 Images)
    |-- train
    |   |-- defect (1200 Images)
    |   |-- longberry (1200 Images)
    |   |-- peaberry (1200 Images)
    |   |-- premium (1200 Images)
    |-- val
        |-- defect (400 Images)
        |-- longberry (400 Images)
        |-- peaberry (400 Images)
        |-- premium (400 Images)
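
    Because the framework above is a standard class-per-folder layout, one common (framework-specific, hence assumed) way to load it is torchvision's ImageFolder:

    from torchvision import datasets, transforms

    tfm = transforms.Compose([transforms.Resize((224, 224)),
                              transforms.ToTensor()])

    train = datasets.ImageFolder("USK-Coffee/train", transform=tfm)
    print(train.classes)   # ['defect', 'longberry', 'peaberry', 'premium']
    print(len(train))      # 4800 training images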

    Dataset Source Citation: The USK-Coffee dataset was announced at the IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom) in 2022 by Febriana, A., Muchtar, K., Dawood, R., and Lin, CY. This dataset has made a valuable contribution, allowing for further research in deep learning and the development of object detection systems using YOLO. The dataset source citation is as follows:

    A. Febriana, K. Muchtar, R. Dawood and C. -Y. Lin, "USK-COFFEE Dataset: A Multi-Class Green Arabica Coffee Bean Dataset for Deep Learning," 2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), Malang, Indonesia, 2022, pp. 469-473, doi: 10.1109/CyberneticsCom55287.2022.9865489. (pdf)

    With meticulous dataset annotation by Imam Sayuti, Patimah Lubis, and oversight by Kahlil Muchtar, it is expected that research utilizing this dataset can achieve accurate and high-performance object detection.

  16. TECNALIA WEEE (Waste from Electrical and Electronic Equipment) HYPERSPECTRAL DATASET

    • data.niaid.nih.gov
    Updated Jan 19, 2025
    Cite
    Tecnalia (2025). TECNALIA WEEE (Waste from Electrical and Electronic Equipment) HYPERSPECTRAL DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12565131
    Dataset updated
    Jan 19, 2025
    Dataset provided by
    Tecnalia Research and Innovation
    Picon, Artzai
    Bereciartua, Arantza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TECNALIA WEEE HYPERSPECTRAL DATASET

    We present a dataset containing hyperspectral images of Waste from Electrical and Electronic Equipment (WEEE) scrap. The dataset contains pieces of copper, brass, aluminum, stainless steel, and white copper. Images contain 76 uniformly distributed wavelengths in the spectral range [415.05 nm, 1008.10 nm]. Images were calibrated using a white reference spectralon pattern and a dark spectralon pattern, as depicted in the associated paper.

    Dataset content:

    • XXXX.mat file: contains two variables: 'hyperfile', the hyperspectral data of the image, and 'bands', the reference to each band.
    • XXXX.png file: RGB representation of the image.
    • XXXX_gt.png file: an image that assigns each pixel to a specific class according to the following index table.

    Dataset classes:

    • Background: 0
    • Copper: 1
    • Brass: 2
    • Aluminum: 3
    • Lead: 4 [NOT PRESENT]
    • Stainless Steel: 5
    • White_Copper: 6
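
    A sketch of reading one sample with scipy and Pillow, following the file layout above; "XXXX" stands in for an actual file stem from the archive:

    import numpy as np
    import scipy.io
    from PIL import Image

    mat = scipy.io.loadmat("XXXX.mat")
    cube = mat["hyperfile"]      # hyperspectral data, 76 bands
    bands = mat["bands"]         # wavelength reference for each band

    gt = np.array(Image.open("XXXX_gt.png"))  # per-pixel class index, 0-6
    print(cube.shape, np.unique(gt))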

    Creators: Artzai Picon (TECNALIA); Arantza Bereciartua (TECNALIA)

    Dataset citation:

    Picon, A., Ghita, O., Iriondo, P. M., Bereciartua, A., & Whelan, P. F. (2010, September). Automation of waste recycling using hyperspectral image analysis. In 2010 IEEE 15th Conference on Emerging Technologies & Factory Automation (ETFA 2010) (pp. 1-4). IEEE.

    You can get more theoretical information on the dataset and methods used here: Picón, A., Ghita, O., Whelan, P. F., & Iriondo, P. M. (2009). Fuzzy spectral and spatial feature integration for classification of nonferrous materials in hyperspectral data. IEEE Transactions on Industrial Informatics, 5(4), 483-494.

    Hyperspectral deep learning methods and code for managing this dataset are available at:

    https://github.com/samtzai/tecnalia_weee_hyperspectral_dataset

    Picon, A., Galan, P., Bereciartua-Perez, A., & Benito-del-Valle, L. (2024). On the analysis of adapting deep learning methods to hyperspectral imaging. Use case for WEEE recycling and dataset. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 125665.

    https://www.sciencedirect.com/science/article/pii/S1386142524018316

  17. Synthetic Dyslexia Handwriting Dataset (YOLO-Format)

    • zenodo.org
    Updated Feb 11, 2025
    Cite
    Nora Fink (2025). Synthetic Dyslexia Handwriting Dataset (YOLO-Format) [Dataset]. http://doi.org/10.5281/zenodo.14852659
    Available download formats: zip
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nora Fink
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description
    This synthetic dataset has been generated to facilitate object detection (in YOLO format) for research on dyslexia-related handwriting patterns. It builds upon an original corpus of uppercase and lowercase letters obtained from multiple sources: the NIST Special Database 19 [1], the Kaggle dataset “A-Z Handwritten Alphabets in .csv format” [2], as well as handwriting samples from dyslexic primary school children of Seberang Jaya, Penang (Malaysia).

    In the original dataset, uppercase letters originated from NIST Special Database 19, while lowercase letters came from the Kaggle dataset curated by S. Patel. Additional images (categorized as Normal, Reversal, and Corrected) were collected and labeled based on handwriting samples of dyslexic and non-dyslexic students, resulting in:

    • 78,275 images labeled as Normal
    • 52,196 images labeled as Reversal
    • 8,029 images labeled as Corrected

    Building upon this foundation, the Synthetic Dyslexia Handwriting Dataset presented here was programmatically generated to produce labeled examples suitable for training and validating object detection models. Each synthetic image arranges multiple letters of various classes (Normal, Reversal, Corrected) in a “text line” style on a black background, providing YOLO-compatible .txt annotations that specify bounding boxes for each letter.

    Key Points of the Synthetic Generation Process

    1. Letter-Level Source Data
      Individual characters were sampled from the original image sets.
    2. Randomized Layout
      Letters are randomly assembled into words and lines, ensuring a wide variety of visual arrangements.
    3. Bounding Box Labels
      Each character is assigned a bounding box with (x, y, width, height) in YOLO format.
    4. Class Annotations
      Classes include 0 = Normal, 1 = Reversal, and 2 = Corrected.
    5. Preservation of Visual Characteristics
      Letters retain their key dyslexia-relevant features (e.g., reversals).
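
    A sketch of parsing one of the YOLO-compatible .txt annotations described above, assuming the usual YOLO convention of one "class x_center y_center width height" line per letter, normalised to the image size; the filename is hypothetical:

    CLASS_NAMES = {0: "Normal", 1: "Reversal", 2: "Corrected"}

    def read_yolo_labels(path):
        boxes = []
        with open(path) as f:
            for line in f:
                cls, x, y, w, h = line.split()
                boxes.append((CLASS_NAMES[int(cls)],
                              float(x), float(y), float(w), float(h)))
        return boxes

    for box in read_yolo_labels("labels/sample_0001.txt"):  # hypothetical path
        print(box)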

    Historical References & Credits

    If you are using this synthetic dataset or the original Dyslexia Handwriting Dataset, please cite the following papers:

    • M. S. A. B. Rosli, I. S. Isa, S. A. Ramlan, S. N. Sulaiman and M. I. F. Maruzuki, "Development of CNN Transfer Learning for Dyslexia Handwriting Recognition," 2021 11th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), 2021, pp. 194–199, doi: 10.1109/ICCSCE52189.2021.9530971.
    • N. S. L. Seman, I. S. Isa, S. A. Ramlan, W. Li-Chih and M. I. F. Maruzuki, "Notice of Removal: Classification of Handwriting Impairment Using CNN for Potential Dyslexia Symptom," 2021 11th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), 2021, pp. 188–193, doi: 10.1109/ICCSCE52189.2021.9530989.
    • Isa, Iza Sazanita. CNN Comparisons Models On Dyslexia Handwriting Classification / Iza Sazanita Isa … [et Al.]. Universiti Teknologi MARA Cawangan Pulau Pinang, 2021.
    • Isa, I. S., Rahimi, W. N. S., Ramlan, S. A., & Sulaiman, S. N. (2019). Automated detection of dyslexia symptom based on handwriting image for primary school children. Procedia Computer Science, 163, 440–449.

    References to Original Data Sources

    [1] P. J. Grother, “NIST Special Database 19,” NIST, 2016. [Online]. Available:
    https://www.nist.gov/srd/nist-special-database-19

    [2] S. Patel, “A-Z Handwritten Alphabets in .csv format,” Kaggle, 2017. [Online]. Available:
    https://www.kaggle.com/sachinpatel21/az-handwritten-alphabets-in-csv-format

    Usage & Citation

    Researchers and practitioners are encouraged to integrate this synthetic dataset into their computer vision pipelines for tasks such as dyslexia pattern analysis, character recognition, and educational technology development. Please cite the original authors and publications if you utilize this synthetic dataset in your work.

    Password Note (Original Data)

    The original RAR file was password-protected with the password: WanAsy321. This synthetic dataset, however, is provided openly for streamlined usage.

  18. 18SAHField - Datasets - IITA

    • data.iita.org
    Updated Apr 7, 2020
    Cite
    data.iita.org (2020). 18SAHField - Datasets - IITA [Dataset]. https://data.iita.org/dataset/18sahfield
    Dataset updated
    Apr 7, 2020
    Dataset provided by
    International Institute of Tropical Agriculture (http://www.iita.org/)
    Description

    18SAHField

  19. Low-dose Computed Tomography Perceptual Image Quality Assessment Grand Challenge Dataset (MICCAI 2023)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 9, 2023
    Cite
    Adam Wang (2023). Low-dose Computed Tomography Perceptual Image Quality Assessment Grand Challenge Dataset (MICCAI 2023) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7833095
    Explore at:
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Scott S. Hsieh
    Fabian Wagner
    Adam Wang
    Andreas Maier
    Jongduk Baek
    Wonkyeong Lee
    Jang-Hwan Choi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Image quality assessment (IQA) is extremely important in computed tomography (CT) imaging, since it facilitates the optimization of radiation dose and the development of novel algorithms in medical imaging, such as image restoration. In addition, since an excessive dose of radiation can cause harmful effects in patients, generating high-quality images from low-dose images is a popular topic in the medical domain. However, even though peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are the most widely used evaluation metrics for these algorithms, previous studies have shown that their correlation with radiologists’ opinion of image quality is insufficient, since they calculate the image score based on numeric pixel values (1-3). In addition, the need for pristine reference images to calculate these metrics makes them ineffective in real clinical environments, considering that pristine, high-quality images are often impossible to obtain due to the risk that the extra radiation dose would pose to patients. To overcome these limitations, several studies have aimed to develop a novel no-reference image quality metric that correlates well with radiologists’ opinion on image quality without any reference images (2, 4, 5).
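
    To make the reference-image limitation concrete, the snippet below computes both metrics with scikit-image on random arrays standing in for a full-dose slice and its low-dose counterpart; each call requires the pristine reference, which is exactly what a real low-dose protocol cannot provide. The code is illustrative only and is unrelated to the challenge tooling.

      import numpy as np
      from skimage.metrics import peak_signal_noise_ratio, structural_similarity

      rng = np.random.default_rng(0)
      reference = rng.random((512, 512)).astype(np.float32)   # stand-in full-dose slice
      low_dose = reference + 0.05 * rng.standard_normal((512, 512)).astype(np.float32)

      # Both metrics score the degraded image against the pristine reference.
      psnr = peak_signal_noise_ratio(reference, low_dose, data_range=1.0)
      ssim = structural_similarity(reference, low_dose, data_range=1.0)
      print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")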

    Nevertheless, due to the lack of open-source datasets specifically for CT IQA, experiments have been conducted with datasets that differ from each other, rendering their results incomparable and making it difficult to establish a standard image quality metric for CT imaging. Moreover, unlike real low-dose CT images, whose quality degradation stems from various combinations of artifacts, most studies consider only one type of artifact (e.g., low-dose noise (6-11), view aliasing (12), metal artifacts (13), scattering (14-16), motion artifacts (17-22), etc.). Therefore, this challenge aims to 1) evaluate various NR-IQA models on CT images containing complex noise/artifacts, 2) compare their correlations with scores produced by radiologists, and 3) provide insight into which metric best matches radiologists’ perception of CT image quality.
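
    Agreement between a model’s scores and radiologists’ opinions is commonly summarized with Pearson (PLCC), Spearman (SROCC), and Kendall correlation coefficients; below is a minimal sketch with SciPy, using invented paired scores purely for illustration.

      from scipy.stats import kendalltau, pearsonr, spearmanr

      radiologist_mos = [3.2, 4.1, 2.5, 3.8, 1.9]    # hypothetical mean opinion scores
      model_scores = [0.61, 0.83, 0.44, 0.70, 0.35]  # hypothetical NR-IQA outputs

      plcc, _ = pearsonr(model_scores, radiologist_mos)    # linear agreement
      srocc, _ = spearmanr(model_scores, radiologist_mos)  # rank agreement
      tau, _ = kendalltau(model_scores, radiologist_mos)
      print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}  Kendall tau={tau:.3f}")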

    Furthermore, considering that low-dose CT images are acquired by reducing the number of projections per rotation and by reducing the X-ray current, this challenge deals with the combination of the two major artifacts these methods generate, namely sparse-view streaks and noise, so that the best-performing IQA model applicable in real clinical environments can be identified.
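
    For intuition about these two degradations, the toy simulation below applies both to the Shepp-Logan phantom: reconstructing from one eighth of the projection angles produces sparse-view streaks, and Gaussian noise added to the sinogram crudely mimics low-current quantum noise. This sketch is illustrative only and says nothing about how the challenge data were actually generated.

      import numpy as np
      from skimage.data import shepp_logan_phantom
      from skimage.transform import iradon, radon

      phantom = shepp_logan_phantom()                        # 400x400 test image
      angles = np.linspace(0.0, 180.0, 360, endpoint=False)
      sparse_angles = angles[::8]                            # keep 1/8 of the views

      sinogram = radon(phantom, theta=sparse_angles)
      noisy = sinogram + np.random.default_rng(0).normal(scale=2.0, size=sinogram.shape)

      degraded = iradon(noisy, theta=sparse_angles, filter_name="ramp")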

    Funding Declaration:

    This research was partly supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00155966, Artificial Intelligence Convergence Innovation Human Resources Development (Ewha Womans University)), by the National Research Foundation of Korea (NRF-2022R1A2C1092072), and by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 1711174276, RS-2020-KD000016).

    References:

    1. Lee W, Cho E, Kim W, Choi J-H. Performance evaluation of image quality metrics for perceptual assessment of low-dose computed tomography images. Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment: SPIE, 2022.

    2. Lee W, Cho E, Kim W, Choi H, Beck KS, Yoon HJ, Baek J, Choi J-H. No-reference perceptual CT image quality assessment based on a self-supervised learning framework. Machine Learning: Science and Technology 2022.

    3. Choi D, Kim W, Lee J, Han M, Baek J, Choi J-H. Integration of 2D iteration and a 3D CNN-based model for multi-type artifact suppression in C-arm cone-beam CT. Machine Vision and Applications 2021;32(116):1-14.

    4. Pal D, Patel B, Wang A. SSIQA: Multi-task learning for non-reference CT image quality assessment with self-supervised noise level prediction. 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI): IEEE, 2021; p. 1962-1965.

    5. Mittal A, Moorthy AK, Bovik AC. No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 2012;21(12):4695-4708. doi: 10.1109/TIP.2012.2214050

    6. Lee J, Kim W, Lee Y, Lee J-Y, Ko E, Choi J-H. Unsupervised Domain Adaptation for Low-dose Computed Tomography Denoising. IEEE Access 2022.

    7. Jeon S-Y, Kim W, Choi J-H. MM-Net: Multi-frame and Multi-mask-based Unsupervised Deep Denoising for Low-dose Computed Tomography. IEEE Transactions on Radiation and Plasma Medical Sciences 2022.

    8. Kim W, Lee J, Kang M, Kim JS, Choi J-H. Wavelet subband-specific learning for low-dose computed tomography denoising. PLoS One 2022;17(9):e0274308.

    9. Han M, Shim H, Baek J. Low-dose CT denoising via convolutional neural network with an observer loss function. Med Phys 2021;48(10):5727-5742. doi: 10.1002/mp.15161

    10. Kim B, Shim H, Baek J. Weakly-supervised progressive denoising with unpaired CT images. Med Image Anal 2021;71:102065. doi: 10.1016/j.media.2021.102065

    11. Wagner F, Thies M, Gu M, Huang Y, Pechmann S, Patwari M, Ploner S, Aust O, Uderhardt S, Schett G, Christiansen S, Maier A. Ultralow-parameter denoising: Trainable bilateral filter layers in computed tomography. Med Phys 2022;49(8):5107-5120. doi: 10.1002/mp.15718

    12. Kim B, Shim H, Baek J. A streak artifact reduction algorithm in sparse-view CT using a self-supervised neural representation. Med Phys 2022. doi: 10.1002/mp.15885

    13. Kim S, Ahn J, Kim B, Kim C, Baek J. Convolutional neural network-based metal and streak artifacts reduction in dental CT images with sparse-view sampling scheme. Med Phys 2022;49(9):6253-6277. doi: 10.1002/mp.15884

    14. Bier B, Berger M, Maier A, Kachelrieß M, Ritschl L, Müller K, Choi JH, Fahrig R. Scatter correction using a primary modulator on a clinical angiography C-arm CT system. Med Phys 2017;44(9):e125-e137.

    15. Maul N, Roser P, Birkhold A, Kowarschik M, Zhong X, Strobel N, Maier A. Learning-based occupational x-ray scatter estimation. Phys Med Biol 2022;67(7). doi: 10.1088/1361-6560/ac58dc

    16. Roser P, Birkhold A, Preuhs A, Syben C, Felsner L, Hoppe E, Strobel N, Kowarschik M, Fahrig R, Maier A. X-Ray Scatter Estimation Using Deep Splines. IEEE Trans Med Imaging 2021;40(9):2272-2283. doi: 10.1109/TMI.2021.3074712

    17. Maier J, Nitschke M, Choi JH, Gold G, Fahrig R, Eskofier BM, Maier A. Rigid and Non-Rigid Motion Compensation in Weight-Bearing CBCT of the Knee Using Simulated Inertial Measurements. IEEE Trans Biomed Eng 2022;69(5):1608-1619. doi: 10.1109/TBME.2021.3123673

    18. Choi JH, Maier A, Keil A, Pal S, McWalter EJ, Beaupré GS, Gold GE, Fahrig R. Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. II. Experiment. Med Phys 2014;41(6, Part 1):061902.

    19. Choi JH, Fahrig R, Keil A, Besier TF, Pal S, McWalter EJ, Beaupré GS, Maier A. Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. Part I. Numerical model-based optimization. Med Phys 2013;40(9):091905.

    20. Berger M, Muller K, Aichert A, Unberath M, Thies J, Choi JH, Fahrig R, Maier A. Marker-free motion correction in weight-bearing cone-beam CT of the knee joint. Med Phys 2016;43(3):1235-1248. doi: 10.1118/1.4941012

    21. Ko Y, Moon S, Baek J, Shim H. Rigid and non-rigid motion artifact reduction in X-ray CT using attention module. Med Image Anal 2021;67:101883. doi: 10.1016/j.media.2020.101883

    22. Preuhs A, Manhart M, Roser P, Hoppe E, Huang Y, Psychogios M, Kowarschik M, Maier A. Appearance Learning for Image-Based Motion Estimation in Tomography. IEEE Trans Med Imaging 2020;39(11):3667-3678. doi: 10.1109/TMI.2020.3002695

  20. 18trialmultiplicationIB - Datasets - IITA

    • data.iita.org
    Cite
    data.iita.org, 18trialmultiplicationIB - Datasets - IITA [Dataset]. https://data.iita.org/dataset/18trialmultiplicationib
    Explore at:
    Dataset provided by
    International Institute of Tropical Agriculture (http://www.iita.org/)
    Description

    Have a backup of the clones planted in the trials; this will serve as planting material (seed) for next season.
