We introduce the MetaMIDI Dataset (MMD), a large-scale collection of 436,631 MIDI files and metadata. MMD contains artist and title metadata for 221,504 MIDI files, and genre metadata for 143,868 MIDI files, collected during the web-scraping process. MIDI files in MMD were matched against a collection of 32,000,000 30-second audio clips retrieved from Spotify, resulting in over 10,796,557 audio-MIDI matches. In addition, we linked 600,142 Spotify tracks with 1,094,901 MusicBrainz recordings to produce a set of 168,032 MIDI files that are matched to the MusicBrainz database. We also provide a set of 53,496 MIDI files, derived from audio-MIDI matches where the metadata obtained from Spotify is a fuzzy match to the web-scraped metadata. These links augment many files in the dataset with the extensive metadata available via the Spotify API and the MusicBrainz database. We anticipate that this collection of data will be of great use to MIR researchers addressing a variety of research topics.
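As a rough illustration of the kind of fuzzy metadata matching described above (this is only a minimal sketch using normalized string similarity, not the authors' actual matching pipeline), artist/title strings could be compared as follows:

# Minimal sketch of fuzzy artist/title matching with difflib.
# This is NOT the MMD authors' matching procedure; it only
# illustrates the general idea of comparing web-scraped metadata
# against Spotify-derived metadata.
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and keep only alphanumerics/spaces for comparison."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def fuzzy_match(scraped: str, derived: str, threshold: float = 0.9) -> bool:
    """Return True if the two metadata strings are similar enough."""
    ratio = SequenceMatcher(None, normalize(scraped), normalize(derived)).ratio()
    return ratio >= threshold

# Example comparison of scraped vs. Spotify metadata strings.
print(fuzzy_match("The Beatles - Let It Be", "The Beatles - Let It Be (Remastered)"))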
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Nintendo Entertainment System Music Database (NES-MDB) is a dataset intended for building automatic music composition systems for the NES audio synthesizer (paper). The NES synthesizer has highly constrained compositional parameters which are well-suited to a wide variety of current machine learning techniques. The synthesizer is typically programmed in assembly, but we parse the assembly into straightforward formats that are more suitable for machine learning.
The NES-MDB dataset consists of 5278 songs from the soundtracks of 397 NES games. The dataset represents 296 unique composers, and the songs contain more than two million notes combined. We build NES-MDB starting from the assembly code of NES games, which contain the exact timings and parameter values necessary for accurate chiptune renditions. We split the dataset into training, validation, and testing splits, ensuring that no composer appears in multiple splits.
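As a minimal sketch of the composer-disjoint splitting constraint (the released NES-MDB splits are the ones to use in practice; the song-to-composer mapping below is assumed), such a split could be produced like this:

# Minimal sketch of a composer-disjoint train/valid/test split, assuming
# a dict mapping song IDs to composer names. Use the official NES-MDB
# splits in practice; this only illustrates the constraint that no
# composer appears in more than one split.
import random

def split_by_composer(song_to_composer, ratios=(0.8, 0.1, 0.1), seed=0):
    composers = sorted(set(song_to_composer.values()))
    random.Random(seed).shuffle(composers)
    n = len(composers)
    cut1, cut2 = int(ratios[0] * n), int((ratios[0] + ratios[1]) * n)
    bucket = {c: "train" for c in composers[:cut1]}
    bucket.update({c: "valid" for c in composers[cut1:cut2]})
    bucket.update({c: "test" for c in composers[cut2:]})
    splits = {"train": [], "valid": [], "test": []}
    for song, composer in song_to_composer.items():
        splits[bucket[composer]].append(song)
    return splits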
The NES synthesizer has five instrument voices: two pulse-wave generators (P1, P2), a triangle-wave generator (TR), a percussive noise generator (NO), and an audio sample playback channel (excluded for simplicity). Each voice is programmed by modifying four 8-bit registers which update the audio synthesis state. With NES-MDB, the goal is to allow researchers to study NES music while shielding them from the inner workings of an archaic audio synthesis chip.
The MIDI file format stores discrete musical events that describe a composition. MIDI files in NES-MDB consist of note/velocity/timbre events with 44.1 kHz timing resolution, allowing for sample-accurate reconstruction by an NES synthesizer.
Each MIDI file consists of four instrument voices: P1, P2, TR, and NO. Each voice contains a timestamped list of MIDI note events. All voices except TR contain additional timestamped lists of MIDI control change events representing velocity (CC11) and timbre (CC12) information.
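A minimal sketch of inspecting one of these MIDI files with pretty_midi (a third-party library, not part of the official NES-MDB tooling; the file path and track naming are assumptions, so check the files or the notebook linked below for specifics):

# Count notes and the CC11 (velocity) / CC12 (timbre) events per voice.
import pretty_midi

pm = pretty_midi.PrettyMIDI("nesmdb_midi/train/example.mid")  # hypothetical path
for inst in pm.instruments:
    velocities = [cc for cc in inst.control_changes if cc.number == 11]  # CC11: velocity
    timbres = [cc for cc in inst.control_changes if cc.number == 12]     # CC12: timbre
    print(f"{inst.name}: {len(inst.notes)} notes, "
          f"{len(velocities)} velocity events, {len(timbres)} timbre events")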
Click here for an IPython notebook exploring the MIDI version of NES-MDB
If you use this dataset or code in your research, cite this paper via the following BibTeX:
@inproceedings{donahue2018nesmdb,
title={The NES Music Database: A multi-instrumental dataset with expressive performance attributes},
author={Donahue, Chris and Mao, Huanru Henry and McAuley, Julian},
booktitle={ISMIR},
year={2018}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This platform is a multi-functional music data sharing platform for Computational Musicology research. It contains a variety of music data, such as sound recordings of Chinese traditional musical instruments and annotation data for Chinese pop music, all of which are freely available to computational musicology researchers.
It is also a large-scale music data sharing platform dedicated to Computational Musicology research in China, comprising three music databases: the Chinese Traditional Instrument Sound Database (CTIS), the Midi-wav Bi-directional Database of Pop Music, and the Multi-functional Music Database for MIR Research (CCMusic). All three databases are freely available to computational musicology researchers. For each database we provide audio files recorded by a professional team from a conservatory of music, as well as the corresponding label files, which are free of commercial copyright issues and thus suitable for large-scale distribution. We hope that this music data sharing platform can serve as a one-stop data resource for users and contribute to research in the field of Computational Musicology.
If you want to know more information or obtain complete files, please go to the official website of this platform:
Music Data Sharing Platform for Academic Research
Chinese Traditional Instrument Sound Database (CTIS)
This database has been developed over many years by Prof. Han Baoqiang's team and collects sound recordings of Chinese traditional musical instruments. It covers 287 Chinese national musical instruments, including traditional instruments, improved instruments, and ethnic minority instruments.
Multi-functional Music Database for MIR Research
This database collects sound materials of pop music, folk music, and hundreds of national musical instruments, with comprehensive annotations, forming a multi-purpose music database for MIR researchers.
Midi-wav Bi-directional Database of Pop Music
This database contains hundreds of Chinese pop songs, each with corresponding MIDI, audio, and lyric information. The vocal and accompaniment parts of the audio are recorded independently, which is helpful for studying MIR tasks under idealized conditions. In addition, singing-technique information aligned with the vocal part (such as breathy voice, falsetto, breathing, vibrato, mute, and slide) is annotated in MuseScore, yielding a MIDI-WAV bi-directionally corresponding pop music database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EMOPIA+ is a dataset published with the paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation". It is an extension of EMOPIA, a multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music. To support two-stage generation, we extracted lead sheets (melody + chord progressions) from the original MIDI files and manually corrected the key signatures of 367 clips. This dataset comprises 1,071 processed MIDI files along with their event-based representations: functional representation and REMI.
For more details about the extraction algorithms and the process to generate events, see the repo EMO-Disentanger.
File Description
midis/: Processed MIDI clips containing four tracks: Melody, Texture, Bass, and Chord. The first three tracks can be merged into the original MIDI files.
adjust_keyname.json: Manually adjusted key signatures.
functional/: Functional representations of the MIDI files.
performance/: Events for performance, including Melody, Texture and Bass tracks.
lead_sheet/: Events for lead sheet, including Melody and Chord tracks.
lead_sheet_to_performance/: Events for both lead sheet and performance, facilitating conditional music generation.
REMI/: REMI representations of the MIDI files, with the same structure as functional/.
split/: train/val/test splits based on the instructions in EMOPIA.
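A minimal sketch of browsing the processed clips with pretty_midi (an assumed third-party library; the JSON schema of adjust_keyname.json and the track naming are assumptions, so consult the EMO-Disentanger repo for the canonical loading code):

# Load the manual key corrections and list the tracks of a few clips.
import json
from pathlib import Path
import pretty_midi

adjusted_keys = json.loads(Path("adjust_keyname.json").read_text())
print(f"{len(adjusted_keys)} clips with manually corrected key signatures")

for midi_path in sorted(Path("midis").glob("*.mid"))[:5]:
    pm = pretty_midi.PrettyMIDI(str(midi_path))
    track_names = [inst.name for inst in pm.instruments]  # expected: Melody, Texture, Bass, Chord
    print(midi_path.name, track_names)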
Citation
@inproceedings{{EMOPIA},
author = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan},
title = {{EMOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference, {ISMIR}},
year = {2021}
}
@inproceedings{emodisentanger2024,
author = {Jingyue Huang and Ke Chen and Yi-Hsuan Yang},
title = {Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference, {ISMIR}},
year = {2024}
}
MAPS – standing for MIDI Aligned Piano Sounds – is a database of MIDI-annotated piano recordings. MAPS was designed to be released to the music information retrieval research community, especially for the development and the evaluation of algorithms for single-pitch or multi-pitch estimation and automatic transcription of music. It is composed of isolated notes, random-pitch chords, usual musical chords, and pieces of music. The database provides a large amount of sounds obtained in various recording conditions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Automatic Music Transcription (AMT) is a well-known task in the Music Information Retrieval (MIR) domain and consists of computing a symbolic music representation from an audio recording. In this work, our focus is to adapt algorithms that extract musical information from an audio file to a particular instrument. The main objective is to study the automatic transcription of digitized music support systems. Currently, these techniques are applied to a generic sound timbre, i.e., to sounds from any instrument, for further analysis and conversion to a digital music encoding and a final score format. The results of this project add new knowledge to the automatic transcription field, since the transverse flute was selected as the instrument on which to focus the whole process and, until now, there has been no database of flute sounds for this purpose.
To this end, we recorded a set of sounds, both monophonic and polyphonic music. These audio files were processed by the chosen transcription algorithm and converted to a digital music encoding format for subsequent alignment with the original recordings. Once all these data were converted to text, the resulting labeled database consists of the initial audio files and the final aligned files.
Furthermore, after this process and from the obtained data, the transcription was evaluated at two main levels: note level and frame level.
This database includes the original audio files (.wav), transcribed MIDI files (.mid), aligned MIDI files (.mid), aligned text files (.txt) and evaluation files (.csv).
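For the note-level evaluation mentioned above, a minimal sketch using mir_eval (a third-party library, not necessarily the tool used in this work) could look like the following; the column layout of the aligned text files is an assumption:

# Note-level transcription evaluation with mir_eval. The aligned .txt
# files are assumed to contain one note per line as
# "onset offset midi_pitch" -- adjust the parsing to the actual format.
import numpy as np
import mir_eval

def load_notes(path):
    data = np.loadtxt(path, ndmin=2)
    intervals = data[:, :2]                          # onset/offset in seconds
    pitches = mir_eval.util.midi_to_hz(data[:, 2])   # MIDI numbers -> Hz
    return intervals, pitches

ref_int, ref_pitch = load_notes("aligned/reference.txt")   # hypothetical paths
est_int, est_pitch = load_notes("aligned/estimated.txt")

precision, recall, f1, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_int, ref_pitch, est_int, est_pitch, onset_tolerance=0.05)
print(f"Note-level P={precision:.3f} R={recall:.3f} F1={f1:.3f}")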
https://github.com/pencilresearch/midi/blob/main/LICENSE
The MIDI CC & NRPN open dataset aspires to be a comprehensive guide to the MIDI implementation of every synthesizer ever created.
The EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, intended to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators.
https://staff.aist.go.jp/m.goto/RWC-MDB/
The RWC (Real World Computing) Music Database is a copyright-cleared music database (DB) that is available to researchers as a common foundation for research. It was built by the RWC Music Database Sub-Working Group of the Real World Computing Partnership (RWCP) of Japan. All the necessary copyrights and associated legal interests related to this database belong to Japan's National Institute of Advanced Industrial Science and Technology (AIST), who have provided for an unprecedented level of sharing of this musical data within the research community. The database will be distributed to researchers at a nominal cost to cover only duplication, shipping, and handling charges (i.e., it is practically free). The RWC Music Database is the world's first large-scale music database compiled specifically for research purposes. Shared databases are common in other fields of academic research and have frequently made significant contributions to progress in those areas. The field of music information processing, however, has lacked a common database of musical pieces or a large-scale corpus of musical instrument sounds. We therefore built the RWC Music Database which contains six original collections: the Popular Music Database (100 songs), Royalty-Free Music Database (15 songs), Classical Music Database (50 pieces), Jazz Music Database (50 pieces), Music Genre Database (100 pieces), and Musical Instrument Sound Database (50 instruments). For all 315 musical pieces performed and recorded for the database, we prepared original audio signals, corresponding standard MIDI files, and text files of lyrics (for songs). For the 50 instruments, we captured individual sounds at half-tone intervals with several variations of playing styles, dynamics, instrument manufacturers, and musicians. These collections will provide a benchmark that enables researchers to compare and evaluate their various systems and methods against a common standard. The database can also be used to stimulate research in corpus-oriented approaches that use statistical methods and learning techniques. In all cases, researchers can use the database for research publications and presentations without copyright restrictions. It is our hope that the RWC Music Database will make a significant contribution to future advances in the field of music information processing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a novel multitrack dataset called EnsembleSet, generated using the Spitfire BBC Symphony Orchestra library and ensemble scores from the RWC Classical Music Database and Mutopia. Our data generation method introduces automated articulation mapping for different playing styles based on the input MIDI/MusicXML data. The sample library also enables us to render the dataset with 20 different mix/microphone configurations, allowing us to study various recording scenarios for each performance. The dataset presents 80 tracks (6+ hours) with a range of string, wind, and brass instruments arranged as chamber ensembles. The paper associated with this dataset can be found here.
Click here for audio examples.
Contents:
80 chamber ensemble pieces rendered from MIDI/MusicXML files using the Spitfire BBC Symphony Orchestra Professional Library. Each piece's folder contains 20 sub-folders, one for each of the 20 unique microphone/mix configurations.
Licensing:
The dataset utilizes 9 MIDI tracks from the RWC Classical Music Database (track names: RM-Cxxx), for which the copyrights belong to the National Institute of Advanced Industrial Science and Technology and are managed by the RWC Music Database Administrator. Users may freely use this data for research purposes without facing the usual copyright restrictions as long as they fulfill the requirements mentioned here. The remaining 71 MIDI/LilyPond tracks are obtained from the Mutopia Project and are either in the public domain or protected by the Creative Commons Attribution-ShareAlike 3.0 license. Licensing information and other metadata related to the tracks can be found in this document.
NES-VMDB is a dataset containing 98,940 gameplay videos from 389 NES games, each paired with its original soundtrack in symbolic format (MIDI). NES-VMDB is built upon the Nintendo Entertainment System Music Database (NES-MDB), encompassing 5,278 music pieces from 397 NES games.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
JAZ-DRM Dataset
JAZ-DRM Dataset is an open audio collection of drum recordings in the style of classic and modern jazz music. It features 1,675 audio loops provided in uncompressed stereo WAV format, along with paired JSON files containing label data for supervised training of generative AI audio models.
Overview
The dataset was developed using an algorithmic framework to randomly generate audio loops from a customized database of MIDI patterns and multi-velocity drum kit samples. It is intended for training or fine-tuning AI models on drum performance notation, with paired labels adaptable for prompt-driven generation and other supervised learning tasks.
Its primary purpose is to provide accessible content for machine learning applications in music. Potential use cases include text-to-audio, prompt engineering, feature extraction, tempo detection, audio classification, rhythm analysis, music information retrieval (MIR), sound design and signal processing.
Specifications
A key map JSON file is provided for referencing and converting MIDI note numbers to text labels. You can update the text labels to suit your preferences.
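As a hedged sketch of how such a key map might be applied (the filename "key_map.json" and a flat {"36": "kick", ...} structure are assumptions; check the GitHub repository for the actual schema):

# Convert MIDI note numbers to text labels using the provided key map.
import json
from pathlib import Path

key_map = json.loads(Path("key_map.json").read_text())  # hypothetical filename

def label_for(note_number: int) -> str:
    return key_map.get(str(note_number), f"unknown({note_number})")

print(label_for(38))  # e.g. "snare", depending on the provided mapping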
License
This dataset was compiled by WaivOps, a crowdsourced music project managed by Patchbanks. All recordings have been sourced from verified composers and providers for copyright clearance.
The JAZ-DRM Dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Additional Info
For audio examples or more information about this dataset, please refer to the GitHub repository.
The Nintendo Entertainment System Music Database (NES-MDB) is a dataset intended for building automatic music composition systems for the NES audio synthesizer. It consists of 5278 songs from the soundtracks of 397 NES games. The dataset represents 296 unique composers, and the songs contain more than two million notes combined. It has file format options for MIDI, score and NLM (NES Language Modeling).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HH-TRP Dataset
HH-TRP Dataset is an open audio collection of drum recordings in the style of modern hip hop (urban trap) music. It features 15,000 audio loops provided in uncompressed stereo WAV format, along with paired JSON files containing label data for supervised training of generative AI audio models.
Overview
The dataset was developed using an algorithmic framework to randomly generate audio loops from a customized database of MIDI patterns and one-shot drum samples. Data augmentation included random sample-swapping to generate unique drum kits and sound effects. It is intended for training or fine-tuning AI models with paired labels, adaptable for prompt-driven drum generation and other supervised learning objectives.
Its primary purpose is to provide accessible content for machine learning applications in music. Potential use cases include text-to-audio, prompt engineering, feature extraction, tempo detection, audio classification, rhythm analysis, music information retrieval (MIR), sound design and signal processing.
Specifications
A key map JSON file is provided for referencing and converting MIDI note numbers to text labels. You can update the text labels to suit your preferences.
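A minimal sketch of pairing each audio loop with its JSON label file for a supervised training pipeline, assuming the WAV and JSON files share a filename stem (an assumption; verify against the repository's actual layout):

# Build (audio, label) pairs from matching WAV/JSON filename stems.
import json
from pathlib import Path

def build_pairs(root="HH-TRP"):  # hypothetical root folder name
    pairs = []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        json_path = wav_path.with_suffix(".json")
        if json_path.exists():
            labels = json.loads(json_path.read_text())
            pairs.append((wav_path, labels))
    return pairs

pairs = build_pairs()
print(f"{len(pairs)} (audio, label) pairs found")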
License
This dataset was compiled by WaivOps, a crowdsourced music project managed by Patchbanks. All recordings have been sourced from verified composers and providers for copyright clearance.
The HH-TRP Dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Additional Info
Time signature data has been added to the standard JSON file format.
For audio examples or more information about this dataset, please refer to the GitHub repository.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All MAGs obtained via different metagenomic and midi-metagenomic approaches, including redundant MAGs obtained by multiple co-assembly groups and MAGs of low quality.
Only dereplicated MAGs of moderate quality or above and their associated assemblies were uploaded to NCBI, to prevent an increase of redundancy and contamination in public databases, as the lower-quality redundant MAGs and assemblies have no additional research value other than reproducibility. This dataset is therefore provided to enable full reproducibility of our results.
Supplemental data to the revised submission of the publication “Midi-metagenomics: A novel approach for cultivation independent microbial genome reconstruction from environmental samples" by Vollmers et al.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PHENICX-Anechoic: denoised recordings and note annotations for Aalto anechoic orchestral database
Description
This dataset includes audio and annotations useful for tasks such as score-informed source separation, score following, multi-pitch estimation, transcription, or instrument detection, in the context of symphonic music.
This dataset was presented and used in the evaluation of:
M. Miron, J. Carabias-Orti, J. J. Bosch, E. Gómez and J. Janer, "Score-informed source separation for multi-channel orchestral recordings", Journal of Electrical and Computer Engineering (2016)
On this web page we do not provide the original audio files, which can be found at the web page hosted by Aalto University. However, with their permission we distribute the denoised versions for some of the anechoic orchestral recordings:
Pätynen, J., Pulkki, V., and Lokki, T., "Anechoic recording system for symphony orchestra," Acta Acustica united with Acustica, vol. 94, nr. 6, pp. 856-865, November/December 2008.
For the intellectual rights and the distribution policy of the audio recordings in this dataset contact Aalto University, Jukka Pätynen and Tapio Lokki. For more information about the original anechoic recordings we refer to the web page and the associated publication [2].
We provide the associated musical note onset and offset annotations, and the Roomsim[3] configuration files used to generate the multi-microphone recordings [1].
The anechoic dataset in [2] consists of four passages of symphonic music from the Classical and Romantic periods. That work presented a set of anechoic recordings for each of the instruments, which were then synchronized with each other so that they could later be combined into a full orchestral mix. In order to keep the evaluation setup consistent between the four pieces, we selected the following instruments: violin, viola, cello, double bass, oboe, flute, clarinet, horn, trumpet and bassoon.
We created a ground-truth score by hand-annotating the notes played by the instruments. The annotation process involved gathering the original scores in MIDI format, performing an initial automatic audio-to-score alignment, and then manually aligning each instrument track separately with the guidance of a monophonic pitch estimator.
During the recording process detailed in [2], the gain of the microphone amplifiers was fixed to the same value throughout, which reduced the dynamic range of the recordings of the quieter instruments. This led to noise problems that we had to address. In the paper we describe the score-informed denoising procedure we applied to each track.
A complete description of the dataset and the creation methodology, including the generation of the multi-microphone recordings, is presented in [1].
Files included
The “audio” folder contains the audio files for each source in a given piece: sourcenumber.wav, where “source” can be violin, viola, cello, double bass, oboe, flute, clarinet, horn, trumpet or bassoon, and “number” indexes the separate instruments of that type (e.g. there are two violins in the “mozart” piece, so you will find “violin1.wav” and “violin2.wav” in the “mozart” folder).
The “annotations” folder includes note onset and offset annotations as MIDI and text files for the corresponding audio files in the dataset. The annotations are offered per source: source.txt and source.mid, where “source” can be violin, viola, cello, double bass, oboe, flute, clarinet, horn, trumpet or bassoon. Additionally, for tasks such as score following, we provide the score that is not aligned with the audio, also as MIDI and text files: source_o.txt and source_o.mid. Furthermore, an additional file, all.mid, holds the tracks for all the sources in a single MIDI file.
The text files comprise all the notes played by a source in the following format:
Onset,Offset,Note name
We recommend using the ground-truth annotations from the text files, as the MIDI files might have incorrect durations for some notes.
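A minimal sketch of parsing these per-source text files (the exact note-name spelling and folder layout are assumptions; inspect a file before relying on the conversion below):

# Parse "Onset,Offset,Note name" lines into (onset, offset, midi_pitch).
import re

NOTE_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_name_to_midi(name: str) -> int:
    m = re.fullmatch(r"([A-Ga-g])([#b]?)(-?\d+)", name.strip())
    if m is None:
        raise ValueError(f"Unexpected note name: {name!r}")
    letter, accidental, octave = m.group(1).upper(), m.group(2), int(m.group(3))
    semitone = NOTE_TO_SEMITONE[letter] + (1 if accidental == "#" else -1 if accidental == "b" else 0)
    return 12 * (octave + 1) + semitone  # MIDI convention: C4 = 60

def load_annotations(path):
    notes = []
    with open(path) as f:
        for line in f:
            onset, offset, name = line.strip().split(",")
            notes.append((float(onset), float(offset), note_name_to_midi(name)))
    return notes

print(load_annotations("annotations/mozart/violin.txt")[:3])  # hypothetical path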
The “Roomsim” folder contains the configuration files (“Text_setups”) and the impulse responses (“IRs”) which can be used with Roomsim [3] to generate the corresponding room configuration and the multi-microphone audio tracks used in our research.
In the “Text_setups” folder, one can find the Roomsim text setups for the microphones: C, HRN, L, R, V1, V2, VL, WW_L, WW_R, TR.
The “IRs” folder contains two subfolders: “conf1” can be used to generate the recordings for the Mozart piece, and “conf2” for the Bruckner, Beethoven, and Mahler pieces. We provide IR “.mat” files for each (“microphone”, “source”) pair: microphone_Ssourcenumber.mat, where “microphone” is one of C, HRN, L, R, V1, V2, VL, WW_L, WW_R, TR, and “sourcenumber” is the index of the source in this list: bassoon (1), cello (2), clarinet (3), double bass (4), flute (5), horn (6), viola (7), violin (8), oboe (9), trumpet (10). Note that the Mozart piece contains neither oboe nor trumpet.
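As a rough sketch of how one microphone channel might be rendered (the variable name stored inside each “.mat” file is an assumption, so the code inspects the keys first; file paths are hypothetical):

# Render one microphone channel by convolving a denoised anechoic
# source with its Roomsim impulse response.
import scipy.io
import scipy.signal
import soundfile as sf

mat = scipy.io.loadmat("IRs/conf1/V1_S8.mat")  # hypothetical file: mic V1, source 8 (violin)
print([k for k in mat.keys() if not k.startswith("__")])  # inspect stored variable names

ir = mat[next(k for k in mat if not k.startswith("__"))].squeeze()  # assumed single IR array
source, sr = sf.read("audio/mozart/violin1.wav")
if source.ndim > 1:            # downmix to mono if needed
    source = source.mean(axis=1)
mic_signal = scipy.signal.fftconvolve(source, ir)
sf.write("rendered_V1_violin1.wav", mic_signal, sr)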
Conditions of Use
The annotations and the Roomsim configuration files in the PHENICX-Anechoic dataset are offered free of charge for non-commercial use only. You cannot redistribute or modify them. Dataset by Marius Miron, Julio Carabias-Orti, Juan Jose Bosch, Emilia Gómez and Jordi Janer, Music Technology Group - Universitat Pompeu Fabra (Barcelona). This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
For the intellectual rights and the distribution policy of the audio recordings in this dataset contact Aalto University, Jukka Pätynen and Tapio Lokki. For more information about the original anechoic recordings we refer to the web page and the associated publication [2].
Please Acknowledge PHENICX-Anechoic in Academic Research
When the present dataset is used for academic research, we would highly appreciate it if scientific publications of works partly based on the PHENICX-Anechoic dataset cite the following publications:
M. Miron, J. Carabias-Orti, J. J. Bosch, E. Gómez and J. Janer, "Score-informed source separation for multi-channel orchestral recordings", Journal of Electrical and Computer Engineering (2016)
Pätynen, J., Pulkki, V., and Lokki, T., "Anechoic recording system for symphony orchestra," Acta Acustica united with Acustica, vol. 94, nr. 6, pp. 856-865, November/December 2008.
Download
Dataset available
Go to our download page.
Feedback
Problems, positive feedback, negative feedback, help to improve the annotations... it is all welcome! Send your feedback to: marius.miron@upf.edu AND mtg@upf.edu
In case of a problem report please include as many details as possible.
References
[1] M. Miron, J. Carabias-Orti, J. J. Bosch, E. Gómez and J. Janer, "Score-informed source separation for multi-channel orchestral recordings", Journal of Electrical and Computer Engineering (2016)
[2] Pätynen, J., Pulkki, V., and Lokki, T., "Anechoic recording system for symphony orchestra," Acta Acustica united with Acustica, vol. 94, nr. 6, pp. 856-865, November/December 2008.
[3] Campbell, D., K. Palomaki, and G. Brown. "A MATLAB simulation of 'shoebox' room acoustics for use in research and teaching." Computing and Information Systems 9.3 (2005): 48.