Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
link to original dataset: https://bird-bench.github.io/ Li, J., Hui, B., Qu, G., Yang, J., Li, B., Li, B., Wang, B., Qin, B., Geng, R., Huo, N. and Zhou, X., 2024. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. Advances in Neural Information Processing Systems, 36.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
BIRD-CRITIC-1.0-Flash
BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in real-world database applications? Each task in BIRD-CRITIC has been verified by human experts on the following dimensions:
Reproduction of errors on BIRD env to prevent data leakage. Carefully curate test case functions for each task specifically. Soft EX: This metric can evaluate SELECT-ONLY tasks. Soft EX + Parsing:… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-flash-exp.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The growth of biodiversity data sets generated by citizen scientists continues to accelerate. The availability of such data has greatly expanded the scale of questions researchers can address. Yet, error, bias, and noise continue to be serious concerns for analysts, particularly when data being contributed to these giant online data sets are difficult to verify. Counts of birds contributed to eBird, the world’s largest biodiversity online database, present a potentially useful resource for tracking trends over time and space in species’ abundances. We quantified counting accuracy in a sample of 1,406 eBird checklists by comparing numbers contributed by birders (N = 246) who visited a popular birding location in Oregon, USA, with numbers generated by a professional ornithologist engaged in a long-term study creating benchmark (reference) measurements of daily bird counts. We focused on waterbirds, which are easily visible at this site. We evaluated potential predictors of count differences, including characteristics of contributed checklists, of each species, and of time of day and year. Count differences were biased toward undercounts, with more than 75% of counts being below the daily benchmark value. Median count discrepancies were −29.1% (range: 0 to −42.8%; N = 20 species). Model sets revealed an important influence of each species’ reference count, which varied seasonally as waterbird numbers fluctuated, and of percent of species known to be present each day that were included on each checklist. That is, checklists indicating a more thorough survey of the species richness at the site also had, on average, smaller count differences. However, even on checklists with the most thorough species lists, counts were biased low and exceptionally variable in their accuracy. To improve utility of such bird count data, we suggest three strategies to pursue in the future. 
(1) Assess additional options for analytically determining how to select checklists that include less biased count data, as well as exploring options for correcting bias during the analysis stage. (2) Add options for users to provide additional information that helps analysts choose checklists, such as an option for users to tag checklists where they focused on obtaining accurate counts. (3) Explore opportunities to effectively calibrate citizen-science bird count data by establishing a formalized network of marquee sites where dedicated observers regularly contribute carefully collected benchmark data.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Update 2025-06-08
We release the preview version of BIRD-Critic-PG, a dataset containing 530 high-quality user issues focused on real-world PostgreSQL database applications. The schema file is included in the code repository: https://github.com/bird-bench/BIRD-CRITIC-1/blob/main/baseline/data/post_schema.jsonl
BIRD-CRITIC-1.0-PG
BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question: Can large language models (LLMs) fix user issues in… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/bird-critic-1.0-postgresql.
Ecological restoration has emerged as a key strategy for conserving tropical forests and habitat specialists, and monitoring faunal recovery using indicator taxa like birds can help assess restoration success. Few studies have examined, however, whether active restoration achieves better recovery of bird communities than natural regeneration, or how bird recovery relates to habitat affiliations of species in the community. In rainforests restored over the past two decades in a fragmented landscape (Western Ghats, India), we examined whether bird species richness and community composition recovery in 23 actively restored (AR) sites was significantly better than recovery in paired naturally regenerating (NR) sites, relative to 23 undisturbed benchmark (BM) rainforests. We measured 8 habitat variables and tested whether bird recovery tracked habitat recovery, whether rainforest and open-country birds showed contrasting patterns, and assessed species-level responses to restoration. W...
BirdVox-70k
=========
Version 1.0, October 2017.
Created By
----------
Vincent Lostanlen (1, 2, 3), Justin Salamon (2, 3), Andrew Farnsworth (1), Steve Kelling (1), and Juan Pablo Bello (2, 3).
(1): Cornell Lab of Ornithology (CLO)
(2): Center for Urban Science and Progress, New York University
(3): Music and Audio Research Lab, New York University
Description
-----------
The BirdVox-70k dataset contains 6 audio recordings, each about ten hours in duration. These recordings come from ROBIN autonomous recording units, placed near Ithaca, NY, USA during the fall 2015. They were captured on the night of September 23rd, 2015, by six different sensors, originally numbered 1, 2, 3, 5, 7, and 10.
Andrew Farnsworth used the Raven software to pinpoint every avian flight call in time and frequency. He found 35402 flight calls in total. He estimates that about 25 different species of passerines (thrushes, warblers, and sparrows) are present in this recording. Species are not labeled in BirdVox-70k, but it is possible to tell apart thrushes from warblers and sparrows by looking at the center frequencies of their calls. The annotation process took 102 hours.
The dataset can be used, among other things, for the research, development and testing of bioacoustic classification models, including the reproduction of the results reported in [1].
For details on the hardware of ROBIN recording units, we refer the reader to [2].
[1] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello. BirdVox-70k: a dataset and benchmark for avian flight call detection, submitted, 2018.
[2] J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS One, 2016.
Data Files
----------
The BirdVox-70k_full-night-audio folder contains the recordings as FLAC files, sampled at 24 kHz, with a single channel (mono).
Metadata Files
--------------
The BirdVox-70k_annotations folder contains CSV files, where each row corresponds to a different location in the time-frequency domain (columns "Center Time (s)" and "Center Freq (Hz)").
/!\ CAUTION: in addition to the 35402 flight calls, Andrew Farnsworth pinpointed 29 artificial beeps produced by the recording device itself. These beeps are labeled as "alarm" instead of "flight call". For collecting positive samples for avian flight call detection, make sure you filter out the rows corresponding to alarms.
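The filtering step described in the caution above could be sketched as follows. This is a minimal illustration rather than the authors' own tooling: the README names only the "Center Time (s)" and "Center Freq (Hz)" columns and the two label values, so the `Label` column name and the sample rows below are assumptions made for the example.

```python
import csv
import io

# Hypothetical excerpt of an annotation file. Only the two "Center ..." column
# names and the label values "flight call" / "alarm" come from the README; the
# "Label" column name and the numbers are assumptions for this sketch.
SAMPLE_CSV = """Center Time (s),Center Freq (Hz),Label
12.50,3636,flight call
47.10,2000,alarm
83.25,7120,flight call
"""


def load_flight_calls(csv_file, label_column="Label"):
    """Return (time_s, freq_hz) pairs for every row not labeled "alarm"."""
    calls = []
    for row in csv.DictReader(csv_file):
        if row[label_column].strip().lower() == "alarm":
            continue  # skip beeps produced by the recording device
        calls.append((float(row["Center Time (s)"]),
                      float(row["Center Freq (Hz)"])))
    return calls


flight_calls = load_flight_calls(io.StringIO(SAMPLE_CSV))
print(flight_calls)
```

With a real annotation file, the same function can be given an open file handle, after adjusting `label_column` to whatever the label column is actually called.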
The approximate GPS coordinates of the sensors (latitudes and longitudes rounded to 2 decimal points) and UTC timestamps corresponding to the start of the recording for each sensor are included as CSV files in the main directory.
Please Acknowledge BirdVox-70k in Academic Research
------------------------------------------------------
When BirdVox-70k is used for academic research, we would highly appreciate it if scientific publications of works partly based on this dataset cite the following publication:
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello, “BirdVox-70k: a dataset and benchmark for avian flight call detection”, submitted.
The creation of this dataset was supported by NSF grants 1125098 (BIRDCAST) and 1633259 (BIRDVOX), a Google Faculty Award, the Leon Levy Foundation, and two anonymous donors.
Conditions of Use
-----------------
Dataset created by Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, and Juan Pablo Bello.
The BirdVox-70k dataset is offered free of charge under the terms of the Creative Commons CC0 1.0 Universal License:
https://creativecommons.org/publicdomain/zero/1.0/
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, CLO is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the BirdVox-70k dataset or any part of it.
Feedback
--------
Please help us improve BirdVox-70k by sending your feedback to:
vincent.lostanlen@gmail.com and af27@cornell.edu
In case of a problem, please include as many details as possible.
Acknowledgements
-------------------
Jessie Barry, Ian Davies, Tom Fredericks, Jeff Gerbracht, Sara Keen, Holger Klinck, Anne Klingensmith, Ray Mack, Peter Marchetto, Ed Moore, Matt Robbins, Ken Rosenberg, and Chris Tessaglia-Hymes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BirdVox-70k: a dataset for avian flight call detection in half-second clips
======================================================================================
Version 1.0, April 2018.
Created By
----------
Vincent Lostanlen (1, 2, 3), Justin Salamon (2, 3), Andrew Farnsworth (1), Steve Kelling (1), and Juan Pablo Bello (2, 3).
(1): Cornell Lab of Ornithology (CLO)
(2): Center for Urban Science and Progress, New York University
(3): Music and Audio Research Lab, New York University
Description
-----------
The BirdVox-70k dataset contains 70k half-second clips from 6 audio recordings in the BirdVox-full-night dataset, each about ten hours in duration. These recordings come from ROBIN autonomous recording units, placed near Ithaca, NY, USA during the fall 2015. They were captured on the night of September 23rd, 2015, by six different sensors, originally numbered 1, 2, 3, 5, 7, and 10.
Andrew Farnsworth used the Raven software to pinpoint every avian flight call in time and frequency. He found 35402 flight calls in total. He estimates that about 25 different species of passerines (thrushes, warblers, and sparrows) are present in this recording. Species are not labeled in BirdVox-70k, but it is possible to tell apart thrushes from warblers and sparrows by looking at the center frequencies of their calls. The annotation process took 102 hours.
The dataset can be used, among other things, for the research, development and testing of bioacoustic classification models, including the reproduction of the results reported in [1].
For details on the hardware of ROBIN recording units, we refer the reader to [2].
[1] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection. Proc. IEEE ICASSP, 2018.
[2] J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS One, 2016.
@inproceedings{lostanlen2018icassp,
title = {BirdVox-full-night: a dataset and benchmark for avian flight call detection},
author = {Lostanlen, Vincent and Salamon, Justin and Farnsworth, Andrew and Kelling, Steve and Bello, Juan Pablo},
booktitle = {Proc. IEEE ICASSP},
year = {2018},
publisher = {IEEE},
venue = {Calgary, Canada},
month = {April},
}
Data Files
------------
BirdVox-70k contains the recordings as HDF5 files, sampled at 24 kHz, with a single channel (mono). Each HDF5 file corresponds to a different sensor. The name of the HDF5 dataset in each file is "waveforms".
Metadata Files
--------------
Contrary to BirdVox-full-night, BirdVox-70k is not shipped with a metadata file. Rather, the metadata is included in the keys of the elements in the HDF5 files themselves, whose values are the waveforms.
BirdVox-70k keys follow the format:
unitID_TIMESTAMP_FREQ_LABEL
For example:
unit01_085256784_03636_1
is a positive clip in unit 01, with timestamp 085256784 (3552.37 seconds after dividing by the sample rate 24000), center frequency 3636 Hz.
Another example:
unit05_284775340_00000_0
is a negative clip in unit 05, with timestamp 284775340 (11865.64 seconds).
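The two examples above suggest how a key can be decoded programmatically. The sketch below is one possible reading, not an official parser: the field meanings (unit number, timestamp in samples, center frequency in Hz, and a 1/0 label for positive/negative clips) are inferred from those examples, and the `ClipKey` and `parse_clip_key` names are hypothetical.

```python
from typing import NamedTuple

SAMPLE_RATE_HZ = 24000  # sample rate stated in the README


class ClipKey(NamedTuple):
    unit: int               # sensor number, e.g. 1 for "unit01"
    timestamp_samples: int  # position in the recording, in samples
    time_s: float           # same position converted to seconds
    center_freq_hz: int     # center frequency in Hz (00000 in the negative example)
    is_positive: bool       # True when the trailing label is "1"


def parse_clip_key(key: str) -> ClipKey:
    """Decode a key such as "unit01_085256784_03636_1" (inferred layout)."""
    unit_str, timestamp_str, freq_str, label_str = key.split("_")
    if not unit_str.startswith("unit"):
        raise ValueError("unexpected unit field: %r" % unit_str)
    timestamp = int(timestamp_str)
    return ClipKey(
        unit=int(unit_str[len("unit"):]),
        timestamp_samples=timestamp,
        time_s=timestamp / SAMPLE_RATE_HZ,
        center_freq_hz=int(freq_str),
        is_positive=(label_str == "1"),
    )


positive = parse_clip_key("unit01_085256784_03636_1")
negative = parse_clip_key("unit05_284775340_00000_0")
print(positive.unit, round(positive.time_s, 2), positive.center_freq_hz)
print(negative.unit, round(negative.time_s, 2), negative.is_positive)
```

Dividing the timestamp by 24000 reproduces the 3552.37 and 11865.64 second values quoted in the examples.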
The approximate GPS coordinates of the sensors (latitudes and longitudes rounded to 2 decimal points) and UTC timestamps corresponding to the start of the recording for each sensor are included as CSV files in the main directory.
Please acknowledge BirdVox-70k in academic research
----------------------------------------------------------
When BirdVox-70k is used for academic research, we would highly appreciate it if scientific publications of works partly based on this dataset cite the following publication:
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
The creation of this dataset was supported by NSF grants 1125098 (BIRDCAST) and 1633259 (BIRDVOX), a Google Faculty Award, the Leon Levy Foundation, and two anonymous donors.
Conditions of Use
-----------------
Dataset created by Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, and Juan Pablo Bello.
The BirdVox-70k dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license:
https://creativecommons.org/licenses/by/4.0/
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, Cornell Lab of Ornithology is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the BirdVox-70k dataset or any part of it.
Feedback
-----------
Please help us improve BirdVox-70k by sending your feedback to:
vincent.lostanlen@gmail.com and af27@cornell.edu
In case of a problem, please include as many details as possible.
Acknowledgements
----------------
Jessie Barry, Ian Davies, Tom Fredericks, Jeff Gerbracht, Sara Keen, Holger Klinck, Anne Klingensmith, Ray Mack, Peter Marchetto, Ed Moore, Matt Robbins, Ken Rosenberg, and Chris Tessaglia-Hymes.
We acknowledge that the land on which the data was collected is the unceded territory of the Cayuga nation, which is part of the Haudenosaunee (Iroquois) confederacy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Extensive networks of large plots have the potential to transform knowledge of avian community dynamics through time and across geographical space. In the Neotropics, the global hotspot of avian diversity, only six 100-ha plots, all located in lowland forests of Amazonia, the Guianan shield and Panama, have been inventoried sufficiently. We review the most important lessons learned about Neotropical forest bird communities from those big bird plots and explore opportunities for creating a more extensive network of additional plots to address questions in ecology and conservation, following the model of the existing ForestGEO network of tree plots. Scholarly impact of the big bird plot papers has been extensive, with the papers accumulating nearly 1,500 citations, particularly on topics of tropical ecology, avian conservation, and community organization. Comparisons of results from the plot surveys show no single methodological scheme works effectively for surveying abundances of all bird species at all sites; multiple approaches have been utilized and must be employed in the future. On the existing plots, abundance patterns varied substantially between the South American plots and the Central American one, suggesting different community structuring mechanisms are at work and that additional sampling across geographic space is needed. Total bird abundance in Panama, dominated by small insectivores, was double that of Amazonia and the Guianan plateau, which were dominated by large granivores and frugivores. The most common species in Panama were three times more abundant than those in Amazonia, whereas overall richness was 1.5 times greater in Amazonia. Despite these differences in community structure, other basic information, including uncertainty in population density estimates, has yet to be quantified. 
Results from existing plots may inform drivers of differences in community structure and create baselines for detection of long-term regional changes in bird abundances, but supplementation of the small number of plots is needed to increase generalizability of results and reveal the texture of geographic variation. We propose fruitful avenues of future research based on our current synthesis of the big bird plots. Collaborating with the large network of ForestGEO tree plots could be one approach to improve understanding of linkages between plant and bird diversity. Careful quantification of bird survey effort, recording of exact locations of survey routes or stations, and archiving detailed metadata will greatly enhance the value of benchmark data for future repeat surveys of the existing plots and initial surveys of newly established plots.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Unified BirdCLEF'25 Dataset brings together multiple years of BirdCLEF competition data into a single, structured resource. This dataset is designed to help researchers and machine learning practitioners develop and benchmark models for passive acoustic monitoring (PAM) and bioacoustic species classification.
Key Features
📂 Multi-Year Compilation: Aggregates BirdCLEF datasets from 2020 to 2025, ensuring a comprehensive and diverse collection of bird sound recordings.
🎙 Diverse Environments: Captures bird calls from various geographic regions, including dense forests, open landscapes, and urban areas.
🏆 Competition-Grade Labels: Includes expertly annotated species labels, as used in previous BirdCLEF competitions.
Potential Applications
🔍 Bird Species Identification – Train models to recognize bird calls in noisy environments.
📡 Bioacoustic Monitoring – Develop automated solutions for biodiversity tracking.
🧠 Self-Supervised Learning – Utilize large amounts of unlabeled data for representation learning.
🌎 Climate & Conservation Research – Analyze bird population trends to support ecological studies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 2.0, February 2022.
BirdVox-ANAFCC is a dataset of short audio waveforms, each of them containing a flight call from one of 14 birds of North America: four American sparrows, one cardinal, two thrushes, and seven New World warblers.
* American Tree Sparrow (ATSP)
* Chipping Sparrow (CHSP)
* Savannah Sparrow (SAVS)
* White-throated Sparrow (WTSP)
* Rose-breasted Grosbeak (RBGR)
* Gray-cheeked Thrush (GCTH)
* Swainson's Thrush (SWTH)
* American Redstart (AMRE)
* Bay-breasted Warbler (BBWA)
* Black-throated Blue Warbler (BTBW)
* Canada Warbler (CAWA)
* Common Yellowthroat (COYE)
* Mourning Warbler (MOWA)
* Ovenbird (OVEN)
It also contains other sounds which are often confused for one of the species above. These "confounding factors" encompass flight calls from other species of birds, vocalizations from non-avian animals, as well as some machine beeps.
BirdVox-ANAFCC results from an aggregation of various smaller datasets, integrated under a common taxonomy. For more details on this taxonomy, we refer the reader to [1]:
[1] Cramer, Lostanlen, Salamon, Farnsworth, Bello. Chirping up the right tree: Incorporating biological taxonomies into deep bioacoustic classifiers. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
The second version of the BirdVox-ANAFCC dataset (v2.0) contains flight calls from the BirdVox-full-night dataset. These flight calls were present in the ICASSP 2020 benchmark but did not appear in the initial release of BirdVox-ANAFCC.
BirdVox-ANAFCC contains the recordings as HDF5 files, sampled at 22,050 Hz, with a single channel (mono). Each HDF5 file contains flight call vocalizations of a particular species. The name of each HDF5 file follows the format: _original.h5. The name of the HDF5 dataset in each file is "waveforms", with the corresponding key for each audio recording varying in format depending on the data source.
taxonomy.yaml details the three-level taxonomy structure used in this dataset, reflected in three-number-codes which largely follow "..". Additionally, at any level of the taxonomy, the numeric code "0" is reserved for "other" and the code "X" refers to unknown. For example, 1.1.0 corresponds to an American Sparrow with a species outside of our scope of interest, and 1.1.X corresponds to an American Sparrow of unknown species. At the top level (family), the "other" codes (0.*.*) deviate from the family-order-species in order to capture a variety of other out-of-scope sounds, including anthropophony, non-avian biophony, and biophony of avians outside of the scope of interest.
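As an illustration of the "0" and "X" conventions described above, the following sketch splits a three-level code and reports the status of each level. It is an assumption-laden reading of the text, not code shipped with the dataset: `parse_taxonomy_code` is a hypothetical helper, and it does not read taxonomy.yaml or map numbers to taxon names.

```python
def parse_taxonomy_code(code):
    """Split a three-level code such as "1.1.X" into per-level statuses.

    Each level is reported as "other" (code "0"), "unknown" (code "X"),
    or "specific" (any other number).
    """
    parts = code.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated levels, got %r" % code)
    statuses = []
    for part in parts:
        if part == "0":
            statuses.append("other")
        elif part == "X":
            statuses.append("unknown")
        elif part.isdigit():
            statuses.append("specific")
        else:
            raise ValueError("unrecognized level %r in code %r" % (part, code))
    return tuple(statuses)


# The two examples given in the text: an out-of-scope species and an
# unknown species within the same first two levels.
for example in ("1.1.0", "1.1.X"):
    print(example, parse_taxonomy_code(example))
```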
When BirdVox-ANAFCC is used for academic research, we would highly appreciate it if scientific publications of works partly based on this dataset cite the following publication:
Cramer, Lostanlen, Salamon, Farnsworth, Bello. Chirping up the right tree: Incorporating biological taxonomies into deep bioacoustic classifiers. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
The creation of this dataset was supported by NSF grants 1125098 (BIRDCAST) and 1633259 (BIRDVOX), a Google Faculty Award, the Leon Levy Foundation, and two anonymous donors.
Dataset created by Aurora Cramer, Vincent Lostanlen, Bill Evans, Andrew Farnsworth, Justin Salamon, and Juan Pablo Bello.
The BirdVox-ANAFCC dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International License: https://creativecommons.org/licenses/by/4.0/
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the authors are not liable for, and expressly exclude all liability for, loss or damage however and whenever caused to anyone by any use of the BirdVox-ANAFCC dataset or any part of it.
Please help us improve BirdVox-ANAFCC by sending your feedback to: vincent.lostanlen@gmail.com and auroracramer@nyu.edu
In case of a problem, please include as many details as possible.
1.0, May 2020: initial version, paired with ICASSP 2020 publication.
2.0, February 2022: added a missing dataset file (BirdVox-70k), updated name of first author (Aurora Cramer).
Jessie Barry, Ian Davies, Tom Fredericks, Jeff Gerbracht, Sara Keen, Holger Klinck, Anne Klingensmith, Ray Mack, Peter Marchetto, Ed Moore, Matt Robbins, Ken Rosenberg, and Chris Tessaglia-Hymes.
We thank contributors and maintainers of the Macaulay Library and the Xeno-Canto website.
We acknowledge that the land on which the data was collected is the unceded territory of the Cayuga nation, which is part of the Haudenosaunee (Iroquois) confederacy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BirdVox-scaper-10k: a synthetic dataset for multilabel species classification of flight calls from 10-second audio recordings
=============================================================================================
Version 1.0, September 2019.
Created By
-------------
Elizabeth Mendoza (1), Vincent Lostanlen (2, 3, 4), Justin Salamon (3, 4), Andrew Farnsworth (2), Steve Kelling (2), and Juan Pablo Bello (3, 4).
(1): Forest Hills High School, New York, NY, USA
(2): Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
(3): Center for Urban Science and Progress, New York University, New York, NY, USA
(4): Music and Audio Research Lab, New York University, New York, NY, USA
Description
--------------
The BirdVox-scaper-10k dataset contains 9983 artificial soundscapes. Each soundscape lasts exactly ten seconds and contains one or several avian flight calls from up to 30 different species of New World warblers (Parulidae). Alongside each audio file, we include an annotation file describing the start time and end time of each flight call in the corresponding soundscape, as well as the species of warbler it belongs to.
In order to synthesize soundscapes in BirdVox-scaper-10k, we mixed natural sounds from various pre-recorded sources. First, we extracted isolated recordings of flight calls containing little or no background noise from the CLO-43SD dataset [1]. Secondly, we extracted 10-second "empty" acoustic scenes from the BirdVox-DCASE-20k dataset [2]. These acoustic scenes contain various sources of real-world background noise, including biophony (insects) and anthropophony (vehicles), yet are guaranteed to be devoid of any flight calls. Lastly, we "fill" each acoustic scene by mixing it with flight calls sampled at random.
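The "fill" step can be pictured with a deliberately simplified sketch. It is not the authors' synthesis pipeline: it works on plain lists of samples, uses a hypothetical `mix_call_into_scene` helper, and ignores details such as signal-to-noise control, but it shows how adding a call at a random offset yields both the mixed audio and the start/end annotation.

```python
import random


def mix_call_into_scene(scene, call, rng, gain=1.0):
    """Add `call` into a copy of `scene` at a random offset.

    Returns the mixed samples plus the start and end sample indices of the
    call, which is the information an annotation file would record.
    """
    if len(call) > len(scene):
        raise ValueError("call is longer than the acoustic scene")
    start = rng.randrange(len(scene) - len(call) + 1)
    mixed = list(scene)  # leave the original scene untouched
    for i, sample in enumerate(call):
        mixed[start + i] += gain * sample
    return mixed, start, start + len(call)


rng = random.Random(0)       # fixed seed so the sketch is repeatable
empty_scene = [0.0] * 100    # stand-in for a 10-second background clip
flight_call = [1.0] * 10     # stand-in for an isolated flight call
mixed, start, end = mix_call_into_scene(empty_scene, flight_call, rng)
print("call occupies samples", start, "to", end)
```

Dividing `start` and `end` by the sample rate would give the start and end times in seconds.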
Although the BirdVox-scaper-10k does not consist of natural recordings, we have taken several measures to ensure the plausibility of each synthesized soundscape, both from qualitative and quantitative standpoints.
The BirdVox-scaper-10k dataset can be used, among other things, for the research, development, and testing of bioacoustic classification models.
For details on the hardware of ROBIN recording units, we refer the reader to [3].
[1] J. Salamon, J. Bello. Fusing shallow and deep learning for bioacoustic bird species classification. Proc. IEEE ICASSP, 2017.
[2] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, and J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection. Proc. IEEE ICASSP, 2018.
[3] J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS One, 2016.
@inproceedings{lostanlen2018icassp,
title = {BirdVox-full-night: a dataset and benchmark for avian flight call detection},
author = {Lostanlen, Vincent and Salamon, Justin and Farnsworth, Andrew and Kelling, Steve and Bello, Juan Pablo},
booktitle = {Proc. IEEE ICASSP},
year = {2018},
publisher = {IEEE},
venue = {Calgary, Canada},
month = {April},
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was originally created by Jordan Bird, Leah Bird, Carrie Ijichi, Aurelie Jolivald, Salisu Wada, Kay Owa, Chloe Barnes of Nottingham Trent University (United Kingdom).
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🚀 LiveSQLBench-Base-Lite
A dynamic, contamination‑free benchmark for evaluating LLMs on complex, real‑world text‑to‑SQL tasks. 🌐 Website • 📄 Paper (coming soon) • 💻 GitHub Maintained by the 🦜 BIRD Team @ HKU & ☁️ Google Cloud
📊 LiveSQLBench Overview
LiveSQLBench (BIRD-SQL Pro v0.5) is a contamination-free, continuously evolving benchmark designed to evaluate LLMs on complex, real-world text-to-SQL tasks, featuring diverse real-world user queries, including… See the full description on the dataset page: https://huggingface.co/datasets/birdsql/livesqlbench-base-lite.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Biometrics form a key characteristic of a species. Here, we provide a summary of biometrics held by the South African Bird Ringing Scheme (SAFRING), which was initiated in 1948, including measures of mass and lengths of the tarsus, head, culmen, tail and wing. We include all species in southern Africa for which there was sufficient data. Accordingly, we present biometric data for 674 of the 904 southern African bird species. We also investigated whether there were sex-specific differences for each species, and provide summaries for species where values significantly differed between the sexes. We found 376 species with significant sex-specific differences for at least one measure (e.g. mass). Although SAFRING holds data entries for many ringed individuals, a sizeable proportion of the entries was not useable as biometric data. Therefore, in this article, we aim to: 1) present a complete, standardised reference of summarised biometric data for the birds of southern Africa; 2) provide ringers with benchmark values that could guide data-capturing; 3) identify data-deficient species; and 4) highlight the importance of collecting and capturing biometric data carefully and consistently.
BirdVox-full-night
=============
Version 1.0, February 2018.
Created By
----------
Vincent Lostanlen (1, 2, 3), Justin Salamon (2, 3), Andrew Farnsworth (1), Steve Kelling (1), and Juan Pablo Bello (2, 3).
(1): Cornell Lab of Ornithology (CLO)
(2): Center for Urban Science and Progress, New York University
(3): Music and Audio Research Lab, New York University
Description
-----------
The BirdVox-full-night dataset contains 6 audio recordings, each about ten hours in duration. These recordings come from ROBIN autonomous recording units, placed near Ithaca, NY, USA during the fall 2015. They were captured on the night of September 23rd, 2015, by six different sensors, originally numbered 1, 2, 3, 5, 7, and 10.
Andrew Farnsworth used the Raven software to pinpoint every avian flight call in time and frequency. He found 35402 flight calls in total. He estimates that about 25 different species of passerines (thrushes, warblers, and sparrows) are present in this recording. Species are not labeled in BirdVox-full-night, but it is possible to tell apart thrushes from warblers and sparrows by looking at the center frequencies of their calls. The annotation process took 102 hours.
The dataset can be used, among other things, for the research, development and testing of bioacoustic classification models, including the reproduction of the results reported in [1].
For details on the hardware of ROBIN recording units, we refer the reader to [2].
[1] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[2] J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS One, 2016.
Data Files
------------
The BirdVox-full-night_full-night-audio folder contains the recordings as FLAC files, sampled at 24 kHz, with a single channel (mono).
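As a rough sizing aid, the figures above (6 recordings, about ten hours each, 24 kHz mono) imply the following sample counts. This is only back-of-the-envelope arithmetic: the ten-hour duration is approximate, and the 32-bit float size is an assumption about how the audio might be held in memory after decoding.

```python
SAMPLE_RATE_HZ = 24_000   # stated sampling rate
NUM_RECORDINGS = 6        # stated number of recordings
APPROX_HOURS = 10         # "about ten hours"; actual durations vary
BYTES_PER_SAMPLE = 4      # assumption: decoded to 32-bit floats

samples_per_recording = SAMPLE_RATE_HZ * APPROX_HOURS * 3600
total_samples = samples_per_recording * NUM_RECORDINGS
bytes_per_recording = samples_per_recording * BYTES_PER_SAMPLE

print(samples_per_recording)  # samples in one mono recording
print(total_samples)          # samples across all six recordings
print(bytes_per_recording)    # approximate decoded size of one recording
```

At this scale a full recording is usually read in chunks rather than loaded at once.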
Metadata Files
--------------
The BirdVox-full-night_annotations folder contains JAMS files, where each row corresponds to a different location in the time-frequency domain (columns "Center Time (s)" and "Center Freq (Hz)").
/!\ CAUTION: in addition to the 35402 flight calls, Andrew Farnsworth pinpointed 29 artificial beeps produced by the recording device itself. These beeps are labeled as "alarm" instead of "flight call". For collecting positive samples for avian flight call detection, make sure you filter out the rows corresponding to alarms.
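A minimal sketch of that filtering step, assuming the annotations have been read as tabular rows with the two columns named above plus a label column. The column name `Label`, the helper `load_flight_calls`, and the sample rows are hypothetical; check the actual files for the real header and values.

```python
import csv
import io

# Hypothetical sample rows; the real label column may be named differently.
SAMPLE = """Center Time (s),Center Freq (Hz),Label
12.50,7200.0,flight call
40.10,3000.0,alarm
95.75,2900.0,flight call
"""

def load_flight_calls(handle, label_column="Label"):
    """Return (time_s, freq_hz) pairs, skipping rows labeled 'alarm'."""
    calls = []
    for row in csv.DictReader(handle):
        if row[label_column].strip().lower() == "alarm":
            continue  # artificial beep produced by the recording device
        calls.append((float(row["Center Time (s)"]),
                      float(row["Center Freq (Hz)"])))
    return calls

flight_calls = load_flight_calls(io.StringIO(SAMPLE))
print(flight_calls)
```

For a real annotation file, pass an open file handle instead of the in-memory sample.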
The approximate GPS coordinates of the sensors (latitudes and longitudes rounded to 2 decimal places) and UTC timestamps corresponding to the start of the recording for each sensor are included as CSV files in the main directory.
Please acknowledge BirdVox-full-night in academic research
--------------------------------------------------------------------------
When BirdVox-full-night is used for academic research, we would highly appreciate it if scientific publications of works partly based on this dataset cite the following publication:
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
The creation of this dataset was supported by NSF grants 1125098 (BIRDCAST) and 1633259 (BIRDVOX), a Google Faculty Award, the Leon Levy Foundation, and two anonymous donors.
Conditions of Use
----------------------
Dataset created by Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, and Juan Pablo Bello.
The BirdVox-full-night dataset is offered free of charge under the terms of the Creative Commons CC0 1.0 Universal License:
https://creativecommons.org/publicdomain/zero/1.0/
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, CLO is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the BirdVox-full-night dataset or any part of it.
Feedback
-----------
Please help us improve BirdVox-full-night by sending your feedback to:
vincent.lostanlen@gmail.com and af27@cornell.edu
In case of a problem, please include as many details as possible.
Acknowledgements
------------------------
Jessie Barry, Ian Davies, Tom Fredericks, Jeff Gerbracht, Sara Keen, Holger Klinck, Anne Klingensmith, Ray Mack, Peter Marchetto, Ed Moore, Matt Robbins, Ken Rosenberg, and Chris Tessaglia-Hymes.
We acknowledge that the land on which the data was collected is the unceded territory of the Cayuga nation, which is part of the Haudenosaunee (Iroquois) confederacy.
Flapping flight is the most power-demanding mode of locomotion, associated with a suite of anatomical specializations in extant adult birds. In contrast, many developing birds use their forelimbs to negotiate environments long before acquiring “flight adaptations,” recruiting their developing wings to continuously enhance leg performance and, in some cases, fly. How does anatomical development influence these locomotor behaviors? Isolating morphological contributions to wing performance is extremely challenging using purely empirical approaches. However, musculoskeletal modeling and simulation techniques can incorporate empirical data to explicitly examine the functional consequences of changing morphology by manipulating anatomical parameters individually and estimating their effects on locomotion. To assess how ontogenetic changes in anatomy affect locomotor capacity, we combined existing empirical data on muscle morphology, skeletal kinematics, and aerodynamic force production with advanced biomechanical modeling and simulation techniques to analyze the ontogeny of pectoral limb function in a precocial ground bird (Alectoris chukar). Simulations of wing-assisted incline running (WAIR) using these newly developed musculoskeletal models collectively suggest that immature birds have excess muscle capacity and are limited more by feather morphology, possibly because feathers grow more quickly and have a different style of growth than bones and muscles. These results provide critical information about the ontogeny and evolution of avian locomotion by (i) establishing how muscular and aerodynamic forces interface with the skeletal system to generate movement in morphing juvenile birds, and (ii) providing a benchmark to inform biomechanical modeling and simulation of other locomotor behaviors, both across extant species and among extinct theropod dinosaurs.
The DBRD (pronounced dee-bird) dataset contains over 110k book reviews along with associated binary sentiment polarity labels. It is greatly influenced by the Large Movie Review Dataset and intended as a benchmark for sentiment classification in Dutch.
The TrajNet Challenge represents a large multi-scenario forecasting benchmark. The challenge consists of predicting 3161 human trajectories: for each trajectory, 8 consecutive ground-truth values (3.2 seconds), i.e., t−7,t−6,…,t, are observed in world plane coordinates (the so-called world plane Human-Human protocol), and the following 12 (4.8 seconds), i.e., t+1,…,t+12, are forecast. The 8-12-value protocol is consistent with most trajectory forecasting approaches, which usually focus on the 5-dataset ETH-univ + ETH-hotel + UCY-zara01 + UCY-zara02 + UCY-univ. TrajNet substantially extends the 5-dataset scenario by diversifying the training data, thus stressing the flexibility and generalization an approach has to exhibit when it comes to unseen scenery and situations. In fact, TrajNet is a superset of diverse datasets that requires training on four families of trajectories, namely 1) BIWI Hotel (orthogonal bird’s eye flight view, moving people), 2) Crowds UCY (3 datasets, tilted bird’s eye view, camera mounted on building or utility poles, moving people), 3) MOT PETS (multisensor, different human activities) and 4) Stanford Drone Dataset (8 scenes, high orthogonal bird’s eye flight view, different agents such as people, cars, etc.), for a total of 11448 trajectories. Testing is requested on diverse partitions of BIWI Hotel, Crowds UCY, and Stanford Drone Dataset, and is evaluated by a specific server (ground-truth testing data is unavailable for applicants).
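The 8-12 protocol described above can be sketched as a simple split of a 20-step track into an observed part and a part to forecast. This is an illustrative sketch, not the challenge's own tooling: the names `split_trajectory`, `OBS_LEN`, and `PRED_LEN` and the straight-line sample track are hypothetical, and the 0.4 s step is only what the stated durations imply (3.2 s / 8 and 4.8 s / 12).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in world plane coordinates

OBS_LEN = 8    # observed steps t-7 .. t (3.2 s)
PRED_LEN = 12  # steps to forecast t+1 .. t+12 (4.8 s)
STEP_SECONDS = 3.2 / OBS_LEN  # per-step interval implied by the figures above

def split_trajectory(track: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Split a 20-step track into its observed and to-be-forecast parts."""
    if len(track) != OBS_LEN + PRED_LEN:
        raise ValueError(f"expected {OBS_LEN + PRED_LEN} steps, got {len(track)}")
    return track[:OBS_LEN], track[OBS_LEN:]

# Hypothetical straight-line track, 0.5 m per step along x.
track = [(0.5 * i, 0.0) for i in range(OBS_LEN + PRED_LEN)]
observed, future = split_trajectory(track)
print(len(observed), len(future))
```

A forecasting model would receive only `observed` and be scored against `future`, which the evaluation server keeps hidden for the test partitions.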
https://choosealicense.com/licenses/other/
BEANS-Zero
Version: 0.1.0 Created on: 2025-04-12 Creators:
Earth Species Project (https://www.earthspecies.org)
Overview
BEANS-Zero is a bioacoustics benchmark designed to evaluate multimodal audio-language models in zero-shot settings. Introduced in the NatureLM-audio paper (Robinson et al., 2025), it brings together tasks from both existing datasets and newly curated resources. The benchmark focuses on models that take a bioacoustic audio input (e.g., bird or… See the full description on the dataset page: https://huggingface.co/datasets/EarthSpeciesProject/BEANS-Zero.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. Search string and database searches. List of benchmark articles, search string development, and databases included in the search.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
link to original dataset: https://bird-bench.github.io/ Li, J., Hui, B., Qu, G., Yang, J., Li, B., Li, B., Wang, B., Qin, B., Geng, R., Huo, N. and Zhou, X., 2024. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. Advances in Neural Information Processing Systems, 36.