Overview: With extensive experience in speech recognition, Nexdata has a resource pool covering more than 50 countries and regions. Our linguist team works closely with clients to assist them with dictionary and text corpus construction, speech quality inspection, linguistics consulting, and more.
Our Capacity:
- Global Resources: resources covering hundreds of languages worldwide
- Compliance: all Machine Learning (ML) data is collected with proper authorization
- Quality: multiple rounds of quality inspection ensure high-quality data output
- Secure Implementation: an NDA is signed to guarantee secure implementation, and Machine Learning (ML) data is destroyed upon delivery.
Nexdata is equipped with professional recording equipment, has a resource pool covering 70+ countries and regions, and provides various types of speech recognition data collection services for Machine Learning (ML) data.
Nexdata provides multi-language, multi-timbre, multi-domain and multi-style speech synthesis data collection services for Deep Learning data.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on Canadian accents and dialects.
With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the English language spoken in Canada.
Speech Data: This training dataset comprises 30 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 40 native English speakers from different provinces of Canada. This collaborative effort guarantees a balanced representation of Canadian accents, dialects, and demographics, reducing biases and promoting inclusivity.
Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
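To confirm that a given file matches the specification above (stereo WAV, 16-bit, 8 kHz) before feeding it into a training pipeline, a quick check with Python's standard wave module is usually enough. The file name below is hypothetical; this is only a sketch of the kind of validation one might run.

```python
import wave

# Hypothetical path to one of the conversation recordings (illustrative only).
AUDIO_PATH = "conversation_0001.wav"

with wave.open(AUDIO_PATH, "rb") as wav:
    n_channels = wav.getnchannels()    # expected: 2 (stereo)
    sample_width = wav.getsampwidth()  # expected: 2 bytes = 16-bit
    sample_rate = wav.getframerate()   # expected: 8000 Hz
    duration_s = wav.getnframes() / sample_rate

print(f"channels={n_channels}, bit_depth={sample_width * 8}, "
      f"sample_rate={sample_rate} Hz, duration={duration_s / 60:.1f} min")

# Sanity checks against the specification quoted above.
assert n_channels == 2 and sample_width == 2 and sample_rate == 8000
```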
Metadata: In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device details, topic of recording, bit depth, and sample rate will be provided.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.
Transcription: This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format and are speaker-wise, with time-coded segmentation and non-speech labels and tags.
Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.
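The exact JSON schema used by FutureBeeAI is not documented here, so the structure below is an assumption for illustration only: it shows how speaker-wise, time-coded segments with non-speech tags might be consumed once the real field names are known.

```python
import json

# Hypothetical transcription structure; the actual JSON schema may use
# different field names. This only illustrates consuming speaker-wise,
# time-coded segments.
example = """
{
  "audio_file": "conversation_0001.wav",
  "segments": [
    {"speaker": "SPK1", "start": 0.00, "end": 4.25, "text": "Hi, how are you?"},
    {"speaker": "SPK2", "start": 4.40, "end": 7.10, "text": "Good, thanks."},
    {"speaker": "SPK1", "start": 7.25, "end": 8.00, "text": "[laughter]"}
  ]
}
"""

data = json.loads(example)
for seg in data["segments"]:
    # Non-speech events such as [laughter] would appear as tags in the text.
    print(f'{seg["speaker"]} [{seg["start"]:.2f}-{seg["end"]:.2f}s]: {seg["text"]}')
```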
Updates and Customization: We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.
If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can customize the transcriptions to follow your specific guidelines and requirements, further supporting your ASR development process.
License: This audio dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Wolof Audio Dataset
The Wolof Audio Dataset is a collection of audio recordings and their corresponding transcriptions in Wolof. This dataset is designed to support the development of Automatic Speech Recognition (ASR) models for the Wolof language. It was created by combining three existing datasets:
- ALFFA: Available at serge-wilson/wolof_speech_transcription
- FLEURS: Available at vonewman/fleurs-wolof-dataset
- Urban Bus Wolof Speech Dataset: Available at vonewman/urban-bus-wolof…

See the full description on the dataset page: https://huggingface.co/datasets/vonewman/wolof-audio-data.
Welcome to the Japanese Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of Japanese language speech recognition models, with a particular focus on Japanese accents and dialects.
With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the Japanese language spoken in Japan.
Speech Data: This training dataset comprises 50 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 70 native Japanese speakers from different prefectures of Japan. This collaborative effort guarantees a balanced representation of Japanese accents, dialects, and demographics, reducing biases and promoting inclusivity.
Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
Metadata: In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device details, topic of recording, bit depth, and sample rate will be provided.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Japanese language speech recognition models.
Transcription: This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format and are speaker-wise, with time-coded segmentation and non-speech labels and tags.
Our goal is to expedite the deployment of Japanese language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.
Updates and Customization: We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.
If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can customize the transcriptions to follow your specific guidelines and requirements, further supporting your ASR development process.
License: This audio dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.
https://spdx.org/licenses/CC0-1.0.html
An automatic bird sound recognition system is a useful tool for collecting data on different bird species for ecological analysis. Together with autonomous recording units (ARUs), such a system makes it possible to collect bird observations on a scale that no human observer could ever match. During the last decades, progress has been made in the field of automatic bird sound recognition, but recognizing bird species from untargeted soundscape recordings remains a challenge.

In this article we demonstrate the workflow for building a global identification model and adjusting it to perform well on the data of autonomous recorders from a specific region. We show how data augmentation and a combination of global and local data can be used to train a convolutional neural network to classify vocalizations of 101 bird species. We construct a model and train it with a global data set to obtain a base model. The base model is then fine-tuned with local data from Southern Finland in order to adapt it to the sound environment of a specific location, and tested with two data sets: one originating from the same Southern Finnish region and another originating from a different region in the German Alps.

Our results suggest that fine-tuning with local data significantly improves the network performance. Classification accuracy was improved for test recordings from the same area as the local training data (Southern Finland) but not for recordings from a different region (German Alps). Data augmentation enables training with a limited number of training samples, and even with few local data samples a significant improvement over the base model can be achieved. Our model outperforms the current state-of-the-art tool for automatic bird sound classification. Using local data to adjust the recognition model for the target domain leads to improvement over general, non-tailored solutions. The process introduced in this article can be applied to build a fine-tuned bird sound classification model for a specific environment.

Methods: This repository contains data and recognition models described in the paper "Domain-specific neural networks improve automated bird sound recognition already with small amount of local data" (Lauha et al., 2022).
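The global-then-local workflow described above can be sketched roughly as follows. This is not the authors' code: the architecture (a torchvision ResNet-18 stand-in), the checkpoint path, the layer-freezing choice, and the data loader are all placeholders, and the paper's own CNN and training details differ.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_CLASSES = 101  # bird species, as in the paper

# 1) Base model trained on the global data set (placeholder architecture and
#    checkpoint name; the paper uses its own CNN, not necessarily a ResNet).
model = resnet18(num_classes=NUM_CLASSES)
model.load_state_dict(torch.load("base_model_global.pt"))  # hypothetical file

# 2) Fine-tune on local (Southern Finland) spectrograms. Freezing early layers
#    is one common choice; the paper may instead fine-tune all weights.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

def fine_tune(local_loader, epochs=5):
    """Fine-tune the base model on a loader of (spectrogram, label) batches.

    Spectrograms are assumed to be 3-channel image-like tensors here; a
    single-channel spectrogram would need to be repeated across channels.
    """
    model.train()
    for _ in range(epochs):
        for spectrograms, labels in local_loader:  # augmented local data
            optimizer.zero_grad()
            loss = criterion(model(spectrograms), labels)
            loss.backward()
            optimizer.step()
```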
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This CNVVE Dataset contains clean audio samples encompassing six distinct classes of voice expressions, namely “Uh-huh” or “mm-hmm”, “Uh-uh” or “mm-mm”, “Hush” or “Shh”, “Psst”, “Ahem”, and continuous humming, e.g., “hmmm.” Audio samples of each class are found in the respective folders. These audio samples have undergone a thorough cleaning process; the raw samples are published at https://doi.org/10.18419/darus-3897. Initially, we applied the Google WebRTC voice activity detection (VAD) algorithm to the audio files to remove noise and silence from the collected voice signals. The intensity was set to "2", on a scale from "1" to "3". However, because of variations in the data, some files required additional manual cleaning; these outliers, characterized by sharp click sounds (such as those occurring at the end of recordings), were addressed. The samples were recorded through a dedicated data collection website that explained the purpose and type of voice data by providing participants with example recordings as well as each expression's written equivalent, e.g., “Uh-huh”. Audio recordings were automatically saved in .wav format and kept anonymous, with a sampling rate of 48 kHz and a bit depth of 32 bits. For more information, please check the paper or contact the authors with any inquiries.
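The VAD step described above can be reproduced approximately with the py-webrtcvad bindings. Note the library's own constraints: it expects 16-bit mono PCM at 8, 16, 32, or 48 kHz in 10/20/30 ms frames, so the 32-bit recordings would need conversion first. The file name and frame length below are illustrative.

```python
import wave
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness "2", matching the setting quoted above

def speech_frames(path, frame_ms=30):
    """Yield (is_speech, frame_bytes) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()  # must be 8000, 16000, 32000 or 48000 Hz
        assert wav.getnchannels() == 1 and wav.getsampwidth() == 2
        frame_len = int(rate * frame_ms / 1000)  # samples per frame
        while True:
            frame = wav.readframes(frame_len)
            if len(frame) < frame_len * 2:  # 2 bytes per 16-bit sample
                break
            yield vad.is_speech(frame, rate), frame

# Keep only the voiced portions of a (pre-converted) recording.
voiced = b"".join(
    frame for is_speech, frame in speech_frames("sample_16bit_mono.wav") if is_speech
)
```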
https://www.marketresearchforecast.com/privacy-policy
The global speech and audio data market is experiencing robust growth, driven by the increasing adoption of voice assistants, the proliferation of smart devices, and the expanding use of speech analytics in various sectors. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033, reaching an estimated $50 billion by 2033. Key drivers include advancements in artificial intelligence (AI), particularly in natural language processing (NLP) and machine learning (ML), which are enhancing the accuracy and efficiency of speech recognition and analysis. Furthermore, the growing demand for personalized user experiences, coupled with the rise of multilingual applications, is fueling market expansion.

The market is segmented by language (Chinese Mandarin, English, Spanish, French, and Others) and application (Commercial Use and Academic Use). Commercial applications, including customer service, market research, and healthcare, currently dominate, but the academic sector is showing significant growth potential as research into speech technology advances. Geographic distribution shows North America and Europe currently holding the largest market shares, but the Asia-Pacific region is expected to experience the fastest growth in the coming years, fueled by increasing smartphone penetration and digitalization in emerging economies like India and China. Restraints include data privacy concerns, the need for high-quality data collection, and the challenges associated with handling diverse accents and dialects.

The competitive landscape is characterized by a mix of large technology companies like Google, Amazon, and Microsoft, and specialized speech technology providers such as Nuance and VoiceBase. These companies are engaged in intense R&D to improve the accuracy and performance of speech recognition and synthesis technologies. Strategic partnerships and acquisitions are expected to shape the market further, as companies seek to expand their product portfolios and geographic reach. The ongoing innovation in speech-to-text and text-to-speech technologies, alongside the integration of speech data with other data types (like text and image data), will unlock new applications and further accelerate market growth. The demand for real-time transcription and translation services is also contributing to this upward trend, driving investment in innovative solutions and pushing the boundaries of what's possible with speech and audio data.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The English Deep South Media Audio Dataset project is designed to develop a comprehensive audio dataset focusing on the unique accents and dialects of the English Deep South.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main purpose of the AXIOM Voice Dataset is to gather audio recordings from native Italian speakers. This voice data collection was intended to obtain audio recording samples for training and testing the VIMAR algorithm implemented for the Smart Home scenario on the AXIOM board. The final goal was to develop an efficient voice recognition system using machine learning algorithms. A team of UX researchers from the University of Siena collected data for five months and tested the voice recognition system on the AXIOM board [1]. The data acquisition process involved native Italian speakers who provided their written consent to participate in the research project. Participants were selected to maintain a cluster with varied characteristics in gender, age, region of origin and background.
Welcome to the Polish Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of Polish language speech recognition models, with a particular focus on Polish accents and dialects.
With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the Polish language spoken in Poland.
Speech Data: This training dataset comprises 50 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 70 native Polish speakers from different regions of Poland. This collaborative effort guarantees a balanced representation of Polish accents, dialects, and demographics, reducing biases and promoting inclusivity.
Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
Metadata: In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device details, topic of recording, bit depth, and sample rate will be provided.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Polish language speech recognition models.
Transcription: This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format and are speaker-wise, with time-coded segmentation and non-speech labels and tags.
Our goal is to expedite the deployment of Polish language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.
Updates and Customization: We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.
If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can customize the transcriptions to follow your specific guidelines and requirements, further supporting your ASR development process.
License: This audio dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.
FSD50K is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.
Citation
If you use the FSD50K dataset, or part of it, please cite our TASLP paper (available from [arXiv] [TASLP]):
@article{fonseca2022FSD50K,
  title={{FSD50K}: an open dataset of human-labeled sound events},
  author={Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={829--852},
  year={2022},
  publisher={IEEE}
}
Paper update: This paper has been published in TASLP at the beginning of 2022. The accepted camera-ready version includes a number of improvements with respect to the initial submission. The main updates include: estimation of the amount of label noise in FSD50K, SNR comparison between FSD50K and AudioSet, improved description of evaluation metrics including equations, clarification of experimental methodology and some results, some content moved to Appendix for readability. The TASLP-accepted camera-ready version is available from arXiv (in particular, it is v2 in arXiv, displayed by default).
Data curators
Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez
Contact
You are welcome to contact Eduardo Fonseca should you have any questions, at efonseca@google.com.
ABOUT FSD50K
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology [1]. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.
What follows is a brief summary of FSD50K's most important characteristics. Please have a look at our paper (especially Section 4) to extend the basic information provided here with relevant details for its usage, as well as discussion, limitations, applications and more.
Basic characteristics:
The list of 200 sound classes is provided in vocabulary.csv (see Files section below).

Dev set:

Eval set:

Note: All classes in FSD50K are represented in AudioSet, except Crash cymbal, Human group actions, Human voice, Respiratory sounds, and Domestic sounds, home sounds.
LICENSE
All audio clips in FSD50K are released under Creative Commons (CC) licenses. Each clip has its own license as defined by the clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. Specifically:
The development set consists of 40,966 clips with the following licenses:
The evaluation set consists of 10,231 clips with the following licenses:
For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses. The licenses are specified in the files dev_clips_info_FSD50K.json and eval_clips_info_FSD50K.json.

In addition, FSD50K as a whole is the result of a curation process and it has an additional license: FSD50K is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSD50K.doc zip file. We note that the choice of one license for the dataset as a whole is not straightforward as it comprises items with different licenses (such as audio clips, annotations, or data split). The choice of a global license in these cases may warrant further investigation (e.g., by someone with a background in copyright law).
Usage of FSD50K for commercial purposes:
If you'd like to use FSD50K for commercial purposes, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
Also, if you are interested in using FSD50K for machine learning competitions, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
FILES
FSD50K can be downloaded as a series of zip files with the following directory structure:
root
│
└───FSD50K.dev_audio/                    Audio clips in the dev set
│
└───FSD50K.eval_audio/                   Audio clips in the eval set
│
└───FSD50K.ground_truth/                 Files for FSD50K's ground truth
│   │
│   └─── dev.csv                         Ground truth for the dev set
│   │
│   └─── eval.csv                        Ground truth for the eval set
│   │
│   └─── vocabulary.csv                  List of 200 sound classes in FSD50K
│
└───FSD50K.metadata/                     Files for additional metadata
│   │
│   └─── class_info_FSD50K.json          Metadata about the sound classes
│   │
│   └─── dev_clips_info_FSD50K.json      Metadata about the dev clips
│   │
│   └─── eval_clips_info_FSD50K.json     Metadata about the eval clips
│   │
│   └─── pp_pnp_ratings_FSD50K.json      PP/PNP ratings
│   │
│   └─── collection/                     Files for the *sound collection* format
│
└───FSD50K.doc/
    │
    └───README.md                        The dataset description file that you are reading
    │
    └───LICENSE-DATASET                  License of the FSD50K dataset as an entity
Each row (i.e. audio clip) of dev.csv contains the following information:

- fname: the file name without the .wav extension, e.g., the fname 64760 corresponds to the file 64760.wav on disk. This number is the Freesound id. We always use Freesound ids as filenames.
- labels: the class labels (i.e., the ground truth). Note these
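Given the directory layout and the dev.csv columns above, the ground truth can be joined to audio paths roughly as follows. The comma separator for multi-label entries is an assumption; inspect dev.csv before relying on it.

```python
import os
import pandas as pd

ROOT = "FSD50K"  # directory containing the unzipped dataset

dev = pd.read_csv(os.path.join(ROOT, "FSD50K.ground_truth", "dev.csv"))

# fname is the Freesound id without the .wav extension.
dev["path"] = dev["fname"].astype(str).apply(
    lambda fid: os.path.join(ROOT, "FSD50K.dev_audio", f"{fid}.wav")
)

# labels holds the ground-truth classes; assumed comma-separated here.
dev["label_list"] = dev["labels"].str.split(",")

print(dev[["fname", "label_list", "path"]].head())
```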
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator’s Location Sentiment Data for Uganda
Techsalerator’s Location Sentiment Data for Uganda offers an extensive collection of data that is crucial for businesses, researchers, and technology developers. This dataset provides deep insights into public sentiment across various locations in Uganda, enabling data-driven decision-making for development, marketing, and social research.
For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.
Techsalerator’s Location Sentiment Data for Uganda delivers a comprehensive analysis of public sentiment across urban, rural, and industrial locations. This dataset is essential for businesses, government agencies, and researchers looking to understand the sentiment trends in different regions of Uganda.
To obtain Techsalerator’s Location Sentiment Data for Uganda, contact info@techsalerator.com with your specific requirements. Techsalerator offers customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.
For deep insights into public sentiment across Uganda, Techsalerator’s dataset is an invaluable resource for businesses, policymakers, and researchers.
https://dataintelo.com/privacy-and-policy
The global market size for data collection and labelling was estimated at USD 1.3 billion in 2023, with forecasts predicting it will reach approximately USD 7.8 billion by 2032, showcasing a robust CAGR of 20.8% during the forecast period. Several factors are driving this significant growth, including the rising adoption of artificial intelligence (AI) and machine learning (ML) across various industries, the increasing demand for high-quality annotated data, and the proliferation of data-driven decision-making processes.
One of the primary growth factors in the data collection and labelling market is the rapid advancement and integration of AI and ML technologies across various industry verticals. These technologies require vast amounts of accurately annotated data to train algorithms and improve their accuracy and efficiency. As AI and ML applications become more prevalent in sectors such as healthcare, automotive, and retail, the demand for high-quality labelled data is expected to grow exponentially. Furthermore, the increasing need for automation and the ability to extract valuable insights from large datasets are driving the adoption of data labelling services.
Another significant factor contributing to the market's growth is the rising focus on enhancing customer experiences and personalisation. Companies are leveraging data collection and labelling to gain deeper insights into customer behaviour, preferences, and trends. This enables them to develop more targeted marketing strategies, improve product recommendations, and deliver personalised services. As businesses strive to stay competitive in a rapidly evolving digital landscape, the demand for accurate and comprehensive data labelling solutions is expected to rise.
The growing importance of data privacy and security is also playing a crucial role in driving the data collection and labelling market. With the implementation of stringent data protection regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), organisations are increasingly focusing on ensuring the accuracy and integrity of their data. This has led to a greater emphasis on data labelling processes, as they help maintain data quality and compliance with regulatory requirements. Additionally, the rising awareness of the potential risks associated with biased or inaccurate data is further propelling the demand for reliable data labelling services.
Regionally, North America is expected to dominate the data collection and labelling market during the forecast period. The region's strong technological infrastructure, high adoption rate of AI and ML technologies, and the presence of major market players contribute to its leading position. Additionally, the Asia Pacific region is anticipated to witness significant growth, driven by the increasing investments in AI and ML technologies, the expanding IT and telecommunications sector, and the growing focus on digital transformation in countries such as China, India, and Japan. Europe is also expected to experience steady growth, supported by the rising adoption of AI-driven applications across various industries and the implementation of data protection regulations.
The data collection and labelling market can be segmented by data type into text, image/video, and audio. Each type has its unique applications and demands, creating diverse opportunities and challenges within the market. Text data labelling is particularly crucial for natural language processing (NLP) applications, such as chatbots, sentiment analysis, and language translation. The growing adoption of NLP technologies across various industries, including healthcare, finance, and customer service, is driving the demand for high-quality text data labelling services.
Image and video data labelling is essential for computer vision applications, such as facial recognition, object detection, and autonomous vehicles. The increasing deployment of these technologies in industries such as automotive, retail, and surveillance is fuelling the demand for accurate image and video annotation. Additionally, the growing popularity of augmented reality (AR) and virtual reality (VR) applications is further contributing to the demand for labelled image and video data. The rising need for real-time video analytics and the development of advanced visual search engines are also driving the growth of this segment.
Audio data labelling is critical for speech recognition and audio analysis appli
These are audio recordings taken by an Eclipse Soundscapes (ES) Data Collector during the week of the April 08, 2024 Total Solar Eclipse.
Data Site location information:
Latitude: 36.089254
Longitude: -92.54893
Type of Eclipse: Total Solar Eclipse
Eclipse %: 100
WAV files Time & Date Settings: Set with Automated AudioMoth Time chime
Included Data:
Audio files in WAV format with the date and time in UTC within the file name, formatted YYYYMMDD_HHMMSS (YearMonthDay_HourMinuteSecond). For example, 20240411_141600.WAV means that this audio file starts on April 11, 2024 at 14:16:00 Coordinated Universal Time (UTC); see the filename-parsing sketch after this list.
CONFIG Text file: Includes AudioMoth device setting information, such as sample rate in Hertz (Hz), gain, firmware, etc.
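Because each WAV filename encodes its UTC start time as YYYYMMDD_HHMMSS, the timestamp can be recovered directly from the name. A minimal sketch:

```python
from datetime import datetime, timezone
from pathlib import Path

def start_time_utc(wav_name: str) -> datetime:
    """Parse the UTC start time encoded in an AudioMoth WAV filename."""
    stem = Path(wav_name).stem  # e.g. "20240411_141600"
    return datetime.strptime(stem, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)

print(start_time_utc("20240411_141600.WAV"))  # 2024-04-11 14:16:00+00:00
```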
Eclipse Information for this location:
Eclipse Date: 04/08/2024
Eclipse Start Time (UTC): 17:35:18
Totality Start Time (UTC): [N/A if partial eclipse] 18:52:19
Eclipse Maximum (when the greatest possible amount of the Sun is blocked): 18:54:05
Totality End Time (UTC): [N/A if partial eclipse] 18:55:51
Eclipse End Time (UTC): [N/A if partial eclipse] 20:12:14
Audio Data Collection During Eclipse Week
ES Data Collectors used AudioMoth devices to record audio data, known as soundscapes, over a 5-day period during the eclipse week: 2 days before the eclipse, the day of the eclipse, and 2 days after. The complete raw audio data collected by the Data Collector at the location mentioned above is provided here. This data may or may not cover the entire requested timeframe due to factors such as availability, technical issues, or other unforeseen circumstances.
ES ID# Information:
Each AudioMoth recording device was assigned a unique Eclipse Soundscapes Identification Number (ES ID#). This identifier connects the audio data, submitted via a MicroSD card, with the latitude and longitude information provided by the data collector through an online form. The ES team used the ES ID# to link the audio data with its corresponding location information and then uploaded this raw audio data and location details to Zenodo. This process ensures the anonymity of the ES Data Collectors while allowing them to easily search for and access their audio data on Zenodo.
TimeStamp Information:
The ES team and the Data Collectors took care to set the date and time on the AudioMoth recording devices using an AudioMoth time chime before deployment, ensuring that the recordings would have an automatic timestamp. However, participants also manually noted the date and start time as a backup in case the time chime setup failed. The notes above indicate whether the WAV audio files for this site were timestamped manually or with the automated AudioMoth time chime.
Common Timestamp Error:
Some AudioMoth devices experienced a malfunction where the timestamp on audio files reverted to a date in 1970 or before, even after initially recording correctly. Despite this issue, the affected data was still included in this ES site’s collected raw audio dataset.
Latitude & Longitude Information:
The latitude and longitude for each site were recorded manually by data collectors and submitted to the ES team, either via a web form or on paper. They are shared in Decimal Degrees format.
General Project Information:
The Eclipse Soundscapes Project is a NASA Volunteer Science project funded by NASA Science Activation that is studying how eclipses affect life on Earth during the October 14, 2023 annular solar eclipse and the April 8, 2024 total solar eclipse. Eclipse Soundscapes revisits an eclipse study from almost 100 years ago that showed that animals and insects are affected by solar eclipses! Like this study from 100 years ago, ES asked for the public's help. ES uses modern technology to continue to study how solar eclipses affect life on Earth! You can learn more at www.EclipseSoundscapes.org.
Eclipse Soundscapes is an enterprise of ARISA Lab, LLC and is supported by NASA award No. 80NSSC21M0008.
Eclipse Data Version Definitions
{1st digit = year, 2nd digit = eclipse type (1 = Total Solar Eclipse, 9 = Annular Solar Eclipse, 0 = Partial Solar Eclipse), 3rd digit is unused and reserved for future use} (see the decoding sketch after this list)
2023.9.0 = Week of October 14, 2023 Annular Eclipse Audio Data, Path of Annularity (Annular Eclipse)
2023.0.0 = Week of October 14, 2023 Annular Eclipse Audio Data, OFF the Path of Annularity (Partial Eclipse)
2024.1.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, Path of Totality (Total Solar Eclipse)
2024.0.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, OFF the Path of Totality (Partial Solar Eclipse)
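A small sketch of how the version string can be decoded under this scheme (the digit-to-type mapping is taken directly from the definitions listed above):

```python
ECLIPSE_TYPE = {
    "1": "Total Solar Eclipse",
    "9": "Annular Solar Eclipse",
    "0": "Partial Solar Eclipse",
}

def decode_version(version: str) -> dict:
    """Decode an Eclipse Soundscapes dataset version such as '2024.1.0'."""
    year, eclipse_code, _unused = version.split(".")
    return {"year": int(year), "eclipse_type": ECLIPSE_TYPE[eclipse_code]}

print(decode_version("2024.1.0"))  # {'year': 2024, 'eclipse_type': 'Total Solar Eclipse'}
```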
*Please note that this dataset's version number is listed below.
Individual Site Citation: APA Citation (7th edition)
ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientist. (2025). 2024 solar eclipse soundscapes audio data [Audio dataset, ES ID# 504]. Zenodo. {Insert DOI}. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.
Eclipse Community Citation
ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientists. 2023 and 2024 solar eclipse soundscapes audio data [Collection of audio datasets]. Eclipse Soundscapes Community, Zenodo. https://zenodo.org/communities/eclipsesoundscapes/. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are audio recordings taken by an Eclipse Soundscapes (ES) Data Collector during the week of the April 08, 2024 Total Solar Eclipse.
Data Site location information:
Latitude: 34.53804
Longitude: -93.03621
Type of Eclipse: Total Solar Eclipse
Eclipse %: 100
WAV files Time & Date Settings: Set with Automated AudioMoth Time chime
Included Data:
Audio files in WAV format with the date and time in UTC within the file name, formatted YYYYMMDD_HHMMSS (YearMonthDay_HourMinuteSecond). For example, 20240411_141600.WAV means that this audio file starts on April 11, 2024 at 14:16:00 Coordinated Universal Time (UTC).
CONFIG Text file: Includes AudioMoth device setting information, such as sample rate in Hertz (Hz), gain, firmware, etc.
Eclipse Information for this location:
Eclipse Date: 04/08/2024
Eclipse Start Time (UTC): 17:31:56
Totality Start Time (UTC): [N/A if partial eclipse] 18:49:24
Eclipse Maximum (when the greatest possible amount of the Sun is blocked): 18:51:15
Totality End Time (UTC): [N/A if partial eclipse] 18:53:05
Eclipse End Time (UTC): [N/A if partial eclipse] 20:10:10
Audio Data Collection During Eclipse Week
ES Data Collectors used AudioMoth devices to record audio data, known as soundscapes, over a 5-day period during the eclipse week: 2 days before the eclipse, the day of the eclipse, and 2 days after. The complete raw audio data collected by the Data Collector at the location mentioned above is provided here. This data may or may not cover the entire requested timeframe due to factors such as availability, technical issues, or other unforeseen circumstances.
ES ID# Information:
Each AudioMoth recording device was assigned a unique Eclipse Soundscapes Identification Number (ES ID#). This identifier connects the audio data, submitted via a MicroSD card, with the latitude and longitude information provided by the data collector through an online form. The ES team used the ES ID# to link the audio data with its corresponding location information and then uploaded this raw audio data and location details to Zenodo. This process ensures the anonymity of the ES Data Collectors while allowing them to easily search for and access their audio data on Zenodo.
TimeStamp Information:
The ES team and the Data Collectors took care to set the date and time on the AudioMoth recording devices using an AudioMoth time chime before deployment, ensuring that the recordings would have an automatic timestamp. However, participants also manually noted the date and start time as a backup in case the time chime setup failed. The notes above indicate whether the WAV audio files for this site were timestamped manually or with the automated AudioMoth time chime.
Common Timestamp Error:
Some AudioMoth devices experienced a malfunction where the timestamp on audio files reverted to a date in 1970 or before, even after initially recording correctly. Despite this issue, the affected data was still included in this ES site’s collected raw audio dataset.
Latitude & Longitude Information:
The latitude and longitude for each site were recorded manually by data collectors and submitted to the ES team, either via a web form or on paper. They are shared in Decimal Degrees format.
General Project Information:
The Eclipse Soundscapes Project is a NASA Volunteer Science project funded by NASA Science Activation that is studying how eclipses affect life on Earth during the October 14, 2023 annular solar eclipse and the April 8, 2024 total solar eclipse. Eclipse Soundscapes revisits an eclipse study from almost 100 years ago that showed that animals and insects are affected by solar eclipses! Like this study from 100 years ago, ES asked for the public's help. ES uses modern technology to continue to study how solar eclipses affect life on Earth! You can learn more at www.EclipseSoundscapes.org.
Eclipse Soundscapes is an enterprise of ARISA Lab, LLC and is supported by NASA award No. 80NSSC21M0008.
Eclipse Data Version Definitions
{1st digit = year, 2nd digit = eclipse type (1 = Total Solar Eclipse, 9 = Annular Solar Eclipse, 0 = Partial Solar Eclipse), 3rd digit is unused and reserved for future use}
2023.9.0 = Week of October 14, 2023 Annular Eclipse Audio Data, Path of Annularity (Annular Eclipse)
2023.0.0 = Week of October 14, 2023 Annular Eclipse Audio Data, OFF the Path of Annularity (Partial Eclipse)
2024.1.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, Path of Totality (Total Solar Eclipse)
2024.0.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, OFF the Path of Totality (Partial Solar Eclipse)
*Please note that this dataset's version number is listed below.
Individual Site Citation: APA Citation (7th edition)
ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientist. (2025). 2024 solar eclipse soundscapes audio data [Audio dataset, ES ID# 001]. Zenodo.{Insert DOI}. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.
Eclipse Community Citation
ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientists. 2023 and 2024 solar eclipse soundscapes audio data [Collection of audio datasets]. Eclipse Soundscapes Community, Zenodo. https://zenodo.org/communities/eclipsesoundscapes/. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.
Techsalerator’s Location Sentiment Data for Palestine State
Techsalerator’s Location Sentiment Data for Palestine State offers a detailed collection of insights vital for businesses, researchers, and technology developers. This dataset provides in-depth information about the emotional sentiment across different regions, capturing the mood and opinions of people in various environments within Palestine State.
For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.
Techsalerator’s Location Sentiment Data for Palestine State delivers a thorough analysis of sentiment across urban, rural, and industrial locations. This dataset is crucial for AI development, social studies, marketing strategies, and telecommunications.
To obtain Techsalerator’s Location Sentiment Data for Palestine State, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.
For in-depth insights into sentiment trends and regional opinions in Palestine State, Techsalerator’s dataset is an invaluable resource for researchers, policymakers, marketers, and urban developers.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator’s Location Sentiment Data for Vanuatu
Techsalerator’s Location Sentiment Data for Vanuatu provides a detailed collection of data, offering crucial insights for businesses, researchers, and technology developers. This dataset delivers a comprehensive analysis of public sentiment and environmental conditions across different regions of Vanuatu, helping to understand local opinions, behaviors, and perceptions.
For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.
Techsalerator’s Location Sentiment Data for Vanuatu offers an in-depth analysis of public sentiment across urban, rural, and remote locations. This data is essential for market research, tourism development, social studies, and governmental decision-making.
To obtain Techsalerator’s Location Sentiment Data for Vanuatu, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.
Techsalerator’s dataset is an invaluable resource for businesses, governments, and researchers seeking to understand public sentiment in Vanuatu. It provides actionable insights for decision-making, policy development, and market strategies.
Overview
The Broadcast Audio Fingerprinting (BAF) dataset is an open, annotated dataset, available upon request, for the task of music monitoring in broadcast. It contains 2,000 tracks from Epidemic Sound's private catalogue as reference tracks, representing 74 hours of audio. As queries, it contains over 57 hours of TV broadcast audio from 23 countries and 203 channels, distributed across 3,425 one-minute audio excerpts.
It has been annotated by six annotators in total, and each query has been cross-annotated by three of them, obtaining high inter-annotator agreement percentages, which validates the annotation methodology and ensures the reliability of the annotations.
Purpose of the dataset
This dataset aims to become the standard dataset to evaluate Audio Fingerprinting algorithms since it’s built on real data, without the use of any data-augmentation techniques. It is also the first dataset to address background music fingerprinting, which is a real problem in royalties distribution.
Dataset use
This dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis.
About the data
All audio files are monophonic, 8 kHz, 128 kb/s, pcm_s16le-encoded WAV. Annotations mark which reference tracks sound (either in the foreground or the background) in each query, if any, along with the specific times at which each track starts and stops sounding in the query.
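If other recordings need to be brought into the same format (mono, 8 kHz, 16-bit PCM WAV) before being compared against the references, one common route is ffmpeg. The flags below are standard ffmpeg options; the input and output paths are hypothetical.

```python
import subprocess

def to_baf_format(src: str, dst: str) -> None:
    """Convert an audio file to mono, 8 kHz, pcm_s16le WAV (the BAF format)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "8000",
         "-c:a", "pcm_s16le", dst],
        check=True,
    )

to_baf_format("my_recording.mp3", "my_recording_8k_mono.wav")
```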
Note that there are 88 queries that do not have any matches.
For more information check the dedicated Github repository: https://github.com/guillemcortes/baf-dataset and the dataset datasheet included in the files.
Dataset contents
The dataset is structured according to the following schema:
baf-dataset/
├── baf_datasheet.pdf
├── annotations.csv
├── changelog.md
├── cross_annotations.csv
├── queries
│ ├── query_0001.wav
│ ├── query_0002.wav
│ ├── …
│ └── query_3425.wav
├── queries_info.csv
└── references
├── ref_0001.wav
├── ref_0002.wav
├── …
└── ref_2000.wav
There are two folders named queries and references containing the wav files of TV broadcast recordings and the reference tracks, respectively.
annotations.csv contains the annotations made by the six annotators, with the following columns:

| query          | reference    | query_start | query_end | annotator   |
|----------------|--------------|-------------|-----------|-------------|
| query_0692.wav | ref_1235.wav | 0.0         | 59.904    | annotator_6 |
cross_annotations.csv contains the resulting annotations after merging the overlapping annotations in the annotations.csv file. x_tag has three different values:

- single: the segment has been annotated by only one annotator.
- majority: the segment has been annotated by two annotators.
- unanimity: the segment has been annotated by all three annotators.

| query          | reference    | query_start | query_end | annotators                                    | x_tag     |
|----------------|--------------|-------------|-----------|-----------------------------------------------|-----------|
| query_0693.wav | ref_1834.wav | 37.53       | 38.07     | ['annotator_3']                               | single    |
| query_0693.wav | ref_1834.wav | 18.18       | 37.48     | ['annotator_3', 'annotator_5', 'annotator_3'] | unanimity |
| query_0693.wav | ref_1834.wav | 37.48       | 37.53     | ['annotator_5', 'annotator_3']                | majority  |
queries_info.csv contains information about the queries for citation purposes: the country, the channel and the date and time when the broadcast took place.

| filename       | country | channel           | datetime            |
|----------------|---------|-------------------|---------------------|
| query_0001.wav | Norway  | Discovery Channel | 2021-02-26 14:45:26 |
changelog.md contains a curated, chronologically ordered list of notable changes for each version of the dataset.
baf_datasheet.pdf contains standardized documentation (a datasheet) for the dataset.
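The CSV files listed above can be loaded and combined with pandas along these lines. Column names are taken from the tables shown in this section, the paths assume the directory layout above, and the x_tag filter keeps only the segments all three annotators agreed on.

```python
import pandas as pd

annotations = pd.read_csv("baf-dataset/annotations.csv")
cross = pd.read_csv("baf-dataset/cross_annotations.csv")
queries_info = pd.read_csv("baf-dataset/queries_info.csv")

# Keep only segments confirmed by all three annotators.
unanimous = cross[cross["x_tag"] == "unanimity"]

# Attach broadcast metadata (country, channel, datetime) to each query match.
matches = unanimous.merge(queries_info, left_on="query", right_on="filename")

print(matches[["query", "reference", "query_start", "query_end",
               "country", "channel"]].head())
```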
Ownership of the data
Next, we specify the ownership of all the data included in BAF: Broadcast Audio Fingerprinting dataset. For licensing information, please refer to the “License” section.
Reference tracks
The reference tracks are owned by Epidemic Sound AB, which has given a worldwide, revocable, non-exclusive, royalty-free licence to use and reproduce this data collection consisting of 2,000 low-quality monophonic 8kHz downsampled audio recordings.
Query tracks
The query tracks come from publicly available TV broadcast emissions so the ownership of each recording belongs to the channel that emitted the content. We publish them under the right of quotation provided by the Berne Convention.
Annotations
Guillem Cortès, together with Alex Ciurana and Emilio Molina from BMAT Music Licensing S.L., managed the annotation process; therefore, the annotations belong to BMAT.
Accessing the dataset
The dataset is available upon request. Please include, in the justification field, your academic affiliation (if you have one) and a brief description of your research topics and why you would like to use this dataset. Bear in mind that this information is important for the evaluation of every access request.
License
This dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis. Given the different ownership of the elements of the dataset, the dataset is licensed under the following conditions:
- User's access request
- Research only, non-commercial purposes
- No adaptations nor derivative works
- Attribution to Epidemic Sound and the authors, as indicated in the “citation” section.
Please include, in the justification field, your academic affiliation (if you have one) and a brief description of your research topics and why you would like to use this dataset.
Acknowledgments
With the support of the Ministerio de Ciencia, Innovación y Universidades through the Retos-Colaboración call (reference: RTC2019-007248-7), and also with the support of the Industrial Doctorates Plan of the Secretariat of Universities and Research of the Department of Business and Knowledge of the Generalitat de Catalunya (reference: DI46-2020).