100+ datasets found
  1. Speech Recognition Data Collection Services | 100+ Languages Resources...

    • datarade.ai
    Updated Dec 28, 2023
    Cite
    Nexdata (2023). Speech Recognition Data Collection Services | 100+ Languages Resources |Audio Data | Speech Recognition Data | Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-speech-recognition-data-collection-services-100-nexdata
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 28, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Estonia, Haiti, Cambodia, Brazil, Malaysia, Sri Lanka, United Kingdom, Lithuania, Austria, El Salvador
    Description
    1. Overview: With extensive experience in speech recognition, Nexdata has a resource pool covering more than 50 countries and regions. Our linguist team works closely with clients to assist them with dictionary and text corpus construction, speech quality inspection, linguistics consulting, and more.

    2. Our Capacity
    -Global Resources: global resources covering hundreds of languages worldwide
    -Compliance: all Machine Learning (ML) Data are collected with proper authorization
    -Quality: multiple rounds of quality inspections ensure high-quality data output
    -Secure Implementation: an NDA is signed to guarantee secure implementation, and Machine Learning (ML) Data is destroyed upon delivery.

    3. About Nexdata: Nexdata is equipped with professional Machine Learning (ML) Data collection devices, tools and environments, as well as experienced project managers in data collection and quality control, so that we can meet data collection requirements in various scenarios and types. Please visit us at https://www.nexdata.ai/service/speech-recognition?source=Datarade
  2. Speech Recognition Data Collection Services | 100+ Languages Resources...

    • data.nexdata.ai
    Updated Aug 3, 2024
    Cite
    Nexdata (2024). Speech Recognition Data Collection Services | 100+ Languages Resources |Audio Data | Speech Recognition Data | Machine Learning (ML) Data [Dataset]. https://data.nexdata.ai/products/nexdata-speech-recognition-data-collection-services-100-nexdata
    Explore at:
    Dataset updated
    Aug 3, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Jordan, Finland, Cambodia, Luxembourg, Tunisia, Lebanon, Singapore, Netherlands, Mongolia, New Zealand
    Description

    Nexdata is equipped with professional recording equipment, has a resource pool covering 70+ countries and regions, and provides various types of speech recognition data collection services for Machine Learning (ML) Data.

  3. Speech Synthesis Data Collection Service | 50+ Languages Resources |...

    • data.nexdata.ai
    Updated Aug 3, 2024
    + more versions
    Cite
    Nexdata (2024). Speech Synthesis Data Collection Service | 50+ Languages Resources | Numerous Voice Sample | TTS Data | Audio Data | Deep Learning (DL) Data [Dataset]. https://data.nexdata.ai/products/nexdata-speech-synthesis-data-collection-services-50-lan-nexdata
    Explore at:
    Dataset updated
    Aug 3, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Uruguay, Azerbaijan, French Guiana, Romania, Malaysia, Portugal, Singapore, Italy, Dominican Republic, Mexico
    Description

    Nexdata provides multi-language, multi-timbre, multi-domain and multi-style speech synthesis data collection services for Deep Learning (DL) Data.

  4. English (Canada) General Conversation Speech Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English (Canada) General Conversation Speech Dataset [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-canada
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada
    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on Canadian accents and dialects.

    With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the English language spoken in Canada.

    Speech Data:

    This training dataset comprises 30 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 40 native English speakers from different states/provinces of Canada. This collaborative effort guarantees a balanced representation of Canadian accents, dialects, and demographics, reducing biases and promoting inclusivity.

    Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
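
    As a quick sanity check of the format described above (stereo WAV, 16-bit, 8 kHz), a minimal sketch using Python's standard wave module might look like the following; the file name is a placeholder, not an actual file from the dataset:

    import wave

    path = "conversation_001.wav"  # hypothetical recording file name

    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()       # expected: 2 (stereo)
        sample_width = wav.getsampwidth()   # expected: 2 bytes = 16-bit
        sample_rate = wav.getframerate()    # expected: 8000 Hz
        duration_s = wav.getnframes() / sample_rate

    print(f"{channels} ch, {sample_width * 8}-bit, {sample_rate} Hz, {duration_s:.1f} s")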

    Metadata:

    In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device detail, topic of recording, bit depth, and sample rate will be provided.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.

    Transcription:

    This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format. The transcriptions capture speaker-wise transcription with time-coded segmentation along with non-speech labels and tags.

    Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.

    Updates and Customization:

    We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.

    If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8kHz to 48kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can also customize the transcription following your specific guidelines and requirements, to further support your ASR development process.

    License:

    This audio dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.

  5. wolof-audio-data

    • huggingface.co
    Updated Dec 14, 2024
    + more versions
    Cite
    Abdoulaye Diallo (2024). wolof-audio-data [Dataset]. https://huggingface.co/datasets/vonewman/wolof-audio-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2024
    Authors
    Abdoulaye Diallo
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Wolof Audio Dataset

    The Wolof Audio Dataset is a collection of audio recordings and their corresponding transcriptions in Wolof. This dataset is designed to support the development of Automatic Speech Recognition (ASR) models for the Wolof language. It was created by combining three existing datasets:

    • ALFFA: available at serge-wilson/wolof_speech_transcription
    • FLEURS: available at vonewman/fleurs-wolof-dataset
    • Urban Bus Wolof Speech Dataset: available at vonewman/urban-bus-wolof…
    See the full description on the dataset page: https://huggingface.co/datasets/vonewman/wolof-audio-data.
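
    Since the dataset is hosted on the Hugging Face Hub, it can be pulled with the datasets library. The repository id comes from the URL above; the split and column names are assumptions that should be checked against the dataset card:

    from datasets import load_dataset

    # Repository id taken from the dataset URL above.
    ds = load_dataset("vonewman/wolof-audio-data")
    print(ds)  # shows the available splits and their sizes

    # Column names such as "audio" and "transcription" are assumptions;
    # check the printed features before relying on them.
    first_split = next(iter(ds))
    print(ds[first_split].features)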

  6. Japanese (Japan) General Conversation Speech Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Japanese (Japan) General Conversation Speech Dataset [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-japanese-japan
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    Area covered
    Japan
    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the Japanese Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of Japanese language speech recognition models, with a particular focus on Japanese accents and dialects.

    With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the Japanese language spoken in Japan.

    Speech Data:

    This training dataset comprises 50 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 70 native Japanese speakers from different states/provinces of Japan. This collaborative effort guarantees a balanced representation of Japanese accents, dialects, and demographics, reducing biases and promoting inclusivity.

    Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.

    Metadata:

    In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device detail, topic of recording, bit depth, and sample rate will be provided.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Japanese language speech recognition models.

    Transcription:

    This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format. The transcriptions capture speaker-wise transcription with time-coded segmentation along with non-speech labels and tags.

    Our goal is to expedite the deployment of Japanese language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.

    Updates and Customization:

    We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.

    If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8kHz to 48kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can also customize the transcription following your specific guidelines and requirements, to further support your ASR development process.

    License:

    This audio dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.

  7. Data from: Domain-specific neural networks improve automated bird sound...

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated Sep 28, 2022
    Cite
    Patrik Lauha; Panu Somervuo; Petteri Lehikoinen; Lisa Geres; Tobias Richter; Sebastian Seibold; Otso Ovaskainen (2022). Domain-specific neural networks improve automated bird sound recognition already with small amount of local data [Dataset]. http://doi.org/10.5061/dryad.2bvq83btd
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    University of Jyväskylä
    Goethe University Frankfurt
    Technical University of Munich
    University of Helsinki
    Authors
    Patrik Lauha; Panu Somervuo; Petteri Lehikoinen; Lisa Geres; Tobias Richter; Sebastian Seibold; Otso Ovaskainen
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    An automatic bird sound recognition system is a useful tool for collecting data on different bird species for ecological analysis. Together with autonomous recording units (ARUs), such a system makes it possible to collect bird observations on a scale that no human observer could ever match. During the last decades progress has been made in the field of automatic bird sound recognition, but recognizing bird species from untargeted soundscape recordings remains a challenge. In this article we demonstrate the workflow for building a global identification model and adjusting it to perform well on the data of autonomous recorders from a specific region. We show how data augmentation and a combination of global and local data can be used to train a convolutional neural network to classify vocalizations of 101 bird species. We construct a model and train it with a global data set to obtain a base model. The base model is then fine-tuned with local data from Southern Finland in order to adapt it to the sound environment of a specific location and tested with two data sets: one originating from the same Southern Finnish region and another originating from a different region in the German Alps. Our results suggest that fine-tuning with local data significantly improves the network performance. Classification accuracy was improved for test recordings from the same area as the local training data (Southern Finland) but not for recordings from a different region (the German Alps). Data augmentation enables training with a limited amount of training data, and even with few local data samples a significant improvement over the base model can be achieved. Our model outperforms the current state-of-the-art tool for automatic bird sound classification. Using local data to adjust the recognition model for the target domain leads to improvement over general, non-tailored solutions. The process introduced in this article can be applied to build a fine-tuned bird sound classification model for a specific environment.

    Methods: This repository contains the data and recognition models described in the paper Domain-specific neural networks improve automated bird sound recognition already with small amount of local data (Lauha et al., 2022).
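
    The abstract describes training a convolutional network on global data and then fine-tuning it on a small amount of local data. The paper's actual architecture and code are not reproduced here; the snippet below is only a generic PyTorch sketch of that fine-tuning step, with the backbone, input handling, and data loader treated as placeholder assumptions:

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_SPECIES = 101  # number of bird classes mentioned in the abstract

    # Placeholder "base model": a pretrained CNN whose classifier head is
    # replaced for the 101-species task. The paper's real base model differs.
    base = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    base.fc = nn.Linear(base.fc.in_features, NUM_SPECIES)

    optimizer = torch.optim.Adam(base.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    def finetune_one_epoch(model, local_loader):
        # local_loader is assumed to yield (spectrogram batch, label batch)
        # pairs built from the local (e.g. Southern Finland) recordings.
        model.train()
        for spectrograms, labels in local_loader:
            optimizer.zero_grad()
            loss = criterion(model(spectrograms), labels)
            loss.backward()
            optimizer.step()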

  8. CNVVE Dataset clean audio samples

    • darus.uni-stuttgart.de
    Updated Feb 13, 2024
    + more versions
    Cite
    Ramin Hedeshy; Raphael Menges; Steffen Staab (2024). CNVVE Dataset clean audio samples [Dataset]. http://doi.org/10.18419/DARUS-3898
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    DaRUS
    Authors
    Ramin Hedeshy; Raphael Menges; Steffen Staab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    BMBF
    BMWK/ESF
    Description

    This CNVVE Dataset contains clean audio samples encompassing six distinct classes of voice expressions, namely “Uh-huh” or “mm-hmm”, “Uh-uh” or “mm-mm”, “Hush” or “Shh”, “Psst”, “Ahem”, and Continuous humming, e.g., “hmmm.” Audio samples of each class are found in the respective folders. These audio samples have undergone a thorough cleaning process. The raw samples are published in https://doi.org/10.18419/darus-3897. Initially, we applied the Google WebRTC voice activity detection (VAD) algorithm on the given audio files to remove noise or silence from the collected voice signals. The intensity was set to "2", which could be a value between "1" and "3". However, because of variations in the data, some files required additional manual cleaning. These outliers, characterized by sharp click sounds (such as those occurring at the end of recordings), were addressed. The samples are recorded through a dedicated website for data collection that defines the purpose and type of voice data by providing example recordings to participants as well as the expressions’ written equivalent, e.g., “Uh-huh”. Audio recordings were automatically saved in the .wav format and kept anonymous, with a sampling rate of 48 kHz and a bit depth of 32 bits. For more info, please check the paper or feel free to contact the authors for any inquiries.
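
    The cleaning step above applies the Google WebRTC voice activity detector with intensity 2. The dataset page does not say which binding was used, so the following is only a rough sketch with the py-webrtcvad package; it assumes the 48 kHz, 32-bit recordings have already been converted to 16-bit mono PCM, which that VAD requires:

    import webrtcvad

    vad = webrtcvad.Vad(2)   # aggressiveness mode 2, matching the description

    SAMPLE_RATE = 48000      # Hz, as stated for the dataset's recordings
    FRAME_MS = 30            # webrtcvad accepts 10, 20, or 30 ms frames
    FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono samples

    def speech_frames(pcm16_mono: bytes):
        # Yield only the frames the VAD classifies as speech.
        for start in range(0, len(pcm16_mono) - FRAME_BYTES + 1, FRAME_BYTES):
            frame = pcm16_mono[start:start + FRAME_BYTES]
            if vad.is_speech(frame, SAMPLE_RATE):
                yield frame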

  9. Speech and Audio Data Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 7, 2025
    Cite
    Market Research Forecast (2025). Speech and Audio Data Report [Dataset]. https://www.marketresearchforecast.com/reports/speech-and-audio-data-28840
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 7, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global speech and audio data market is experiencing robust growth, driven by the increasing adoption of voice assistants, the proliferation of smart devices, and the expanding use of speech analytics in various sectors. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033, reaching an estimated $50 billion by 2033. Key drivers include advancements in artificial intelligence (AI), particularly in natural language processing (NLP) and machine learning (ML), which are enhancing the accuracy and efficiency of speech recognition and analysis. Furthermore, the growing demand for personalized user experiences, coupled with the rise of multilingual applications, is fueling market expansion.

    The market is segmented by language (Chinese Mandarin, English, Spanish, French, and Others) and application (Commercial Use and Academic Use). Commercial applications, including customer service, market research, and healthcare, currently dominate, but the academic sector is showing significant growth potential as research into speech technology advances. Geographic distribution shows North America and Europe currently holding the largest market shares, but the Asia-Pacific region is expected to experience the fastest growth in the coming years, fueled by increasing smartphone penetration and digitalization in emerging economies like India and China. Restraints include data privacy concerns, the need for high-quality data collection, and the challenges associated with handling diverse accents and dialects.

    The competitive landscape is characterized by a mix of large technology companies like Google, Amazon, and Microsoft, and specialized speech technology providers such as Nuance and VoiceBase. These companies are engaged in intense R&D to improve the accuracy and performance of speech recognition and synthesis technologies. Strategic partnerships and acquisitions are expected to shape the market further, as companies seek to expand their product portfolios and geographic reach. The ongoing innovation in speech-to-text and text-to-speech technologies, alongside the integration of speech data with other data types (like text and image data), will unlock new applications and further accelerate market growth. The demand for real-time transcription and translation services is also contributing to this upward trend, driving investment in innovative solutions and pushing the boundaries of what’s possible with speech and audio data.

  10. English Deep South Media Audio Dataset

    • gts.ai
    json
    Updated Nov 19, 2022
    Cite
    GTS (2022). English Deep South Media Audio Dataset [Dataset]. https://gts.ai/case-study/english-deep-south-media-audio-dataset-for-data-annotation/
    Explore at:
    Available download formats: json
    Dataset updated
    Nov 19, 2022
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The English Deep South Media Audio Dataset project is designed to develop a comprehensive audio dataset focusing on the unique accents and dialects of the English Deep South.

  11. Axiom voice recognition dataset

    • data.niaid.nih.gov
    Updated Aug 2, 2024
    Cite
    Sara Ermini (2024). Axiom voice recognition dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1218978
    Explore at:
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Sara Ermini
    Antonio Rizzo
    Nicola Bettin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The AXIOM Voice Dataset has the main purpose of gathering audio recordings from native Italian speakers. This voice data collection was intended to obtain audio recording samples for the training and testing of the VIMAR algorithm implemented for the Smart Home scenario on the Axiom board. The final goal was to develop an efficient voice recognition system using machine learning algorithms. A team of UX researchers from the University of Siena collected data for five months and tested the voice recognition system on the AXIOM board [1]. The data acquisition process involved native Italian speakers who provided their written consent to participate in the research project. The participants were selected in order to maintain a cluster with different characteristics in gender, age, region of origin and background.

  12. Polish (Poland) General Conversation Speech Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Polish (Poland) General Conversation Speech Dataset [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-polish-poland
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    Area covered
    Poland
    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the Polish Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of Polish language speech recognition models, with a particular focus on Polish accents and dialects.

    With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the Polish language spoken in Poland.

    Speech Data:

    This training dataset comprises 50 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 70 native Polish speakers from different states/provinces of Poland. This collaborative effort guarantees a balanced representation of Polish accents, dialects, and demographics, reducing biases and promoting inclusivity.

    Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.

    Metadata:

    In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device detail, topic of recording, bit depth, and sample rate will be provided.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Polish language speech recognition models.

    Transcription:

    This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format. The transcriptions capture speaker-wise transcription with time-coded segmentation along with non-speech labels and tags.

    Our goal is to expedite the deployment of Polish language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.

    Updates and Customization:

    We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.

    If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8kHz to 48kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can also customize the transcription following your specific guidelines and requirements, to further support your ASR development process.

    License:

    This audio dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.

  13. FSD50K

    • zenodo.org
    • opendatalab.com
    • +2more
    bin, zip
    Updated Apr 24, 2022
    + more versions
    Cite
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Xavier Serra (2022). FSD50K [Dataset]. http://doi.org/10.5281/zenodo.4060432
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Apr 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Xavier Serra
    Description

    FSD50K is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.

    Citation

    If you use the FSD50K dataset, or part of it, please cite our TASLP paper (available from [arXiv] [TASLP]):

    @article{fonseca2022FSD50K,
     title={{FSD50K}: an open dataset of human-labeled sound events},
     author={Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
     journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
     volume={30},
     pages={829--852},
     year={2022},
     publisher={IEEE}
    }
    

    Paper update: This paper has been published in TASLP at the beginning of 2022. The accepted camera-ready version includes a number of improvements with respect to the initial submission. The main updates include: estimation of the amount of label noise in FSD50K, SNR comparison between FSD50K and AudioSet, improved description of evaluation metrics including equations, clarification of experimental methodology and some results, some content moved to Appendix for readability. The TASLP-accepted camera-ready version is available from arXiv (in particular, it is v2 in arXiv, displayed by default).

    Data curators

    Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez

    Contact

    You are welcome to contact Eduardo Fonseca should you have any questions, at efonseca@google.com.

    ABOUT FSD50K

    Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology [1]. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.

    What follows is a brief summary of FSD50K's most important characteristics. Please have a look at our paper (especially Section 4) to extend the basic information provided here with relevant details for its usage, as well as discussion, limitations, applications and more.

    Basic characteristics:

    • FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio
    • The dataset encompasses 200 sound classes (144 leaf nodes and 56 intermediate nodes) hierarchically organized with a subset of the AudioSet Ontology.
    • The audio content is composed mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more. The vocabulary can be inspected in vocabulary.csv (see Files section below).
    • The acoustic material has been manually labeled by humans following a data labeling process using the Freesound Annotator platform [2].
    • Clips are of variable length from 0.3 to 30s, due to the diversity of the sound classes and the preferences of Freesound users when recording sounds.
    • All clips are provided as uncompressed PCM 16 bit 44.1 kHz mono audio files.
    • Ground truth labels are provided at the clip-level (i.e., weak labels).
    • The dataset poses mainly a large-vocabulary multi-label sound event classification problem, but also allows development and evaluation of a variety of machine listening approaches (see Sec. 4D in our paper).
    • In addition to audio clips and ground truth, additional metadata is made available (including raw annotations, sound predominance ratings, Freesound metadata, and more), allowing a variety of analyses and sound event research tasks (see Files section below).
    • The audio clips are grouped into a development (dev) set and an evaluation (eval) set such that they do not have clips from the same Freesound uploader.

    Dev set:

    • 40,966 audio clips totalling 80.4 hours of audio
    • Avg duration/clip: 7.1s
    • 114,271 smeared labels (i.e., labels propagated in the upwards direction to the root of the ontology)
    • Labels are correct but could be occasionally incomplete
    • A train/validation split is provided (Sec. 3H). If a different split is used, it should be specified for reproducibility and fair comparability of results (see Sec. 5C of our paper)

    Eval set:

    • 10,231 audio clips totalling 27.9 hours of audio
    • Avg duration/clip: 9.8s
    • 38,596 smeared labels
    • Eval set is labeled exhaustively (labels are correct and complete for the considered vocabulary)

    Note: All classes in FSD50K are represented in AudioSet, except Crash cymbal, Human group actions, Human voice, Respiratory sounds, and Domestic sounds, home sounds.

    LICENSE

    All audio clips in FSD50K are released under Creative Commons (CC) licenses. Each clip has its own license as defined by the clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. Specifically:

    The development set consists of 40,966 clips with the following licenses:

    • CC0: 14,959
    • CC-BY: 20,017
    • CC-BY-NC: 4,616
    • CC Sampling+: 1,374

    The evaluation set consists of 10,231 clips with the following licenses:

    • CC0: 4,914
    • CC-BY: 3,489
    • CC-BY-NC: 1,425
    • CC Sampling+: 403

    For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses. The licenses are specified in the files dev_clips_info_FSD50K.json and eval_clips_info_FSD50K.json.

    In addition, FSD50K as a whole is the result of a curation process and it has an additional license: FSD50K is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSD50K.doc zip file. We note that the choice of one license for the dataset as a whole is not straightforward as it comprises items with different licenses (such as audio clips, annotations, or data split). The choice of a global license in these cases may warrant further investigation (e.g., by someone with a background in copyright law).

    Usage of FSD50K for commercial purposes:

    If you'd like to use FSD50K for commercial purposes, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.

    Also, if you are interested in using FSD50K for machine learning competitions, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.

    FILES

    FSD50K can be downloaded as a series of zip files with the following directory structure:

    root
    │ 
    └───FSD50K.dev_audio/          Audio clips in the dev set
    │ 
    └───FSD50K.eval_audio/         Audio clips in the eval set
    │  
    └───FSD50K.ground_truth/        Files for FSD50K's ground truth
    │  │  
    │  └─── dev.csv               Ground truth for the dev set
    │  │    
    │  └─── eval.csv               Ground truth for the eval set      
    │  │      
    │  └─── vocabulary.csv            List of 200 sound classes in FSD50K 
    │  
    └───FSD50K.metadata/          Files for additional metadata
    │  │      
    │  └─── class_info_FSD50K.json        Metadata about the sound classes
    │  │      
    │  └─── dev_clips_info_FSD50K.json      Metadata about the dev clips
    │  │      
    │  └─── eval_clips_info_FSD50K.json     Metadata about the eval clips
    │  │      
    │  └─── pp_pnp_ratings_FSD50K.json      PP/PNP ratings  
    │  │      
    │  └─── collection/             Files for the *sound collection* format  
    │  
    └───FSD50K.doc/
      │      
      └───README.md               The dataset description file that you are reading
      │      
      └───LICENSE-DATASET            License of the FSD50K dataset as an entity  
    

    Each row (i.e. audio clip) of dev.csv contains the following information:

    • fname: the file name without the .wav extension, e.g., the fname 64760 corresponds to the file 64760.wav in disk. This number is the Freesound id. We always use Freesound ids as filenames.
    • labels: the class labels (i.e., the ground truth). Note these
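
    Putting the directory layout and the dev.csv fields together, a minimal sketch for iterating over the development ground truth could look like the following; since the field description above is cut off, the assumption that labels is a comma-separated list should be verified against the actual file:

    import csv
    from pathlib import Path

    FSD50K_ROOT = Path("FSD50K")  # assumed extraction root matching the tree above

    dev_csv = FSD50K_ROOT / "FSD50K.ground_truth" / "dev.csv"
    dev_audio = FSD50K_ROOT / "FSD50K.dev_audio"

    with open(dev_csv, newline="") as f:
        for row in csv.DictReader(f):
            wav_path = dev_audio / f"{row['fname']}.wav"  # fname is the Freesound id
            labels = row["labels"].split(",")             # assumes comma-separated labels
            print(wav_path, labels)
            break  # just inspect the first clip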

  14. Sound and Audio Data in Uganda

    • kaggle.com
    Updated Apr 3, 2025
    Cite
    Techsalerator (2025). Sound and Audio Data in Uganda [Dataset]. https://www.kaggle.com/datasets/techsalerator/sound-and-audio-data-in-uganda/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Techsalerator
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Uganda
    Description

    Techsalerator’s Location Sentiment Data for Uganda

    Techsalerator’s Location Sentiment Data for Uganda offers an extensive collection of data that is crucial for businesses, researchers, and technology developers. This dataset provides deep insights into public sentiment across various locations in Uganda, enabling data-driven decision-making for development, marketing, and social research.

    For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.

    Techsalerator’s Location Sentiment Data for Uganda

    Techsalerator’s Location Sentiment Data for Uganda delivers a comprehensive analysis of public sentiment across urban, rural, and industrial locations. This dataset is essential for businesses, government agencies, and researchers looking to understand the sentiment trends in different regions of Uganda.

    Top 5 Key Data Fields

    • Location of Data Capture – Identifies the geographic location where sentiment data was collected, enabling location-specific analysis of public perception.
    • Sentiment Score – Provides a numerical representation of sentiment, with positive, negative, and neutral classifications, supporting sentiment analysis for public opinion research.
    • Demographic Segmentation – Breaks down sentiment by key demographic factors such as age, gender, and occupation to uncover sentiment trends within specific groups.
    • Time of Data Capture – Records the exact time and date of sentiment data collection, helping analyze variations in sentiment over different times of day or during specific events.
    • Sentiment Source – Categorizes data sources such as social media posts, surveys, and customer feedback, to offer insights into the platform-specific sentiment.

    Top 5 Sentiment Trends in Uganda

    • Urban vs. Rural Sentiment – Variations in sentiment between urban centers like Kampala and rural areas, often revealing different priorities and perceptions on topics like infrastructure, education, and healthcare.
    • Political Sentiment – Public sentiment around political events and figures, with insights into political stability, government policies, and public opinion on elections.
    • Economic Sentiment – How Ugandans feel about economic conditions, employment opportunities, inflation, and business growth across different regions.
    • Social Issues Sentiment – Public opinion on social issues such as gender equality, healthcare access, education, and human rights.
    • Technology Adoption Sentiment – Increasing interest in digital technologies, mobile platforms, and internet access, reflecting sentiment on technological advancements and connectivity.

    Top 5 Applications of Location Sentiment Data in Uganda

    • Urban Development and Planning – Helps city planners and government bodies design better urban environments based on public sentiment toward infrastructure, traffic, and public services.
    • Marketing and Consumer Insights – Brands use sentiment data to tailor marketing campaigns and improve customer engagement by understanding regional preferences and concerns.
    • Policy and Governance – Governments and NGOs utilize sentiment data to shape policies that address public concerns and improve governance effectiveness.
    • Social Research – Social researchers can analyze regional disparities in public opinion on issues like education, healthcare, and social justice.
    • Crisis Management and Response – Sentiment data aids in understanding public reaction to crises like health emergencies or natural disasters, helping improve response strategies.

    Accessing Techsalerator’s Location Sentiment Data

    To obtain Techsalerator’s Location Sentiment Data for Uganda, contact info@techsalerator.com with your specific requirements. Techsalerator offers customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.

    Included Data Fields

    • Location of Data Capture
    • Sentiment Score
    • Demographic Segmentation
    • Time of Data Capture
    • Sentiment Source
    • Topic Categories
    • Public Opinion on Government Policies
    • Sentiment on Social Issues
    • Regional Sentiment Trends
    • Contact Information

    For deep insights into public sentiment across Uganda, Techsalerator’s dataset is an invaluable resource for businesses, policymakers, and researchers.

  15. Data Collection And Labelling Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 22, 2024
    Cite
    Dataintelo (2024). Data Collection And Labelling Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-collection-and-labelling-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 22, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Collection and Labelling Market Outlook



    The global market size for data collection and labelling was estimated at USD 1.3 billion in 2023, with forecasts predicting it will reach approximately USD 7.8 billion by 2032, showcasing a robust CAGR of 20.8% during the forecast period. Several factors are driving this significant growth, including the rising adoption of artificial intelligence (AI) and machine learning (ML) across various industries, the increasing demand for high-quality annotated data, and the proliferation of data-driven decision-making processes.



    One of the primary growth factors in the data collection and labelling market is the rapid advancement and integration of AI and ML technologies across various industry verticals. These technologies require vast amounts of accurately annotated data to train algorithms and improve their accuracy and efficiency. As AI and ML applications become more prevalent in sectors such as healthcare, automotive, and retail, the demand for high-quality labelled data is expected to grow exponentially. Furthermore, the increasing need for automation and the ability to extract valuable insights from large datasets are driving the adoption of data labelling services.



    Another significant factor contributing to the market's growth is the rising focus on enhancing customer experiences and personalisation. Companies are leveraging data collection and labelling to gain deeper insights into customer behaviour, preferences, and trends. This enables them to develop more targeted marketing strategies, improve product recommendations, and deliver personalised services. As businesses strive to stay competitive in a rapidly evolving digital landscape, the demand for accurate and comprehensive data labelling solutions is expected to rise.



    The growing importance of data privacy and security is also playing a crucial role in driving the data collection and labelling market. With the implementation of stringent data protection regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), organisations are increasingly focusing on ensuring the accuracy and integrity of their data. This has led to a greater emphasis on data labelling processes, as they help maintain data quality and compliance with regulatory requirements. Additionally, the rising awareness of the potential risks associated with biased or inaccurate data is further propelling the demand for reliable data labelling services.



    Regionally, North America is expected to dominate the data collection and labelling market during the forecast period. The region's strong technological infrastructure, high adoption rate of AI and ML technologies, and the presence of major market players contribute to its leading position. Additionally, the Asia Pacific region is anticipated to witness significant growth, driven by the increasing investments in AI and ML technologies, the expanding IT and telecommunications sector, and the growing focus on digital transformation in countries such as China, India, and Japan. Europe is also expected to experience steady growth, supported by the rising adoption of AI-driven applications across various industries and the implementation of data protection regulations.



    Data Type Analysis



    The data collection and labelling market can be segmented by data type into text, image/video, and audio. Each type has its unique applications and demands, creating diverse opportunities and challenges within the market. Text data labelling is particularly crucial for natural language processing (NLP) applications, such as chatbots, sentiment analysis, and language translation. The growing adoption of NLP technologies across various industries, including healthcare, finance, and customer service, is driving the demand for high-quality text data labelling services.



    Image and video data labelling is essential for computer vision applications, such as facial recognition, object detection, and autonomous vehicles. The increasing deployment of these technologies in industries such as automotive, retail, and surveillance is fuelling the demand for accurate image and video annotation. Additionally, the growing popularity of augmented reality (AR) and virtual reality (VR) applications is further contributing to the demand for labelled image and video data. The rising need for real-time video analytics and the development of advanced visual search engines are also driving the growth of this segment.



    Audio data labelling is critical for speech recognition and audio analysis appli

  16. 2024-04-08 Total Solar Eclipse ESID#504

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 10, 2025
    Cite
    ARISA Lab, L.L.C. (2025). 2024-04-08 Total Solar Eclipse ESID#504 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14889434
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    ARISA Lab, L.L.C.
    Volunteer Scientist
    Severino, MaryKay
    Winter, Henry
    Description

    These are audio recordings taken by an Eclipse Soundscapes (ES) Data Collector during the week of the April 08, 2024 Total Solar Eclipse.

    Data Site location information:

    Latitude: 36.089254

    Longitude: -92.54893

    Type of Eclipse: Total Solar Eclipse

    Eclipse %: 100

    WAV files Time & Date Settings: Set with Automated AudioMoth Time chime

    Included Data:

    Audio files in WAV format with the date and time in UTC within the file name: YYYYMMDD_HHMMSS, meaning YearMonthDay_HourMinuteSecond. For example, 20240411_141600.WAV means that this audio file starts on April 11, 2024 at 14:16:00 Coordinated Universal Time (UTC).
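
    A minimal sketch for turning such a file name into a timezone-aware timestamp, following the YYYYMMDD_HHMMSS convention described above:

    from datetime import datetime, timezone

    def parse_audiomoth_name(filename: str) -> datetime:
        # Drop the .WAV extension and interpret the stem as a UTC timestamp.
        stem = filename.rsplit(".", 1)[0]
        return datetime.strptime(stem, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)

    print(parse_audiomoth_name("20240411_141600.WAV"))
    # 2024-04-11 14:16:00+00:00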

    CONFIG Text file: Includes AudioMoth device setting information, such as sample rate in Hertz (Hz), gain, firmware, etc.

    Eclipse Information for this location:

    Eclipse Date: 04/08/2024

    Eclipse Start Time (UTC): 17:35:18

    Totality Start Time (UTC): [N/A if partial eclipse] 18:52:19

    Eclipse Maximum, when the greatest amount of the Sun is blocked: 18:54:05

    Totality End Time (UTC): [N/A if partial eclipse] 18:55:51

    Eclipse End Time (UTC): [N/A if partial eclipse] 20:12:14

    Audio Data Collection During Eclipse Week

    ES Data Collectors used AudioMoth devices to record audio data, known as soundscapes, over a 5-day period during the eclipse week: 2 days before the eclipse, the day of the eclipse, and 2 days after. The complete raw audio data collected by the Data Collector at the location mentioned above is provided here. This data may or may not cover the entire requested timeframe due to factors such as availability, technical issues, or other unforeseen circumstances.

    ES ID# Information:

    Each AudioMoth recording device was assigned a unique Eclipse Soundscapes Identification Number (ES ID#). This identifier connects the audio data, submitted via a MicroSD card, with the latitude and longitude information provided by the data collector through an online form. The ES team used the ES ID# to link the audio data with its corresponding location information and then uploaded this raw audio data and location details to Zenodo. This process ensures the anonymity of the ES Data Collectors while allowing them to easily search for and access their audio data on Zenodo.

    TimeStamp Information:

    The ES team and the Data Collectors took care to set the date and time on the AudioMoth recording devices using an AudioMoth time chime before deployment, ensuring that the recordings would have an automatic timestamp. However, participants also manually noted the date and start time as a backup in case the time chime setup failed. The notes above indicate whether the WAV audio files for this site were timestamped manually or with the automated AudioMoth time chime.

    Common Timestamp Error:

    Some AudioMoth devices experienced a malfunction where the timestamp on audio files reverted to a date in 1970 or before, even after initially recording correctly. Despite this issue, the affected data was still included in this ES site’s collected raw audio dataset.

    Latitude & Longitude Information:

    The latitude and longitude for each site was taken manually by data collectors and submitted to the ES team, either via a web form or on paper. It is shared in Decimal Degrees format.

    General Project Information:

    The Eclipse Soundscapes Project is a NASA Volunteer Science project funded by NASA Science Activation that is studying how eclipses affect life on Earth during the October 14, 2023 annular solar eclipse and the April 8, 2024 total solar eclipse. Eclipse Soundscapes revisits an eclipse study from almost 100 years ago that showed that animals and insects are affected by solar eclipses! Like this study from 100 years ago, ES asked for the public's help. ES uses modern technology to continue to study how solar eclipses affect life on Earth! You can learn more at www.EclipseSoundscapes.org.

    Eclipse Soundscapes is an enterprise of ARISA Lab, LLC and is supported by NASA award No. 80NSSC21M0008.

    Eclipse Data Version Definitions

    {1st digit = year, 2nd digit = Eclipse type (1=Total Solar Eclipse, 9=Annular Solar Eclipse, 0=Partial Solar Eclipse), 3rd digit is unused and in place for future use}

    2023.9.0 = Week of October 14, 2023 Annular Eclipse Audio Data, Path of Annularity (Annular Eclipse)

    2023.0.0 = Week of October 14, 2023 Annular Eclipse Audio Data, OFF the Path of Annularity (Partial Eclipse)

    2024.1.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, Path of Totality (Total Solar Eclipse)

    2024.0.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, OFF the Path of Totality (Partial Solar Eclipse)

    *Please note that this dataset's version number is listed below.
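
    Read literally, the version scheme above can be decoded with a small helper like the following (illustrative only):

    ECLIPSE_TYPES = {
        1: "Total Solar Eclipse",
        9: "Annular Solar Eclipse",
        0: "Partial Solar Eclipse",
    }

    def describe_version(version: str) -> str:
        # e.g. "2024.1.0" -> "2024: Total Solar Eclipse"
        year, eclipse_type, _unused = (int(part) for part in version.split("."))
        return f"{year}: {ECLIPSE_TYPES[eclipse_type]}"

    print(describe_version("2024.1.0"))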

    Individual Site Citation: APA Citation (7th edition)

    ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientist. (2025). 2024 solar eclipse soundscapes audio data [Audio dataset, ES ID# 504]. Zenodo. {Insert DOI}. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.

    Eclipse Community Citation

    ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientists. 2023 and 2024 solar eclipse soundscapes audio data [Collection of audio datasets]. Eclipse Soundscapes Community, Zenodo. https://zenodo.org/communities/eclipsesoundscapes/. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.

  17. 2024-04-08 Total Solar Eclipse ESID#001

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 10, 2025
    + more versions
    Cite
    ARISA Lab, L.L.C. (2025). 2024-04-08 Total Solar Eclipse ESID#001 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14888070
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    ARISA Lab, L.L.C.
    Volunteer Scientist
    Severino, MaryKay
    Winter, Henry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are audio recordings taken by an Eclipse Soundscapes (ES) Data Collector during the week of the April 08, 2024 Total Solar Eclipse.

    Data Site location information:

    Latitude: 34.53804

    Longitude: -93.03621

    Type of Eclipse: Total Solar Eclipse

    Eclipse %: 100

    WAV files Time & Date Settings: Set with Automated AudioMoth Time chime

    Included Data:

    Audio files in WAV format with the date and time in UTC within the file name: YYYYMMDD_HHMMSS, meaning YearMonthDay_HourMinuteSecond. For example, 20240411_141600.WAV means that this audio file starts on April 11, 2024 at 14:16:00 Coordinated Universal Time (UTC).

    CONFIG Text file: Includes AudioMoth device setting information, such as sample rate in Hertz (Hz), gain, firmware, etc.

    Eclipse Information for this location:

    Eclipse Date: 04/08/2024

    Eclipse Start Time (UTC): 17:31:56

    Totality Start Time (UTC): [N/A if partial eclipse] 18:49:24

    Eclipse Maximum (when the greatest possible amount of the Sun is blocked): 18:51:15

    Totality End Time (UTC): [N/A if partial eclipse] 18:53:05

    Eclipse End Time (UTC): [N/A if partial eclipse] 20:10:10

    Audio Data Collection During Eclipse Week

    ES Data Collectors used AudioMoth devices to record audio data, known as soundscapes, over a 5-day period during the eclipse week: 2 days before the eclipse, the day of the eclipse, and 2 days after. The complete raw audio data collected by the Data Collector at the location mentioned above is provided here. This data may or may not cover the entire requested timeframe due to factors such as availability, technical issues, or other unforeseen circumstances.

    ES ID# Information:

    Each AudioMoth recording device was assigned a unique Eclipse Soundscapes Identification Number (ES ID#). This identifier connects the audio data, submitted via a MicroSD card, with the latitude and longitude information provided by the data collector through an online form. The ES team used the ES ID# to link the audio data with its corresponding location information and then uploaded this raw audio data and location details to Zenodo. This process ensures the anonymity of the ES Data Collectors while allowing them to easily search for and access their audio data on Zenodo.
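    As a purely hypothetical illustration of the linkage described above, the sketch below pairs a folder of raw audio with the location record that shares the same ES ID#; the CSV, its column names, and the folder layout are assumptions, not ES project artifacts.

    # Hypothetical sketch: join audio files with location metadata via an ES ID#.
    import csv
    from pathlib import Path

    def link_site(es_id: str, locations_csv: str, audio_root: str) -> dict:
        """Return the assumed location row for an ES ID# plus its audio file listing."""
        with open(locations_csv, newline="") as fh:
            rows = {row["es_id"]: row for row in csv.DictReader(fh)}
        site = dict(rows.get(es_id, {}))
        site["audio_files"] = sorted(
            p.name for p in Path(audio_root, es_id).glob("*.WAV")
        )
        return site

    print(link_site("001", "site_locations.csv", "raw_audio"))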

    TimeStamp Information:

    The ES team and the Data Collectors took care to set the date and time on the AudioMoth recording devices using an AudioMoth time chime before deployment, ensuring that the recordings would have an automatic timestamp. However, participants also manually noted the date and start time as a backup in case the time chime setup failed. The notes above indicate whether the WAV audio files for this site were timestamped manually or with the automated AudioMoth time chime.

    Common Timestamp Error:

    Some AudioMoth devices experienced a malfunction where the timestamp on audio files reverted to a date in 1970 or before, even after initially recording correctly. Despite this issue, the affected data was still included in this ES site’s collected raw audio dataset.
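    Here is a minimal sketch of how the epoch-reset malfunction described above could be flagged from file names alone; the folder path and cutoff year are assumptions for illustration.

    # Minimal sketch: flag recordings whose file-name timestamp reverted to 1970 or earlier.
    from datetime import datetime
    from pathlib import Path

    def looks_epoch_reset(wav_path: Path, cutoff_year: int = 1971) -> bool:
        """Return True if the YYYYMMDD_HHMMSS file-name timestamp predates the cutoff."""
        try:
            stamp = datetime.strptime(wav_path.stem, "%Y%m%d_%H%M%S")
        except ValueError:
            return False  # name does not follow the YYYYMMDD_HHMMSS pattern
        return stamp.year < cutoff_year

    suspect = [p.name for p in Path("audio").glob("*.WAV") if looks_epoch_reset(p)]
    print(suspect)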

    Latitude & Longitude Information:

    The latitude and longitude for each site were recorded manually by data collectors and submitted to the ES team, either via a web form or on paper. They are shared in Decimal Degrees format.

    General Project Information:

    The Eclipse Soundscapes Project is a NASA Volunteer Science project, funded by NASA Science Activation, that is studying how eclipses affect life on Earth during the October 14, 2023 annular solar eclipse and the April 8, 2024 total solar eclipse. Eclipse Soundscapes revisits an eclipse study from almost 100 years ago which showed that animals and insects are affected by solar eclipses. Like that earlier study, ES asked for the public's help, using modern technology to continue studying how solar eclipses affect life on Earth. You can learn more at www.EclipseSoundscapes.org.

    Eclipse Soundscapes is an enterprise of ARISA Lab, LLC and is supported by NASA award No. 80NSSC21M0008.

    Eclipse Data Version Definitions

    {1st digit = year, 2nd digit = eclipse type (1 = Total Solar Eclipse, 9 = Annular Solar Eclipse, 0 = Partial Solar Eclipse), 3rd digit is unused and reserved for future use}

    2023.9.0 = Week of October 14, 2023 Annular Eclipse Audio Data, Path of Annularity (Annular Eclipse)

    2023.0.0 = Week of October 14, 2023 Annular Eclipse Audio Data, OFF the Path of Annularity (Partial Eclipse)

    2024.1.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, Path of Totality (Total Solar Eclipse)

    2024.0.0 = Week of April 8, 2024 Total Solar Eclipse Audio Data, OFF the Path of Totality (Partial Solar Eclipse)

    *Please note that this dataset's version number is listed below.

    Individual Site Citation: APA Citation (7th edition)

    ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientist. (2025). 2024 solar eclipse soundscapes audio data [Audio dataset, ES ID# 001]. Zenodo. {Insert DOI}. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.

    Eclipse Community Citation

    ARISA Lab, L.L.C., Winter, H., Severino, M., & Volunteer Scientists. 2023 and 2024 solar eclipse soundscapes audio data [Collection of audio datasets]. Eclipse Soundscapes Community, Zenodo. https://zenodo.org/communities/eclipsesoundscapes/. Collected by volunteer scientists as part of the Eclipse Soundscapes Project. This project is supported by NASA award No. 80NSSC21M0008.

  18. Sound and Audio Data in Palestine State

    • kaggle.com
    Updated Mar 31, 2025
    Cite
    Techsalerator (2025). Sound and Audio Data in Palestine State [Dataset]. https://www.kaggle.com/datasets/techsalerator/sound-and-audio-data-in-palestine-state/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Techsalerator
    Description

    Techsalerator’s Location Sentiment Data for Palestine State

    Techsalerator’s Location Sentiment Data for Palestine State offers a detailed collection of insights vital for businesses, researchers, and technology developers. This dataset provides in-depth information about the emotional sentiment across different regions, capturing the mood and opinions of people in various environments within Palestine State.

    For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.

    Techsalerator’s Location Sentiment Data for Palestine State

    Techsalerator’s Location Sentiment Data for Palestine State delivers a thorough analysis of sentiment across urban, rural, and industrial locations. This dataset is crucial for AI development, social studies, marketing strategies, and telecommunications.

    Top 5 Key Data Fields

    • Geographic Location – Identifies the exact area where sentiment was captured, helping analyze regional sentiment variations across Palestine State.
    • Sentiment Score – Provides a quantitative measure of positive, neutral, or negative sentiment, allowing businesses to understand public opinion in real-time.
    • Demographic Insights – Breaks down sentiment data by age, gender, and social status, offering a more nuanced understanding of public sentiment.
    • Time of Sentiment Capture – Records the exact date and time of sentiment data collection, allowing for the analysis of fluctuations in sentiment across different times of day.
    • Social Media Mentions – Tracks the frequency of positive, neutral, or negative mentions on social media platforms, contributing to a real-time sentiment analysis.
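    Purely as a hypothetical illustration of the five fields listed above, here is one way a single record might be represented in code; the field names and value types are assumptions and not Techsalerator's actual delivery schema.

    # Hypothetical record shape for the five key fields; illustrative only.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class SentimentRecord:
        geographic_location: str    # e.g. "Ramallah"
        sentiment_score: float      # negative < 0 < positive
        demographic_group: str      # e.g. "female, 25-34"
        captured_at: datetime       # time of sentiment capture
        social_media_mentions: int  # mentions counted in the capture window

    example = SentimentRecord("Ramallah", 0.42, "female, 25-34",
                              datetime(2025, 3, 15, 14, 30), 127)
    print(example)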

    Top 5 Sentiment Trends in Palestine State

    • Urban Sentiment Shifts – Increased urbanization in cities like Ramallah and Gaza leads to fluctuating sentiment based on political, economic, and social factors.
    • Political Influence on Sentiment – Events such as elections, protests, and policy changes significantly impact sentiment, particularly in high-stakes periods.
    • Cultural and Social Movements – Social and cultural changes, such as youth activism, impact sentiment data, highlighting generational shifts in Palestine.
    • Economic Sentiment – Economic conditions and development projects drive positive or negative sentiment, influencing both consumer behavior and public opinion.
    • Community Well-being – Sentiment data is being used to track public opinion on health and safety issues, particularly in response to local crises or humanitarian efforts.

    Top 5 Applications of Location Sentiment Data in Palestine State

    • Marketing and Branding – Businesses use sentiment data to fine-tune their advertising strategies, ensuring messaging resonates positively with local audiences.
    • AI and Machine Learning – Enhancing sentiment analysis systems with localized data for better accuracy in understanding emotions, opinions, and social behavior.
    • Political Campaigning – Political candidates and parties use sentiment data to track public opinion and adjust campaign strategies accordingly.
    • Social Impact Studies – Researchers analyze sentiment trends to understand how societal issues, such as unemployment or education, affect the overall mood of the population.
    • Urban Planning and Development – Sentiment data helps urban planners create environments that align with the emotional needs of the population.

    Accessing Techsalerator’s Location Sentiment Data

    To obtain Techsalerator’s Location Sentiment Data for Palestine State, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.

    Included Data Fields

    • Geographic Location
    • Sentiment Score
    • Demographic Insights
    • Time of Sentiment Capture
    • Social Media Mentions
    • Sentiment Analysis by Region
    • Political Sentiment Trends
    • Economic Impact on Sentiment
    • Cultural Sentiment Insights
    • Contact Information

    For in-depth insights into sentiment trends and regional opinions in Palestine State, Techsalerator’s dataset is an invaluable resource for researchers, policymakers, marketers, and urban developers.

  19. Sound and Audio Data in Vanuatu

    • kaggle.com
    Updated Apr 3, 2025
    Cite
    Techsalerator (2025). Sound and Audio Data in Vanuatu [Dataset]. https://www.kaggle.com/datasets/techsalerator/sound-and-audio-data-in-vanuatu/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Techsalerator
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Vanuatu
    Description

    Techsalerator’s Location Sentiment Data for Vanuatu

    Techsalerator’s Location Sentiment Data for Vanuatu provides a detailed collection of data, offering crucial insights for businesses, researchers, and technology developers. This dataset delivers a comprehensive analysis of public sentiment and environmental conditions across different regions of Vanuatu, helping to understand local opinions, behaviors, and perceptions.

    For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.

    Techsalerator’s Location Sentiment Data for Vanuatu

    Techsalerator’s Location Sentiment Data for Vanuatu offers an in-depth analysis of public sentiment across urban, rural, and remote locations. This data is essential for market research, tourism development, social studies, and governmental decision-making.

    Top 5 Key Data Fields

    • Location of Sentiment Capture – Identifies the geographic area where sentiment data is collected, helping analyze regional variations in opinions and attitudes.
    • Sentiment Analysis – Provides an analysis of positive, negative, and neutral sentiments expressed by individuals in various locations, offering insights into local moods and concerns.
    • Sentiment by Demographics – Breaks down sentiment data by key demographics such as age, gender, and socio-economic status, helping tailor messages and services.
    • Time of Sentiment Capture – Records the time and date of sentiment data collection, allowing analysis of shifts in sentiment over time, such as during holidays or national events.
    • Event Impact Analysis – Identifies the effect of local events (e.g., natural disasters, festivals) on sentiment, offering a view into public reactions to significant occurrences.

    Top 5 Sentiment Trends in Vanuatu

    • Tourism Sentiment – A rising interest in sustainable tourism has led to positive sentiment toward eco-tourism initiatives, which influences local businesses and government policies.
    • Disaster Preparedness – The aftermath of cyclones and natural disasters has prompted concerns about preparedness and recovery, with sentiment focusing on improving resilience.
    • Cultural Pride and Heritage – There is a strong sentiment of pride in Vanuatu’s unique cultural heritage, especially among younger generations.
    • Economic Growth and Development – Public sentiment regarding economic opportunities is positive, especially in urban areas, though rural areas show concerns about access to resources.
    • Environmental Concerns – Climate change and environmental degradation are emerging as significant concerns, with strong calls for sustainable development and conservation efforts.

    Top 5 Applications of Location Sentiment Data in Vanuatu

    • Market Research and Consumer Behavior – Businesses can use sentiment data to understand local preferences and adjust products or services accordingly.
    • Tourism Industry Development – Sentiment data helps shape tourism strategies by identifying the experiences and services visitors value most.
    • Government Policy and Social Programs – Policymakers can use sentiment trends to design programs that address public concerns, such as environmental protection and economic growth.
    • Event Planning and Management – Event organizers can analyze sentiment before, during, and after events to gauge public interest and improve future planning.
    • Social Media and Public Relations – Sentiment data helps brands and public relations professionals track public opinion and manage their reputation across different regions.

    Accessing Techsalerator’s Location Sentiment Data

    To obtain Techsalerator’s Location Sentiment Data for Vanuatu, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.

    Included Data Fields

    • Location of Sentiment Capture
    • Sentiment Analysis (Positive, Negative, Neutral)
    • Sentiment by Demographics (Age, Gender, Socio-Economic Status)
    • Time of Sentiment Capture
    • Event Impact Analysis
    • Public Sentiment toward Key Industries (Tourism, Agriculture, Technology, etc.)
    • Regional Sentiment Trends
    • Sentiment toward Government Policies
    • Sentiment by Language and Cultural Groups
    • Impact of Natural Disasters on Sentiment

    Techsalerator’s dataset is an invaluable resource for businesses, governments, and researchers seeking to understand public sentiment in Vanuatu. It provides actionable insights for decision-making, policy development, and market strategies.

  20. BAF: an audio fingerprinting dataset for broadcast monitoring

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Guillem Cortès; Alex Ciurana; Emilio Molina; Marius Miron; Owen Meyers; Joren Six; Xavier Serra (2024). BAF: an audio fingerprinting dataset for broadcast monitoring [Dataset]. http://doi.org/10.5281/zenodo.6868083
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Guillem Cortès; Alex Ciurana; Emilio Molina; Marius Miron; Owen Meyers; Joren Six; Xavier Serra
    Description

    Overview

    The Broadcast Audio Fingerprinting (BAF) dataset is an open, annotated dataset, available upon request, for the task of music monitoring in broadcast. It contains 2,000 tracks from Epidemic Sound's private catalogue as reference tracks, representing 74 hours of audio. As queries, it contains over 57 hours of TV broadcast audio from 23 countries and 203 channels, distributed across 3,425 one-minute audio excerpts.

    It has been annotated by six annotators in total, and each query has been cross-annotated by three of them, achieving high inter-annotator agreement, which validates the annotation methodology and supports the reliability of the annotations.

    Purpose of the dataset

    This dataset aims to become the standard dataset to evaluate Audio Fingerprinting algorithms since it’s built on real data, without the use of any data-augmentation techniques. It is also the first dataset to address background music fingerprinting, which is a real problem in royalties distribution.

    Dataset use

    This dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis.

    About the data

    All audio files are monophonic, 8 kHz, 128 kb/s, pcm_s16le-encoded .wav files. Annotations mark which reference tracks sound (either in the foreground or the background) in each query, if any, and also the specific times at which each track starts and stops sounding in the query.

    Note that there are 88 queries that do not have any matches.

    For more information check the dedicated Github repository: https://github.com/guillemcortes/baf-dataset and the dataset datasheet included in the files.
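    A minimal sketch, using Python's standard wave module, of checking that a downloaded file matches the stated format (mono, 8 kHz, 16-bit PCM); the file path is a placeholder.

    # Minimal sketch: verify the stated audio format of a BAF .wav file.
    import wave

    with wave.open("baf-dataset/queries/query_0001.wav", "rb") as wav:
        assert wav.getnchannels() == 1, "expected monophonic audio"
        assert wav.getframerate() == 8000, "expected an 8 kHz sample rate"
        assert wav.getsampwidth() == 2, "expected 16-bit (pcm_s16le) samples"
        duration_s = wav.getnframes() / wav.getframerate()
        print(f"OK: {duration_s:.1f} s of audio")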

    Dataset contents

    The dataset is structured following this schema

    baf-dataset/
    ├── baf_datasheet.pdf
    ├── annotations.csv
    ├── changelog.md
    ├── cross_annotations.csv
    ├── queries_info.csv
    ├── queries
    │  ├── query_0001.wav
    │  ├── query_0002.wav
    │  ├── …
    │  └── query_3425.wav
    └── references
      ├── ref_0001.wav
      ├── ref_0002.wav
      ├── …
      └── ref_2000.wav

    There are two folders named queries and references containing the wav files of TV broadcast recordings and the reference tracks, respectively.

    annotations.csv file contains the annotations made by the 6 annotators, giving the following information:

    annotations.csv content summary
    query            reference       query_start   query_end   annotator
    query_0692.wav   ref_1235.wav    0.0           59.904      annotator_6

    cross_annotations.csv contains the resulting annotations after merging the overlapping annotations in the annotations.csv file. The x_tag column has three different values:

    • single: the segment has only been annotated by one annotator.

    • majority: the segment has been annotated by two annotators.

    • unanimity: the segment has been annotated by the three annotators.

    cross_annotations.csv content summary
    query            reference       query_start   query_end   annotators                                      x_tag
    query_0693.wav   ref_1834.wav    37.53         38.07       ['annotator_3']                                 single
    query_0693.wav   ref_1834.wav    18.18         37.48       ['annotator_3', 'annotator_5', 'annotator_3']   unanimity
    query_0693.wav   ref_1834.wav    37.48         37.53       ['annotator_5', 'annotator_3']                  majority

    queries_info.csv contains information about the queries for citation purposes: the country, the channel, and the date and time when the broadcast happened.

    queries_info.csv content summary
    filename         country   channel             datetime
    query_0001.wav   Norway    Discovery Channel   2021-02-26 14:45:26

    changelog.md contains a curated, chronologically ordered list of notable changes for each version of the dataset.

    baf_datasheet.pdf contains the standardized datasheet documentation for the dataset.
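    A minimal sketch of loading the annotation files with pandas and summarizing them, assuming the column names shown in the tables above.

    # Minimal sketch: summarize BAF annotations (column names assumed from the tables above).
    import pandas as pd

    annotations = pd.read_csv("baf-dataset/annotations.csv")
    cross = pd.read_csv("baf-dataset/cross_annotations.csv")

    # Total annotated seconds per annotator.
    annotations["duration"] = annotations["query_end"] - annotations["query_start"]
    print(annotations.groupby("annotator")["duration"].sum())

    # How many merged segments were single / majority / unanimity annotations.
    print(cross["x_tag"].value_counts())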

    Ownership of the data

    Next, we specify the ownership of all the data included in BAF: Broadcast Audio Fingerprinting dataset. For licensing information, please refer to the “License” section.

    Reference tracks

    The reference tracks are owned by Epidemic Sound AB, which has given a worldwide, revocable, non-exclusive, royalty-free licence to use and reproduce this data collection consisting of 2,000 low-quality monophonic 8kHz downsampled audio recordings.

    Query tracks

    The query tracks come from publicly available TV broadcast emissions so the ownership of each recording belongs to the channel that emitted the content. We publish them under the right of quotation provided by the Berne Convention.

    Annotations

    Guillem Cortès, together with Alex Ciurana and Emilio Molina from BMAT Music Licensing S.L., managed the annotation; therefore, the annotations belong to BMAT.

    Accessing the dataset

    The dataset is available upon request. Please include, in the justification field, your academic affiliation (if you have one) and a brief description of your research topics and why you would like to use this dataset. Bear in mind that this information is important for the evaluation of every access request.

    License

    This dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis. Given the different ownership of the elements of the dataset, the dataset is licensed under the following conditions:

    1. User’s access request

    2. Research only, non-commercial purposes

    3. No adaptations nor derivative works

    4. Attribution to Epidemic Sound and the authors, as indicated in the "citation" section.

    Please include, in the justification field, your academic affiliation (if you have one) and a brief description of your research topics and why you would like to use this dataset.

    Acknowledgments

    With the support of Ministerio de Ciencia Innovación y universidades through Retos-Colaboración call, reference: RTC2019-007248-7, and also with the support of the Industrial Doctorates Plan of the Secretariat of Universities and Research of the Department of Business and Knowledge of the Generalitat de Catalunya. Reference: DI46-2020.
