100+ datasets found
  1. VOICED Database

    • physionet.org
    Updated Jun 7, 2018
    Cite
    Laura Verde; Giovanna Sannino (2018). VOICED Database [Dataset]. http://doi.org/10.13026/C25Q2N
    Authors
    Laura Verde; Giovanna Sannino
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    This database includes 208 voice samples: 150 from pathological voices and 58 from healthy voices.

  2. Common Voice Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1more
    Updated Jan 7, 2021
    Cite
    Rosana Ardila; Megan Branson; Kelly Davis; Michael Henretty; Michael Kohler; Josh Meyer; Reuben Morais; Lindsay Saunders; Francis M. Tyers; Gregor Weber (2023). Common Voice Dataset [Dataset]. https://paperswithcode.com/dataset/common-voice
    Authors
    Rosana Ardila; Megan Branson; Kelly Davis; Michael Henretty; Michael Kohler; Josh Meyer; Reuben Morais; Lindsay Saunders; Francis M. Tyers; Gregor Weber
    Description

    Common Voice is an audio dataset consisting of unique MP3 files and corresponding text files. There are 9,283 recorded hours in the dataset, which also includes demographic metadata like age, sex, and accent. The dataset consists of 7,335 validated hours in 60 languages.

  3. chest_falsetto

    • huggingface.co
    Updated Aug 4, 2024
    Cite
    CCMUSIC Database (2024). chest_falsetto [Dataset]. https://huggingface.co/datasets/ccmusic-database/chest_falsetto
    Dataset authored and provided by
    CCMUSIC Database
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    Dataset Card for the Chest Voice and Falsetto Dataset

    The original dataset, sourced from the Chest Voice and Falsetto Dataset, includes 1,280 monophonic singing audio files in .wav format, performed, recorded, and annotated by students majoring in Vocal Music at the China Conservatory of Music. The chest voice is tagged as "chest" and the falsetto voice as "falsetto." Additionally, the dataset encompasses the Mel spectrogram, Mel frequency cepstral coefficient (MFCC), and spectral… See the full description on the dataset page: https://huggingface.co/datasets/ccmusic-database/chest_falsetto.

  4. Finnish General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Finnish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-finnish-finland
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Finnish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Finnish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Finnish communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Finnish speech models that understand and respond to authentic Finnish accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Finnish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Finnish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Finland to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass (average WER < 5%)

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
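
The JSON transcript layout described above can be exercised with a short sketch. The segment schema below (speaker / start / end / text, with bracketed non-speech tags) is an assumption for illustration, not FutureBeeAI's published format.

```python
import json

# Hypothetical transcript structure: the field names are assumptions for
# illustration, not FutureBeeAI's actual schema.
sample = json.loads("""
{
  "segments": [
    {"speaker": "S1", "start": 0.00, "end": 3.20, "text": "Hei, mitä kuuluu?"},
    {"speaker": "S2", "start": 3.20, "end": 4.10, "text": "[laughter]"},
    {"speaker": "S2", "start": 4.10, "end": 7.65, "text": "Ihan hyvää, kiitos."}
  ]
}
""")

def speech_only(segments):
    """Drop bracketed non-speech labels; return (speaker, text) pairs."""
    return [(s["speaker"], s["text"])
            for s in segments
            if not s["text"].startswith("[")]

def total_duration(segments):
    """Sum time-coded segment durations in seconds."""
    return sum(s["end"] - s["start"] for s in segments)

pairs = speech_only(sample["segments"])
print(pairs)  # the two spoken utterances, laughter tag removed
print(round(total_duration(sample["segments"]), 2))  # 7.65
```

Because the segments are speaker-tagged and time-coded, the same loop can just as easily feed an ASR training pipeline that needs (audio span, text) pairs per speaker.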

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Finnish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Finnish.
    Voice Assistants: Build smart assistants capable of understanding natural Finnish conversations.

  5. voice-data

    • huggingface.co
    Updated Jul 10, 2023
    Cite
    lain (2023). voice-data [Dataset]. https://huggingface.co/datasets/moonling/voice-data
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors
    lain
    Description

    The moonling/voice-data dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  6. English (UK) General Conversation Speech Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English (UK) General Conversation Speech Dataset [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-uk
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United Kingdom
    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on British accents and dialects.

    With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and generative voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances of English as spoken in the United Kingdom.

    Speech Data:

    This training dataset comprises 30 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 40 native English speakers from different regions of the United Kingdom. This collaborative effort guarantees a balanced representation of British accents, dialects, and demographics, reducing biases and promoting inclusivity.

    Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
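
The stated audio spec (stereo WAV, 16-bit depth, 8 kHz sample rate) can be checked programmatically when ingesting files. This sketch uses Python's standard wave module against a synthetic one-second clip, not actual dataset audio.

```python
import io
import wave

# The spec stated above: stereo, 16-bit samples, 8 kHz.
EXPECTED = {"channels": 2, "sampwidth": 2, "framerate": 8000}

def check_wav_spec(fp):
    """Read WAV header parameters from a file or file-like object."""
    with wave.open(fp, "rb") as w:
        return {"channels": w.getnchannels(),
                "sampwidth": w.getsampwidth(),
                "framerate": w.getframerate()}

# Build a synthetic one-second silent clip matching the spec.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)      # stereo
    w.setsampwidth(2)      # 16-bit (2 bytes per sample)
    w.setframerate(8000)   # 8 kHz
    w.writeframes(b"\x00\x00" * 2 * 8000)  # 8000 stereo frames of silence

buf.seek(0)
print(check_wav_spec(buf) == EXPECTED)  # True
```

Running the same check over a delivered corpus is a cheap way to catch files that deviate from the advertised format before training.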

    Metadata:

    In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device detail, topic of recording, bit depth, and sample rate will be provided.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.

    Transcription:

    This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format. The transcriptions capture speaker-wise transcription with time-coded segmentation along with non-speech labels and tags.

    Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.

    Updates and Customization:

    We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.

    If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8kHz to 48kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can also customize the transcription following your specific guidelines and requirements, to further support your ASR development process.

    License:

    This audio dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.

  7. common_voice_12_0

    • huggingface.co
    Updated Mar 24, 2023
    + more versions
    Cite
    Mozilla Foundation (2023). common_voice_12_0 [Dataset]. https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0
    Dataset authored and provided by
    Mozilla Foundation (http://mozilla.org/)
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for Common Voice Corpus 12.0

      Dataset Summary
    

    The Common Voice dataset consists of a unique MP3 and corresponding text file. Many of the 26,119 recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help improve the accuracy of speech recognition engines. The dataset currently consists of 17,127 validated hours in 104 languages, but more voices and languages are always added. Take a look at the Languages page to… See the full description on the dataset page: https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0.

  8. ESD Dataset

    • paperswithcode.com
    Updated Jun 2, 2021
    + more versions
    Cite
    Kun Zhou; Berrak Sisman; Rui Liu; Haizhou Li (2023). ESD Dataset [Dataset]. https://paperswithcode.com/dataset/esd
    Authors
    Kun Zhou; Berrak Sisman; Rui Liu; Haizhou Li
    Description

    ESD is an Emotional Speech Database for voice conversion research. The ESD database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers and covers 5 emotion categories (neutral, happy, angry, sad and surprise). More than 29 hours of speech data were recorded in a controlled acoustic environment. The database is suitable for multi-speaker and cross-lingual emotional voice conversion studies.

  9. Norwegian General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Norwegian General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-norwegian-norway
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Norwegian General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Norwegian speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Norwegian communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Norwegian speech models that understand and respond to authentic Norwegian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Norwegian. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Norwegian speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Norway to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass (average WER < 5%)

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
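
As a sketch of such use-case-specific filtering, the snippet below selects participant IDs by demographic criteria. The records and field names are illustrative stand-ins, not the dataset's actual metadata schema.

```python
# Invented speaker-metadata records; field names mirror those listed above.
speakers = [
    {"participant_id": "NO-001", "age": 24, "gender": "female", "dialect": "Trøndersk"},
    {"participant_id": "NO-002", "age": 57, "gender": "male",   "dialect": "Østnorsk"},
    {"participant_id": "NO-003", "age": 35, "gender": "female", "dialect": "Vestnorsk"},
]

def select(speakers, *, min_age=18, max_age=70, gender=None):
    """Return participant IDs matching the demographic criteria."""
    return [s["participant_id"] for s in speakers
            if min_age <= s["age"] <= max_age
            and (gender is None or s["gender"] == gender)]

print(select(speakers, max_age=40, gender="female"))  # ['NO-001', 'NO-003']
```

The same pattern extends to recording metadata (topic, duration, device type) for assembling training subsets that match a target deployment condition.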

    Usage and Applications

    This dataset is a versatile resource for multiple Norwegian speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Norwegian.
    Voice Assistants: Build smart assistants capable of understanding natural Norwegian conversations.

  10. Data and voice mobile traffic worldwide, Q1 2012-Q2 2024

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2024). Data and voice mobile traffic worldwide, Q1 2012-Q2 2024 [Dataset]. https://www.statista.com/statistics/1016182/data-and-voice-mobile-quarterly-traffic-worldwide/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the second quarter of 2024, mobile data traffic worldwide reached almost *** exabytes, an increase of around ** exabytes compared with the same quarter of the previous year. Global mobile voice traffic has remained flat at **** exabytes since the first quarter of 2016.

  11. Japanese General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Japanese General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-japanese-japan
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Japanese General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Japanese speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Japanese communication.

    Curated by FutureBeeAI, this 40-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Japanese speech models that understand and respond to authentic Japanese accents and dialects.

    Speech Data

    The dataset comprises 40 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Japanese. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 80 verified native Japanese speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Japan to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass (average WER < 5%)

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Japanese speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Japanese.
    Voice Assistants: Build smart assistants capable of understanding natural Japanese conversations.

  12. Data from: Italian Parkinson's Voice and Speech

    • ieee-dataport.org
    Updated Oct 17, 2024
    Cite
    Giovanni Dimauro (2024). Italian Parkinson's Voice and Speech [Dataset]. https://ieee-dataport.org/open-access/italian-parkinsons-voice-and-speech
    Authors
    Giovanni Dimauro
    Description

    I would be grateful if you cite my following two papers:

  13. VocalSet: A Singing Voice Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    + more versions
    Cite
    Julia Wilkins; Prem Seetharaman; Alison Wahl; Bryan Pardo (2020). VocalSet: A Singing Voice Dataset [Dataset]. http://doi.org/10.5281/zenodo.1193957
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julia Wilkins; Prem Seetharaman; Alison Wahl; Bryan Pardo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We present VocalSet, a singing voice dataset consisting of 10.1 hours of monophonic recorded audio of professional singers demonstrating both standard and extended vocal techniques on all 5 vowels. Existing singing voice datasets aim to capture a focused subset of singing voice characteristics, and generally consist of just a few singers. VocalSet contains recordings from 20 different singers (9 male, 11 female) and a range of voice types. VocalSet aims to improve the state of existing singing voice datasets and singing voice research by capturing not only a range of vowels, but also a diverse set of voices on many different vocal techniques, sung in contexts of scales, arpeggios, long tones, and excerpts.

    We have included two .rtf files, test_singers and train_singers, listing the singers used to train and test the majority of our deep learning models.
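
A singer-disjoint split like the one above can be sketched as follows, assuming file names begin with a singer ID (the exact naming convention and the held-out singer list in the .rtf files may differ from these placeholders).

```python
# Placeholder held-out singer IDs; the real set comes from test_singers.rtf.
test_singers = {"f2", "m5"}

# Invented file names, assuming each starts with "<singerID>_".
files = [
    "f2_scales_slow_forte_a.wav",
    "m5_arpeggios_vibrato_e.wav",
    "f1_long_tones_straight_o.wav",
]

def split_by_singer(files, test_singers):
    """Partition files so no singer appears in both train and test."""
    train, test = [], []
    for name in files:
        singer = name.split("_", 1)[0]
        (test if singer in test_singers else train).append(name)
    return train, test

train, test = split_by_singer(files, test_singers)
print(len(train), len(test))  # 1 2
```

Splitting by singer rather than by file prevents a model from scoring well merely by recognizing a voice it saw during training.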

  14. COALA voice data and transcripts Italian

    • data.niaid.nih.gov
    Updated Oct 7, 2023
    Cite
    Massimo Curti (2023). COALA voice data and transcripts Italian [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8413134
    Dataset provided by
    Stefan Wellsandt
    Samuel Kernan Freire
    Massimo Curti
    Mina Foosherian
    Evangelos Niforatos
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains audio files and transcripts in Italian related to manufacturing. We collected the scripts from industrial use cases during the Horizon Europe RIA COALA (GA 957296, project reference website) and hired a service provider to generate the related audio files (BIBA - Bremer Institut für Produktion und Logistik GmbH ordered the service). The service provider checked the audio files for quality.

    The service provider recruited crowd workers, and gathered their audio records, informed consent (privacy) and agreement that their records become public domain (Creative Commons 0; https://creativecommons.org/share-your-work/public-domain/cc0/). The service provider declared to follow a Crowd Code of Ethics and a Fair Pay policy.

    The metadata file contains the following information:

    file_name: name of the audio file

    script: script the speaker had to speak

    scriptId: the numeric identifier of the script

    participantId: the numeric identifier of the participant (speaker)

    gender: the gender as indicated by the participant (MALE or FEMALE)

    age: the age in years as indicated by the participant

    age_range: the age range in years (18-30, 31-45, 46+)

    country: the birth country indicated by the participant

    current_country: the country of residence indicated by the participant

    primary_language: the language indicated as primary by the participant

    ever_worked_factory: answer to the question: "Have you ever worked in a factory, manufacturing setting?" (Yes/No)

    years_worked_factory: answer to the question: "If yes, for how many years?" (1-10, 10+)

    background_noise_type: background noise in the audio as indicated by the participant (mild, humming/technical, no noise)

    gdpr_and_ipr_consent: answer to the privacy notice and the IPR transfer to CC0 (Yes)

    date_signed: date when the participant signed the consent form (US format, MM.DD.YYYY)
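
The metadata fields above can be loaded with Python's standard csv module. The two rows below are invented examples, though the column names match the list.

```python
import csv
import io

# In-memory stand-in for the metadata file; rows are invented, columns are
# the fields documented above.
metadata_csv = """file_name,script,scriptId,participantId,gender,age,age_range,country,current_country,primary_language,ever_worked_factory,years_worked_factory,background_noise_type,gdpr_and_ipr_consent,date_signed
audio_0001.wav,Avvia la macchina di taglio,12,7,FEMALE,29,18-30,Italy,Italy,Italian,Yes,1-10,no noise,Yes,03.15.2023
audio_0002.wav,Controlla il livello dell'olio,13,8,MALE,48,46+,Italy,Germany,Italian,No,,mild,Yes,03.16.2023
"""

rows = list(csv.DictReader(io.StringIO(metadata_csv)))

# Example query: speakers with factory experience, with their age ranges.
experienced = [r for r in rows if r["ever_worked_factory"] == "Yes"]
print([(r["participantId"], r["age_range"]) for r in experienced])
```

Note the US-style date format (MM.DD.YYYY) in date_signed when parsing consent dates.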

  15. Voice Data Service Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 29, 2025
    Cite
    The citation is currently not available for this dataset.
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global voice data services market is experiencing robust growth, driven by the increasing adoption of voice-enabled technologies across various sectors. The market's expansion is fueled by the surge in demand for accurate and efficient transcription, translation, and analysis of voice data. This demand stems from several key factors, including the proliferation of virtual assistants, smart speakers, and contact center solutions, all reliant on sophisticated voice data processing. Furthermore, advancements in artificial intelligence (AI) and machine learning (ML) are leading to more accurate and cost-effective voice data solutions, further stimulating market growth. We estimate the market size in 2025 to be $5 billion, based on observed growth in related sectors like AI and the increasing adoption of voice technologies. A compound annual growth rate (CAGR) of 15% is projected for the forecast period (2025-2033), indicating a significant expansion of the market in the coming years. Key market segments include transcription services, translation services, and voice analytics. Leading companies like SpeechOcean, Nexdata, and others are actively shaping market dynamics through technological innovation and strategic partnerships. However, challenges remain, including data privacy concerns and the need for robust data security measures to ensure responsible and ethical use of voice data.

    The market's future trajectory is strongly linked to advancements in AI and natural language processing (NLP). Continued improvements in speech recognition accuracy, coupled with the development of more sophisticated voice biometric systems, will unlock new opportunities within healthcare, finance, and customer service industries. While data security and privacy remain significant concerns, regulatory developments and technological advancements are addressing these issues. The increasing adoption of cloud-based solutions is also driving efficiency and scalability within the voice data services market, reducing costs and increasing accessibility for businesses of all sizes. The competitive landscape is characterized by both established players and emerging startups, with companies focusing on innovation and differentiation through specialized services and targeted solutions. Geographic expansion, particularly in developing economies with growing digital infrastructure, is expected to significantly contribute to overall market growth.
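
The arithmetic behind the projection is straightforward compound growth; the sketch below applies the report's $5 billion 2025 estimate and projected 15% CAGR over the eight years to 2033.

```python
# Compound growth projection using the figures estimated in the report.
base_2025 = 5.0          # USD billions, estimated 2025 market size
cagr = 0.15              # projected compound annual growth rate
years = 2033 - 2025      # 8-year forecast horizon

projected_2033 = base_2025 * (1 + cagr) ** years
print(round(projected_2033, 1))  # 15.3
```

So a 15% CAGR roughly triples the market over the forecast period, to about $15 billion by 2033.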

  16. 11 minutes - Infant Laugh Smartphone speech dataset

    • m.nexdata.ai
    • nexdata.ai
    Updated Apr 16, 2024
    Cite
    Nexdata (2024). 11 minutes - Infant Laugh Smartphone speech dataset [Dataset]. https://m.nexdata.ai/datasets/speechrecog/1090
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Content category, Recording device, Recording condition
    Description

    This Infant Laugh smartphone speech dataset contains laughter recorded from 20 infants and young children aged 0-3 years, quality-tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout the data collection, storage, and usage processes; our datasets comply with GDPR, CCPA, and PIPL.

  17. V

    Voice Data Service Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 20, 2025
    Cite
    Archive Market Research (2025). Voice Data Service Report [Dataset]. https://www.archivemarketresearch.com/reports/voice-data-service-38658
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Feb 20, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The voice data service market is experiencing rapid growth, driven by increasing demand for AI training and voice content review. The market size is expected to reach USD XXX million by 2033, growing at a CAGR of XX% during the forecast period. Key drivers include the proliferation of voice-enabled devices, advances in natural language processing (NLP), and growing adoption of AI solutions across industries. Voice recognition data services hold the largest market share, at over XX%, followed by voice synthesis data services. The market is expected to be highly competitive, with major players including Speechocean, Nexdata, and Beijing Surfing Technology.

    The market is segmented by type, application, and region. By type, it is divided into voice recognition data services, voice synthesis data services, and others; by application, into AI training, voice content review, financial anti-fraud, and others. North America is expected to remain the dominant region, followed by Europe and Asia Pacific. Emerging regions such as South America, the Middle East & Africa, and Asia Pacific are anticipated to see significant growth as adoption of voice data services increases. The rising popularity of remote work and online education is also driving demand for voice data services that facilitate communication and collaboration.

  18. E

    TC-STAR Bilingual Voice-Conversion English Speech Database

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Dec 21, 2010
    + more versions
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2010). TC-STAR Bilingual Voice-Conversion English Speech Database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0312/
    Explore at:
    Dataset updated
    Dec 21, 2010
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    5 hours and 20 minutes of speech as spoken by 2 female speakers and 2 male speakers, covering both mimics and parallel voice conversion data.

  19. n

    55 Hours - English(the United Kingdom) Children Scripted Monologue...

    • m.nexdata.ai
    Updated Jan 21, 2025
    + more versions
    Cite
    Nexdata (2025). 55 Hours - English(the United Kingdom) Children Scripted Monologue Microphone speech dataset [Dataset]. https://m.nexdata.ai/datasets/speechrecog/62
    Explore at:
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    nexdata technology inc
    Authors
    Nexdata
    Area covered
    United Kingdom
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    English (the United Kingdom) Children Scripted Monologue Microphone speech dataset, collected from monologues based on given scripts covering educational materials for children, story books, informal language, numbers, and the alphabet. Transcribed with text content and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (201 British children recorded with a hi-fi microphone), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are GDPR, CCPA, and PIPL compliant.

  20. p

    Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to...

    • physionet.org
    Updated Apr 16, 2025
    + more versions
    Cite
    Yael Bensoussan; Alexandros Sigaras; Anais Rameau; Olivier Elemento; Maria Powell; David Dorr; Philip Payne; Vardit Ravitsky; Jean-Christophe Bélisle-Pipon; Alistair Johnson; Ruth Bahr; Stephanie Watts; Donald Bolser; Jennifer Siu; Jordan Lerner-Ellis; Frank Rudzicz; Micah Boyer; Samantha Salvi Cruz; Yassmeen Abdel-Aty; Toufeeq Ahmed Syed; James Anibal; Stephen Aradi; Ana Sophia Martinez; Shaheen Awan; Steven Bedrick; Isaac Bevers; Rahul Brito; Selina Casalino; John Costello; Iris De Santiago; Enrique Diaz-Ocampo; Mohamed Ebraheem; Ellie Eiseman; Mahmoud Elmahdy; Emily Evangelista; Kenneth Fletcher; Alexander Gelbard; Anna Goldenberg; Karim Hanna; William Hersh; Lochana Jayachandran; Kaley Jenney; Kathy Jenkins; Stacy Jo; Ayush Kalia; Andrea Krussel; Elisa Lapadula; Chloe Loewith; Radhika Mahajan; Vrishni Maharaj; Siyu Miao; Matthew Mifsud; Marian Mikhael; Elijah Moothedan; Yosef Nafii; Tempestt Neal; Karlee Newberry; Evan Ng; Christopher Nickel; Trevor Pharr; Claire Premi-Bortolotto; JM Rahman; Sarah Rohde; Laurie Russell; Suketu Shah; Ahmed Shawkat; Elizabeth Silberholz; Duncan Sutherland; Venkata Swarna Mukhi; Jeffrey Tang; Jamie Toghranegar; Kimberly Vinson; Claire Wilson; Madeleine Zanin; Xijie Zeng; Theresa Zesiewicz; Robin Zhao; Pantelis Zisimopoulos; Satrajit Ghosh (2025). Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information [Dataset]. http://doi.org/10.13026/3xt6-rf05
    Explore at:
    Dataset updated
    Apr 16, 2025
    Authors
    Yael Bensoussan; Alexandros Sigaras; Anais Rameau; et al. (full author list as given in the citation above)
    License

    https://physionet.org/about/duas/bridge2ai-voice-registered-access-agreement/

    Description

    The human voice contains complex acoustic markers which have been linked to important health conditions including dementia, mood disorders, and cancer. When viewed as a biomarker, voice is a promising characteristic to measure as it is simple to collect, cost-effective, and has broad clinical utility. Recent advances in artificial intelligence have provided techniques to extract previously unknown prognostically useful information from dense data elements such as images. The Bridge2AI-Voice project seeks to create an ethically sourced flagship dataset to enable future research in artificial intelligence and support critical insights into the use of voice as a biomarker of health. Here we present Bridge2AI-Voice, a comprehensive collection of data derived from voice recordings with corresponding clinical information. Bridge2AI-Voice v2.0 contains data for 19,271 recordings collected from 442 participants across five sites in North America. Participants were selected based on known conditions which manifest within the voice waveform including voice disorders, neurological disorders, mood disorders, and respiratory disorders. The release contains data considered low risk, including derivations such as spectrograms but not the original voice recordings. Detailed demographic, clinical, and validated questionnaire data are also made available. Audio recordings are included on a companion release on PhysioNet with the title "Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information (Audio Included)". Please see that project for details to request access.
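The release's approach of sharing derived representations such as spectrograms in place of raw recordings can be illustrated with a minimal sketch. This is a hypothetical example using a synthetic signal and `scipy.signal.spectrogram`; it is not the actual Bridge2AI derivation pipeline, and all parameters are chosen arbitrarily:

```python
import numpy as np
from scipy.signal import spectrogram

# Hypothetical stand-in for a voice recording: 1 s of a two-tone
# synthetic signal sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

# Derive a spectrogram: a time-frequency power representation that can
# be released in place of the raw waveform.
f, frames, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)

print(Sxx.shape)  # (frequency bins, time frames)
```

Sharing only such derivations lowers re-identification risk, since the raw waveform (and thus the speaker's voice) cannot be trivially reconstructed from a power spectrogram.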


VOICED Database

Explore at:
34 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 7, 2018
Authors
Laura Verde; Giovanna Sannino
License

Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically

Description

This database includes 208 voice samples, from 150 pathological, and 58 healthy voices.
