100+ datasets found
  1. Speaker Recognition - CMU ARCTIC

    • kaggle.com
    zip
    Updated Nov 21, 2022
    Cite
    Gabriel Lins (2022). Speaker Recognition - CMU ARCTIC [Dataset]. https://www.kaggle.com/datasets/mrgabrielblins/speaker-recognition-cmu-arctic
    Explore at:
    Available download formats: zip (1354293783 bytes)
    Dataset updated
    Nov 21, 2022
    Authors
    Gabriel Lins
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • Can you predict which speaker is talking?
    • Can you predict what they are saying? This dataset makes both possible. Perfect for a school project, research project, or resume builder.

    File information

    • train.csv - all the data you need for training, with 4 columns: id (file id), file_path (path to the .wav file), speech (transcription of the audio file), and speaker (the target column)
    • test.csv - all the data you need to test your model (20% of the audio files); it has the same columns as train.csv
    • train/ - folder with training data, subdivided into per-speaker folders
      • aew/ - Folder containing audio files in .wav format for speaker aew
      • ...
    • test/ - Folder containing audio files for test data.

    Column description

    • id - file id (string)
    • file_path - file path to the .wav file (string)
    • speech - transcription of the audio file (string)
    • speaker - speaker name; use this as the target variable if you are doing audio classification (string)
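As a quick sketch of how the columns above are typically used, train.csv can be loaded with pandas; the rows below are toy placeholders standing in for real ARCTIC entries, not actual dataset contents:

```python
import io
import pandas as pd

# Toy stand-in for train.csv with the documented columns:
# id, file_path, speech, speaker (the target). The rows are
# illustrative placeholders, not real dataset entries.
csv_text = """id,file_path,speech,speaker
a0001,train/aew/a0001.wav,First example sentence.,aew
a0002,train/aew/a0002.wav,Second example sentence.,aew
b0001,train/bdl/b0001.wav,Third example sentence.,bdl
"""
train = pd.read_csv(io.StringIO(csv_text))

# Class balance across speakers (the classification target)
counts = train["speaker"].value_counts()
print(counts.to_dict())  # {'aew': 2, 'bdl': 1}

# Each row pairs a .wav path with its transcription
row = train.iloc[0]
print(row["file_path"], "->", row["speech"])
```

    With the real dataset you would replace the inline CSV with pd.read_csv("train.csv") and load each file_path with an audio library.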

    More Details

    The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US-English single-speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the databases and the recording environment is available as Carnegie Mellon University Language Technologies Institute Tech Report CMU-LTI-03-177.

    The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.

    The 1132 sentence prompt list is available from cmuarctic.data

    The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labeling was performed with CMU Sphinx using the FestVox-based labeling scripts. Complete runnable Festival voices are included with the database distributions as examples, though better voices can be made by improving the labeling.

    Acknowledgements

    This work was partially supported by the U.S. National Science Foundation under Grant No. 0219687, "ITR/CIS Evaluation and Personalization of Synthetic Voices". Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

  2. Spanish Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2025
    + more versions
    Cite
    Unidata (2025). Spanish Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/spanish-speech-recognition-dataset
    Explore at:
    Available download formats: zip (93217 bytes)
    Dataset updated
    Jun 25, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Speech Dataset for recognition task

    Dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. This dataset boasts an impressive 98% word accuracy rate, making it a valuable resource for advancing speech recognition technology.

    By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, audio transcription, and natural language processing (NLP).

    The dataset includes high-quality audio recordings with text transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset

    • Audio files: high-quality recordings in WAV format
    • Text transcriptions: accurate and detailed transcripts for each audio segment
    • Speaker information: metadata on native speakers, including gender, etc.
    • Topics: diverse domains such as general conversations, business, etc.

    This dataset is a valuable resource for researchers and developers working on speech recognition, language models, and speech technology.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  3. british-english-speech-recognition-dataset

    • huggingface.co
    Updated Feb 21, 2025
    + more versions
    Cite
    Unidata (2025). british-english-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/british-english-speech-recognition-dataset
    Explore at:
    Croissant - a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Feb 21, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    British English Speech Dataset for recognition task

    Dataset comprises 200 hours of high-quality audio recordings featuring 310 speakers, achieving a 95% sentence accuracy rate. This collection of speech data is designed for NLP tasks such as speech recognition, dialogue systems, and language understanding. By utilizing this dataset, developers and researchers can advance their work in automatic speech recognition and improve recognition systems. See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/british-english-speech-recognition-dataset.

  4. speaker-recognition-american-rhetoric

    • huggingface.co
    Updated Mar 1, 2023
    Cite
    Oscar W (2023). speaker-recognition-american-rhetoric [Dataset]. https://huggingface.co/datasets/owahltinez/speaker-recognition-american-rhetoric
    Explore at:
    Croissant - a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 1, 2023
    Authors
    Oscar W
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    owahltinez/speaker-recognition-american-rhetoric dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Call Center Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Oct 14, 2025
    + more versions
    Cite
    Axon Labs (2025). Call Center Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/axondata/call-center-speech-dataset
    Explore at:
    Available download formats: zip (12766164 bytes)
    Dataset updated
    Oct 14, 2025
    Authors
    Axon Labs
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Multilingual Call Center Speech Recognition Dataset: 10,000 Hours

    Dataset Summary

    10,000 hours of real-world call center speech recordings in 7 languages with transcripts. Train speech recognition, sentiment analysis, and conversational AI models on authentic customer support audio. Covers support, sales, billing, finance, and pharma domains.

    Dataset Features

    📊 Scale & Quality

    • 10,000 hours of inbound & outbound calls
    • Real-world field recordings - no synthetic audio
    • With transcripts and concise summaries

    🎙️ Audio Specifications

    • Format: Single-channel (mono) telephone speech
    • Sample rate: 8,000 Hz
    • Non-synthetic source audio

    🌍 Languages (7)

    English, Russian, Polish, French, German, Spanish, Portuguese - Non-English calls include English translation - Additional languages available on request: Swedish, Dutch, Arabic, Japanese, etc.

    🏢 Domains

    Support, Billing/Account, Sales, Finance/Account Management, Pharma - Each call labeled by domain - Speaker roles annotated (Agent/Customer)

    The full version of the dataset is available for commercial use; leave a request on our website, Axonlabs, to purchase the dataset 💰

    Purpose and Usage Scenarios

    • Automatic Speech Recognition, punctuation restoration, and speaker diarization on telephone speech
    • Intent detection, topic classification, and sentiment analysis from customer-service dialogs
    • Post-call concise summaries for QA/quality monitoring and CRM automation
    • Cross-lingual pipelines (original → English) and multilingual support models
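Since the recordings are 8 kHz mono telephone audio, a common first step for ASR pipelines that expect 16 kHz input is resampling. A minimal sketch with scipy, run here on a synthetic tone rather than an actual call recording:

```python
import numpy as np
from scipy.signal import resample_poly

# One second of a synthetic 440 Hz tone at the dataset's 8 kHz rate,
# standing in for a decoded mono telephone recording
sr_in, sr_out = 8000, 16000
t = np.arange(sr_in) / sr_in
x = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# Polyphase resampling from 8 kHz to 16 kHz (factors reduce to 2/1)
y = resample_poly(x, sr_out, sr_in)
print(len(x), "->", len(y))  # 8000 -> 16000
```

    In a real pipeline the tone would be replaced by samples decoded from the dataset's audio files.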
  6. Slovenian Speech Recognition Dataset

    • unidata.pro
    mp3, wav
    Cite
    Unidata L.L.C-FZ, Slovenian Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/slovenian-speech-recognition/
    Explore at:
    Available download formats: mp3, wav
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Explore our Slovenian Speech Dataset with 10+ hours of clean phone dialogues in MP3/WAV, fully annotated for ASR and language models.

  7. Arabic Speech Recognition Dataset

    • unidata.pro
    m4a, mp3, wav, aac
    Cite
    Unidata L.L.C-FZ, Arabic Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/arabic-speech-recognition/
    Explore at:
    Available download formats: m4a, mp3, wav, aac
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Discover our Arabic Speech Dataset with 10+ hours of UAE dialogues in M4A/MP3/WAV/AAC. Clean, annotated audio for ASR training.

  8. Speaker Recognition Dataset

    • kaggle.com
    zip
    Updated Jun 26, 2024
    Cite
    Avishkar_001 (2024). Speaker Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/avishkar001/speaker-recognition-dataset
    Explore at:
    Available download formats: zip (317725129 bytes)
    Dataset updated
    Jun 26, 2024
    Authors
    Avishkar_001
    Description

    Dataset

    This dataset was created by Avishkar_001

    Contents

  9. French Speech Recognition Dataset

    • kaggle.com
    Updated Jun 25, 2025
    + more versions
    Cite
    Unidata (2025). French Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/french-speech-recognition-dataset
    Explore at:
    Croissant - a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    French Speech Dataset for recognition task

    Dataset comprises 547 hours of telephone dialogues in French, collected from 964 native speakers across various topics and domains, with an impressive 98% Word Accuracy Rate. It is designed for research in speech recognition, focusing on various recognition models, primarily aimed at meeting the requirements for automatic speech recognition (ASR) systems.

    By utilizing this dataset, researchers and developers can advance their understanding and capabilities in natural language processing (NLP), speech recognition, and machine learning technologies.

    The dataset includes high-quality audio recordings with accurate transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset

    • Audio files: high-quality recordings in WAV format
    • Text transcriptions: accurate and detailed transcripts for each audio segment
    • Speaker information: metadata on native speakers, including gender, etc.
    • Topics: diverse domains such as general conversations, business, etc.

    The native speakers and the various topics and domains covered make the dataset an ideal resource for the research community, allowing researchers to study spoken language, dialects, and language patterns.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  10. Italian Speech Recognition Dataset

    • unidata.pro
    a-law/u-law, pcm
    Cite
    Unidata L.L.C-FZ, Italian Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/italian-speech-recognition-dataset/
    Explore at:
    Available download formats: a-law/u-law, pcm
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Unidata’s Italian Speech Recognition dataset refines AI models for better speech-to-text conversion and language comprehension

  11. russian-speech-recognition-dataset

    • huggingface.co
    Updated Sep 29, 2025
    Cite
    Unidata NLP (2025). russian-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/russian-speech-recognition-dataset
    Explore at:
    Dataset updated
    Sep 29, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Russian Telephone Dialogues Dataset - 338 Hours

    The Russian speech dataset includes 338 hours of telephone dialogues in Russian from 460 native speakers, offering high-quality audio recordings with detailed annotations (text, speaker ID, gender, age) to support speech recognition systems, natural language processing, and deep learning models for building accurate Russian dialogue and audio datasets.

    Dataset characteristics… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/russian-speech-recognition-dataset.

  12. Voxceleb2

    • resodate.org
    • service.tib.eu
    Updated Jan 3, 2025
    Cite
    J. S. Chung; A. Nagrani; A. Zisserman (2025). Voxceleb2 [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdm94Y2VsZWIy
    Explore at:
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Leibniz Data Manager
    Authors
    J. S. Chung; A. Nagrani; A. Zisserman
    Description

    The Voxceleb2 dataset is a large-scale speaker recognition dataset containing 2442 hours of raw speech from 6112 speakers.

  13. M2VTS Speaker Verification Database

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Jun 26, 2017
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). M2VTS Speaker Verification Database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0021/
    Explore at:
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Multi Modal Verification for Teleservices and Security applications project (M2VTS), running under the European ACTS programme, produced a database designed to facilitate access control using multimodal identification of human faces. This technique improves recognition efficiency by combining individual modalities (i.e. face and voice). Its relative novelty meant that new test material had to be created, since no existing database offered all the modalities needed. The M2VTS database comprises 37 different faces, with 5 shots of each taken at one-week intervals, or whenever drastic face changes occurred in the meantime. During each shot, subjects were asked to count from 0 to 9 in their native language (generally French) and to move their heads from left to right, both with and without glasses. The data were then used to create three sequences: voice, motion, and "glasses off". The first sequence can be used for speech verification, 2-D dynamic face verification, and speech/lip-movement correlation, while the second and third provide information for 3-D face recognition and may also be used to compare other recognition techniques.

  14. Speaker Recognition Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jun 14, 2025
    Cite
    Archive Market Research (2025). Speaker Recognition Report [Dataset]. https://www.archivemarketresearch.com/reports/speaker-recognition-563625
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Jun 14, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The speaker recognition market is booming, projected to reach $15.1 billion by 2033, with a 15% CAGR. This comprehensive analysis explores market drivers, trends, restraints, and key players like Google, Amazon, and Microsoft, offering insights into this rapidly evolving technology.

  15. The "SIVA" Speech Database for Speaker Verification and Identification

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Jun 14, 2005
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2005). The "SIVA" Speech Database for Speaker Verification and Identification [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0028/
    Explore at:
    Dataset updated
    Jun 14, 2005
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The Italian speech database SIVA ("Speaker Identification and Verification Archives") comprises more than two thousand calls collected over the public switched telephone network, distributed via ELRA. The database consists of four speaker categories: male users, female users, male impostors, and female impostors. Speakers were contacted by mail before the test and asked to read the information and instructions carefully before making the call. About 500 speakers were recruited through a company specialized in selecting population samples; the others were volunteers contacted by the institute concerned. Speakers access the recording system by calling a toll-free number, where an automatic answering system guides them through the three sessions that make up a recording. In the first session, a list of 28 words (including digits and some commands) is recorded using a standard enumerated prompt. The second session is a simple unidirectional dialogue (the caller answers prompted questions) asking for personal information (name, age, etc.). In the third session, the speaker reads a continuous passage of phonetically balanced text resembling a short curriculum vitae. The signal is a standard 8 kHz sampled signal, coded in 8-bit mu-law format. The data collected so far consist of:

    • MU: male users, 18 speakers, 20 repetitions
    • FU: female users, 16 speakers, 26 repetitions
    • MI: male impostors, 189 speakers with 2 repetitions and 128 speakers with 1 repetition
    • FI: female impostors, 213 speakers with 2 repetitions and 107 speakers with 1 repetition
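The 8-bit mu-law coding mentioned above can be expanded back to linear samples with the continuous mu-law formula. A minimal sketch (note that real G.711 decoders use a segmented table approximation that differs slightly from this continuous form):

```python
import numpy as np

MU = 255.0  # compression parameter for 8-bit mu-law

def mu_law_decode(codes: np.ndarray) -> np.ndarray:
    """Expand 8-bit mu-law codes (0..255) to float samples in [-1, 1],
    using the continuous mu-law expansion formula."""
    y = codes.astype(np.float64) / 127.5 - 1.0  # map 0..255 onto [-1, 1]
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

decoded = mu_law_decode(np.array([0, 128, 255]))
print(decoded)  # endpoints map to -1.0 and +1.0; mid-range codes stay near 0
```

    The nonlinearity gives telephone speech finer amplitude resolution near zero, which is why 8-bit mu-law audio sounds much better than 8-bit linear PCM.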

  16. Voxceleb2: Deep speaker recognition

    • service.tib.eu
    • resodate.org
    Updated Dec 3, 2024
    Cite
    (2024). Voxceleb2: Deep speaker recognition [Dataset]. https://service.tib.eu/ldmservice/dataset/voxceleb2--deep-speaker-recognition
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    Voxceleb2: Deep speaker recognition.

  17. USA speaker Speech Dataset in English

    • data.macgence.com
    mp3
    Updated Mar 30, 2024
    Cite
    Macgence (2024). USA speaker Speech Dataset in English [Dataset]. https://data.macgence.com/dataset/usa-speaker-speech-dataset-in-english
    Explore at:
    Available download formats: mp3
    Dataset updated
    Mar 30, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    United States, Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    The audio dataset includes general conversations featuring English speakers from the USA, with detailed metadata.

  18. Axiom voice recognition dataset

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Aug 2, 2024
    Cite
    Sara Ermini; Nicola Bettin; Antonio Rizzo (2024). Axiom voice recognition dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_1218978
    Explore at:
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Vimar
    Unisi
    University of Siena
    Authors
    Sara Ermini; Nicola Bettin; Antonio Rizzo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The main purpose of the AXIOM Voice Dataset is to gather audio recordings from natural Italian-language speakers. The collection was intended to obtain audio recording samples for training and testing the VIMAR algorithm implemented for the smart-home scenario on the Axiom board, with the final goal of developing an efficient voice recognition system using machine learning algorithms. A team of UX researchers from the University of Siena collected data for five months and tested the voice recognition system on the AXIOM board [1]. The data acquisition process involved natural Italian speakers who provided written consent to participate in the research project. Participants were selected to maintain a cluster with varied gender, age, region of origin, and background.

  19. Speaker Identification Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Oct 13, 2025
    Cite
    Data Insights Market (2025). Speaker Identification Software Report [Dataset]. https://www.datainsightsmarket.com/reports/speaker-identification-software-1932055
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Oct 13, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Explore the booming Speaker Identification Software market, projected to reach $1.8 billion by 2033. Discover key drivers, application trends in in-car systems and healthcare, and regional growth opportunities.

  20. Speaker Recognition Dataset

    • kaggle.com
    zip
    Updated Nov 25, 2023
    Cite
    Rishabh Dhawan (2023). Speaker Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/rishabh23002/speaker-recognition-dataset
    Explore at:
    Available download formats: zip (98126599 bytes)
    Dataset updated
    Nov 25, 2023
    Authors
    Rishabh Dhawan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Rishabh Dhawan

    Released under Apache 2.0

    Contents
