75 datasets found

Portuguese-audio-dataset
huggingface.co
Updated Aug 29, 2025
Cite
KratosAI (2025). Portuguese-audio-dataset [Dataset]. https://huggingface.co/datasets/Kratos-AI/Portuguese-audio-dataset
Dataset updated
Aug 29, 2025
Dataset authored and provided by
KratosAI
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Portuguese Voice Emotion Dataset

This dataset contains high-quality (“A-grade”) data. It has been carefully curated, cleaned, and verified to ensure accuracy, completeness, and consistency, making it suitable for high-stakes or production-grade model training.

Dataset Summary

This dataset comprises high-quality Portuguese speech recordings designed for training and evaluating Speech Emotion Recognition (SER) models. The dataset contains voice samples expressing four… See the full description on the dataset page: https://huggingface.co/datasets/Kratos-AI/Portuguese-audio-dataset.
European Portuguese General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). European Portuguese General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-portuguese-portugal
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Portuguese General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Portuguese speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Portuguese communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Portuguese speech models that understand and respond to authentic Portuguese accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Portuguese. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•Speakers: 60 verified native Portuguese speakers from FutureBeeAI’s contributor community.
•Regions: Representing various provinces of Portugal to ensure dialectal diversity and demographic balance.
•Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
•Duration: Each conversation ranges from 15 to 60 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
•Environment: Quiet, echo-free settings with no background noise.
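The advertised audio format (stereo, 16-bit, 16 kHz WAV) can be checked against a file's header before training. The following is a minimal sketch using only Python's standard `wave` module; the helper names and the silent test file are illustrative assumptions, not part of the dataset or its tooling.

```python
import tempfile
import wave
from pathlib import Path

# Expected values taken from the listing: stereo, 16-bit (2 bytes), 16 kHz.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def read_wav_spec(path):
    """Return channel count, sample width, and sample rate from a WAV header."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_listing(path, expected=EXPECTED):
    """True when the file's header agrees with the advertised format."""
    return read_wav_spec(path) == expected


def write_silence(path, channels, rate, seconds=1):
    """Write a silent 16-bit PCM file (hypothetical stand-in for a real recording)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.wav"
        write_silence(sample, channels=2, rate=16000)
        print(read_wav_spec(sample), matches_listing(sample))
```

Reading only the header keeps the check cheap, so it can be run over every file in a delivery before any audio is decoded.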

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy (average WER < 5%), achieved through a double QA pass
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
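The listing does not say how its word error rate (WER) figure was measured. As a reference point, the sketch below computes WER in the standard way: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of five reference words.
    print(word_error_rate("o gato dorme no sofá", "o gato dorme sofá"))
```

Real evaluations usually normalize case and punctuation before scoring; that step is omitted here because the listing gives no normalization rules.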
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple Portuguese speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for Portuguese.
•Voice Assistants: Build smart assistants capable of understanding natural Portuguese conversations.

Portuguese speaker Speech Dataset in Brazilian
data.macgence.com
mp3
Updated Apr 2, 2024
Cite
Macgence (2024). Portuguese speaker Speech Dataset in Brazilian [Dataset]. https://data.macgence.com/dataset/portuguese-speaker-speech-dataset-in-brazilian
Available download formats: mp3
Dataset updated
Apr 2, 2024
Dataset authored and provided by
Macgence
License
https://data.macgence.com/terms-and-conditions
Time period covered
2025
Area covered
Worldwide, Brazil
Variables measured
Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
Description
The audio dataset includes general conversations by Brazilian Portuguese speakers, with detailed metadata.
portuguese-speech-recognition-dataset
huggingface.co
Updated Mar 18, 2025
Cite
Unidata (2025). portuguese-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/portuguese-speech-recognition-dataset
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Mar 18, 2025
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Portuguese Speech Dataset for recognition task

Dataset comprises 406 hours of telephone dialogues in Portuguese, collected from 590 native speakers across various topics and domains. This dataset boasts an impressive 98% word accuracy rate, making it a valuable resource for advancing speech recognition technology. By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, transcribing audio… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/portuguese-speech-recognition-dataset.
Portuguese (Brazil) Call Center Data for Retail & E-Commerce AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Portuguese (Brazil) Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-portuguese-brazil
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Brazil
Dataset funded by
FutureBeeAI
Description
Introduction
This Brazilian Portuguese Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Portuguese speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native Brazilian Portuguese speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
•Participant Diversity:
•Speakers: 60 native Brazilian Portuguese speakers from our verified contributor pool.
•Regions: Representing multiple provinces across Brazil to ensure coverage of various accents and dialects.
•Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
•Inbound Calls:
•Product Inquiries
•Order Cancellations
•Refund & Exchange Requests
•Subscription Queries, and more
•Outbound Calls:
•Order Confirmations
•Upselling & Promotions
•Account Updates
•Loyalty Program Offers
•Customer Verifications, and others
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, cough)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making model training faster and more accurate.
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.
•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
Usage and Applications
This dataset is ideal for a range of voice AI and NLP applications:
•Automatic Speech Recognition (ASR): Fine-tune Portuguese speech-to-text systems.
Portuguese (Brazil) General Domain Scripted Monologue Speech Data
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Portuguese (Brazil) General Domain Scripted Monologue Speech Data [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/general-scripted-speech-monologues-portuguese-brazil
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Brazil
Dataset funded by
FutureBeeAI
Description
Introduction
The Brazilian Portuguese Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of Portuguese language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic Portuguese speech data.
Speech Data
This dataset features over 6,000 high-quality scripted monologue recordings in Brazilian Portuguese. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.
•Participant Diversity
•Speakers: 60 native Brazilian Portuguese speakers
•Regions: Broad regional coverage ensures diverse accents and dialects
•Demographics: Participants aged 18 to 70, with a 60:40 male-to-female ratio
•Recording Specifications
•Recording Type: Scripted monologues and prompt-based recordings
•Audio Duration: 5 to 30 seconds per file
•Format: WAV, mono channel, 16-bit, 8 kHz & 16 kHz sample rates
•Environment: Clean, noise-free conditions to ensure clarity and usability

Topic Coverage
The dataset covers a wide variety of general conversation scenarios, including:
•Daily Conversations
•Topic-Specific Discussions
•General Knowledge and Advice
•Idioms and Sayings
Contextual Features
To enhance authenticity, the prompts include:
•Names: Male and female names specific to different regions of Brazil
•Addresses: Commonly used address formats in daily Brazilian Portuguese speech
•Dates & Times: References used in general scheduling and time expressions
•Organization Names: Names of businesses, institutions, and other entities
•Numbers & Currencies: Mentions of quantities, prices, and monetary values

Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.
Transcription
Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.
•Content: Exact match to the spoken audio
•Format: Plain text (.TXT), named identically to the corresponding audio file
•Quality Control: All transcripts are validated by native Portuguese transcribers
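Because each transcript is described as a plain-text file named identically to its audio file, the two can be paired by file stem. The sketch below assumes, hypothetically, that each transcript sits in the same directory as its audio file and uses a lowercase `.txt` extension; the actual delivery layout is not documented in the listing and may differ.

```python
import tempfile
from pathlib import Path


def pair_audio_with_transcripts(root):
    """Map each .wav file under root to the text of its same-named .txt file.

    Returns (pairs, missing): pairs maps audio file names to transcript text,
    and missing lists audio file names that have no transcript beside them.
    """
    root = Path(root)
    pairs, missing = {}, []
    for audio in sorted(root.rglob("*.wav")):
        transcript = audio.with_suffix(".txt")  # same stem, assumed extension
        if transcript.is_file():
            pairs[audio.name] = transcript.read_text(encoding="utf-8").strip()
        else:
            missing.append(audio.name)
    return pairs, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Placeholder files standing in for a real delivery.
        (root / "prompt_0001.wav").write_bytes(b"")
        (root / "prompt_0001.txt").write_text("Bom dia.\n", encoding="utf-8")
        (root / "prompt_0002.wav").write_bytes(b"")
        print(pair_audio_with_transcripts(root))
```

Reporting unpaired audio separately, rather than silently skipping it, makes an incomplete delivery visible before training starts.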

Metadata
Rich metadata is included for detailed filtering and analysis:
•Speaker Metadata: Unique speaker ID, age, gender, region, and dialect
•Audio Metadata: Prompt transcript, recording setup, device specs, sample rate, bit depth, and format

Applications & Use Cases
This dataset can power a variety of Portuguese language AI technologies, including:
•Speech Recognition Training: ASR model development and
g Neutral Speech Male
kaggle.com
Updated Sep 22, 2022
Cite
Mediatech Lab (2022). g Neutral Speech Male [Dataset]. https://www.kaggle.com/mediatechlab/gNeutralSpeech
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Sep 22, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mediatech Lab
Description
**GLOBO’S DATASET TERMS OF USE**

The present Terms of Use (“Terms”) regulates the license of use that GLOBO COMUNICAÇÃO E PARTICIPAÇÕES S.A., a company organized and existing in accordance with the Brazilian laws, with head offices at Rua Lopes Quintas 303, in the city and State of Rio de Janeiro, enrolled in the Brazilian tax registration number 27.865.757/0001-02 (hereinafter simply referred to as “Globo”), grants to the individual or entity that exercises the rights licensed under these Terms (“You”) for the use of audios referring to the reading of texts published on Jornal Nacional’s page on the “G1” website, owned by Globo (hereinafter referred to as “Contents”), which are stored at this dataset (“Dataset”).

**1. Grant of License of Use**

1.1. The scope of these Terms is a non-exclusive, non-sublicensable authorization, for an undefined term, hereby granted by Globo to You, to use the Contents made available via the Dataset for non-commercial purposes, exclusively for the deployment and promotion of research for development and improvement of technologies, including the elaboration of scientific articles, reports and/or any other type of academic publication. Any other form of use of the Contents stored in the Dataset is prohibited.

1.1.1. The authorization hereby granted is royalty-free, non-exclusive, and restricted to the use of the Contents made available in the Dataset under the terms and conditions mentioned herein. The storage of the Contents, as well as the capture, reproduction, use in any media, or by any other modality, or use in any medium, for commercial purposes or not, without previously obtaining Globo´s express authorization, is expressly prohibited. Thus, any form of use that has not been expressly authorized by Globo is prohibited. It is also expressly forbidden to assemble, alter, manipulate and/or transform the Contents, by any means or process. If the Contents contain Globo's brands or logos, they must be maintained by You, and the inclusion of any type of advertising, brand and/or sponsors, which may be related to the Contents, is prohibited, unless expressly authorized by Globo. Globo does not authorize the dubbing of voices/performances contained in the Content.

1.2. You may not, under any circumstances, grant or allow third parties to exploit, under any justification, whether for commercial purposes or not, in Brazil and/or abroad, the Contents, as well as its extracts, excerpts and parts, and You will be responsible for any use not permitted in this instrument, under penalty of being liable for misuse. You hereby undertake to reimburse Globo for all and any damages that it may suffer if such grant or unauthorized use occurs.

1.3. Globo reserves the right to revoke this authorization, at its sole discretion, without the need for any compensation, if it becomes aware of any non-compliance with the conditions established in these Terms.

1.4. The use of the Contents in VOD (video on demand) and OTT (over the top) services is expressly prohibited. Failure to comply with this item is cause for immediate termination of the license hereby granted, without prejudice to a claim compensation for losses and damages, at Globo’s sole discretion.

1.5. You undertake to use the Dataset and the Contents properly and diligently, exclusively for the purposes specified in these Terms, as well as to refrain from using them for purposes or as a mean of committing unlawful acts, prohibited by law and/or rules of these Terms and/or harmful to the rights and interests of Globo and/or third parties, subject to the provisions of item 1.3.

1.6. Globo reserves the right to, unilaterally, add or remove any functionality and/or Content from the Dataset, expand or reduce its storage capacity or usability, alter its presentation, as well as temporarily restrict or suspend its availability, or even terminate it permanently or temporarily, at any time, at its sole discretion, and without prior notice or consent.

1.7. Globo will use its best efforts to ensure the correct functioning of the Dataset without interference of any kind. However, considering the characteristics of the Internet environment, Globo does not guarantee the availability, infallibility and continuity of the Dataset, nor that it will be useful for performing any activity in particular, for which Globo exempts itself from any liability for direct or indirect damages of any nature that may result from the unavailability, failure and/or alteration in the Dataset.

**2. Intellectual Property**

2.1. Globo declares to be fully responsible for the authorization granted herein.

2.2. You acknowledge that all Contents made available in the Dataset are owned exclusively by Globo.

2.3. The reproduction or use of the Contents available in the Dataset in disagreement with the rules established in these Terms constitute a viol...
Portugal English Speech Recognition Corpus (Mobile)
catalogue.elra.info
Updated Jun 28, 2024
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2024). Portugal English Speech Recognition Corpus (Mobile) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0228_110/
Dataset updated
Jun 28, 2024
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Area covered
Portugal
Description
This corpus was recorded in a quiet office/home environment over 3 channels and collected from a total of 201 speakers, including 90 males and 111 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as news and daily dialogues. Speech samples are stored as 16-bit, 16 kHz audio, for a total of 113.7 hours of speech per channel.
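The stated figures (16-bit samples, 16 kHz, 113.7 hours per channel, 3 channels) allow a rough storage estimate. The sketch below assumes uncompressed PCM with each channel stored as a separate mono stream, and it ignores file headers; the listing confirms none of these points, so the result is only an order-of-magnitude guide.

```python
def pcm_size_bytes(hours, sample_rate_hz, bit_depth, channels=1):
    """Size of uncompressed PCM audio: seconds x rate x bytes per sample x channels."""
    seconds = hours * 3600
    return round(seconds * sample_rate_hz * (bit_depth // 8) * channels)


if __name__ == "__main__":
    per_channel = pcm_size_bytes(113.7, 16000, 16)
    print(f"per channel: {per_channel / 1e9:.2f} GB")
    print(f"all 3 channels: {3 * per_channel / 1e9:.2f} GB")
```

Under these assumptions one channel comes to 113.7 × 3600 × 16000 × 2 = 13,098,240,000 bytes, or about 13.1 GB, and the three channels together to about 39.3 GB.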
AUDIO Human Voice Pronunciations - Portuguese (Portugal)
catalogue.elra.info
Updated Oct 9, 2023
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2023). AUDIO Human Voice Pronunciations - Portuguese (Portugal) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0490_16/
Dataset updated
Oct 9, 2023
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Area covered
Portugal
Description
Human voice recordings of single-word lemmas and multiword expressions, besides IPA (International Phonetic Alphabet) and alternative scripts (Japanese – Romaji/Kanji/Hiragana; Chinese – Pinyin; Arabic and Hebrew – w/out diacritics), distributed as distinct sets (from ELRA-S0490-01 to ELRA-S0490-21) as follows:
•Arabic: 8,119 entries
•Catalan: 2,247 entries
•Chinese (Simplified): 4,719 entries
•Czech: 10,629 entries
•Danish: 8,878 entries
•Dutch: 12,538 entries
•English: 24,663 entries
•Greek: 9,725 entries
•Hebrew: 9,138 entries
•Italian: 16,798 entries
•Japanese: 5,161 entries
•Korean: 5,671 entries
•Norwegian: 11,041 entries
•Polish: 8,861 entries
•Portuguese (Brazil): 9,250 entries
•Portuguese (Portugal): 7,676 entries
•Russian: 7,502 entries
•Spanish: 2,297 entries
•Swedish: 7,534 entries
•Thai: 5,173 entries
•Turkish: 6,491 entries
Audio Visual Speech Dataset: European Portuguese
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Audio Visual Speech Dataset: European Portuguese [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/european-portuguese-visual-speech-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Portuguese Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
Dataset Content
This visual speech dataset contains 1000 Portuguese-language videos, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.
•Participant Diversity:
•Speakers: The dataset includes visual speech data from more than 200 participants from different states/provinces of Portugal.
•Regions: Ensures a balanced representation of accents, dialects, and demographics.
•Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

Video Data
Each video was recorded following extensive guidelines to maintain quality and diversity.
•Recording Details:
•File Duration: Average duration of 30 seconds to 3 minutes per video.
•Formats: Videos are available in MP4 or MOV format.
•Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
•Device: Both the latest Android and iOS devices are used in this collection.
•Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
•Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
•Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
•Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
•Face Orientation: Contains straight face and tilted face angles.
•Participant Positions: Records participants in both standing and seated positions.
•Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
•Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
•Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
•Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
•Happy
•Sad
•Excited
•Angry
•Annoyed
•Normal
•Question Diversity: For each emotion, the participant answered a specific question expressing that emotion.

Metadata
The dataset provides comprehensive metadata for each video recording and participant:
In-Car Speech Dataset: Portuguese (Portugal)
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). In-Car Speech Dataset: Portuguese (Portugal) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-portuguese
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Portugal
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Portuguese Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.
Speech Data
This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.
•Participant Diversity:
•Speakers: 50+ native Portuguese speakers from the FutureBeeAI Community.
•Regions: Ensures a balanced representation of Portugal's accents, dialects, and demographics.
•Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
•Recording Nature: Scripted wake word and command type of audio recordings.
•Duration: Average duration of 5 to 20 seconds per audio recording.
•Formats: WAV format with mono channels and a bit depth of 16 bits. The dataset includes recordings at 16 kHz and 48 kHz.

Dataset Diversity
Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.
•Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.
•Different Cars: Data collection was carried out in different types and models of cars.
•Different Types of Voice Commands:
•Navigational Voice Commands
•Mobile Control Voice Commands
•Car Control Voice Commands
•Multimedia & Entertainment Commands
•General, Question Answer, Search Commands
•Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.
•Morning
•Afternoon
•Evening
•Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:
•Noise Level: Silent, Low Noise, Moderate Noise, High Noise
•Parking Location: Indoor, Outdoor
•Car Windows: Open, Closed
•Car AC: On, Off
•Car Engine: On, Off
•Car Movement: Stationary, Moving

Metadata
The dataset provides comprehensive metadata for each audio recording and participant:
•Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

pt-br_char
huggingface.co
Updated Jul 6, 2025
Cite
Gilb (2025). pt-br_char [Dataset]. https://huggingface.co/datasets/firstpixel/pt-br_char
Dataset updated
Jul 6, 2025
Authors
Gilb
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Brazilian Portuguese Merged Speech Dataset (Derived from Common Voice)

This dataset is a preprocessed and merged version of the Mozilla Common Voice dataset for Brazilian Portuguese (pt-BR). It was created by filtering, merging, and normalizing audio clips to improve usability for speech recognition and TTS (Text-to-Speech) training.

📌 Dataset Details

Source: Derived from Common Voice Corpus 20.0
Language: 🇧🇷 Brazilian Portuguese (pt-BR)
Format: MP3 (24 kHz, mono… See the full description on the dataset page: https://huggingface.co/datasets/firstpixel/pt-br_char.
Portuguese (Brazil) Scripted Monologue Speech Data for Healthcare
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Portuguese (Brazil) Scripted Monologue Speech Data for Healthcare [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/healthcare-scripted-speech-monologues-portuguese-brazil
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Brazil
Dataset funded by
FutureBeeAI
Description
Introduction
Introducing the Brazilian Portuguese Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of Portuguese language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.
Speech Data
This dataset includes over 6,000 high-quality scripted audio prompts recorded in Brazilian Portuguese, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.
•Participant Diversity
  •Speakers: 60 native Brazilian Portuguese speakers.
  •Regional Balance: Participants are sourced from multiple regions across Brazil, reflecting diverse dialects and linguistic traits.
  •Demographics: Includes a mix of male and female participants (60:40 ratio), aged between 18 and 70 years.
•Recording Specifications
  •Nature of Recordings: Scripted monologues based on healthcare-related use cases.
  •Duration: Each clip ranges from 5 to 30 seconds, offering short, context-rich speech samples.
  •Audio Format: WAV files recorded in mono, with 16-bit depth and sample rates of 8 kHz and 16 kHz.
  •Environment: Clean and echo-free spaces ensure clear and noise-free audio capture.
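The recording specifications above can be checked mechanically once files are downloaded. The following is a minimal sketch, assuming the clips are uncompressed PCM WAV files that Python's built-in wave module can read; the names check_wav and ALLOWED_RATES are hypothetical and not part of any FutureBeeAI tooling.

```python
import wave

# Values taken from the listing: mono, 16-bit, 8 kHz or 16 kHz, 5 to 30 seconds.
ALLOWED_RATES = {8000, 16000}


def check_wav(path):
    """Return a list of mismatches; an empty list means the file fits the spec."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        duration = wav.getnframes() / rate  # seconds

    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit, got {8 * sample_width}-bit")
    if rate not in ALLOWED_RATES:
        problems.append(f"unexpected sample rate {rate} Hz")
    if not 5.0 <= duration <= 30.0:
        problems.append(f"duration {duration:.1f} s is outside 5 to 30 s")
    return problems
```

Calling check_wav on each downloaded file and logging any non-empty result is one way to catch clips that differ from the advertised format before training.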

Topic Coverage
The prompts span a broad range of healthcare-specific interactions, such as:
•Patient check-in and follow-up communication
•Appointment booking and cancellation dialogues
•Insurance and regulatory support queries
•Medication, test results, and consultation discussions
•General health tips and wellness advice
•Emergency and urgent care communication
•Technical support for patient portals and apps
•Domain-specific scripted statements and FAQs
Contextual Depth
To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:
•Names: Gender- and region-appropriate Brazilian names
•Addresses: Varied local address formats spoken naturally
•Dates & Times: References to appointment dates, times, follow-ups, and schedules
•Medical Terminology: Common medical procedures, symptoms, and treatment references
•Numbers & Measurements: Health data like dosages, vitals, and test result values
•Healthcare Institutions: Names of clinics, hospitals, and diagnostic centers

These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.
Transcription
Every audio recording is accompanied by a verbatim, manually verified transcription.
•Content: The transcription mirrors the exact scripted prompt recorded by the speaker.
•Format: Files are delivered in plain text (.TXT) format with consistent naming conventions for seamless integration.
CORAA-NURC-SP-Audio-Corpus
huggingface.co
Updated Sep 7, 2024
+ more versions
Cite
NILC NLP (2024). CORAA-NURC-SP-Audio-Corpus [Dataset]. https://huggingface.co/datasets/nilc-nlp/CORAA-NURC-SP-Audio-Corpus
Dataset updated
Sep 7, 2024
Dataset authored and provided by
NILC NLP
Description
NURC-SP Corpus

NURC-SP Corpus CORAA ASR is a publicly available dataset for Automatic Speech Recognition (ASR) in Brazilian Portuguese, containing 239.68 hours of audio (239.30 when filtered) and the corresponding transcriptions (170k+ segmented audio clips). The audio was either validated by annotators or transcribed for the first time with the ASR task in mind.

How to Use

The datasets library allows easy loading of the dataset with the load_dataset() function.… See the full description on the dataset page: https://huggingface.co/datasets/nilc-nlp/CORAA-NURC-SP-Audio-Corpus.
Data from: Avatar Education Portuguese
abacus.library.ubc.ca
iso, txt
Updated Nov 15, 2018
Cite
Abacus Data Network (2018). Avatar Education Portuguese [Dataset]. https://abacus.library.ubc.ca/dataset.xhtml;jsessionid=c6ed3258f1efb16dda8277d5b372?persistentId=hdl%3A11272.1%2FAB2%2FBSQ4NP&version=&q=&fileTypeGroupFacet=&fileAccess=Restricted&fileSortField=size
Available download formats: iso(125351936), txt(1308)
Dataset updated
Nov 15, 2018
Dataset provided by
Abacus Data Network
Time period covered
2018
Area covered
United States, Brazil
Description
Avatar Education Portuguese was developed by the University of Pernambuco and consists of approximately 80 minutes of Brazilian Portuguese microphone speech with phonetic and orthographic transcriptions. The data was developed for Avatar Education, an animated virtual assistant designed to enhance communication and interaction in educational contexts, such as online learning.
Data
The corpus contains 1,400 utterances (700 male and 700 female) of read and spontaneous speech spoken by two professional speakers. Utterances were transcribed at the word level (without time alignments) and at the phoneme level (with time alignment labels). The audio data was recorded at 16 kHz (mono, 16-bit) using Pro Tools recording software and stored in flac compressed wav format. The acoustic environment was controlled for background conditions that occur in application environments.
Pause study BP audio files
figshare.com
wav
Updated Oct 13, 2022
Cite
Plinio Barbosa (2022). Pause study BP audio files [Dataset]. http://doi.org/10.6084/m9.figshare.21325275.v1
Available download formats: wav
Unique identifier
https://doi.org/10.6084/m9.figshare.21325275.v1
Dataset updated
Oct 13, 2022
Dataset provided by
figshare
Authors
Plinio Barbosa
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Audio files of poem declamation coded as such:

XYBPAPn or XYBPCAn

where X = gender (F=female, M=male),

XY = participant (e.g., F1 = first female participant),

BP = Brazilian Portuguese.

AP = poem of Adélia Prado

CA = poem of Alberto Caeiro

n = number of poems by AP or CA, where 2 = negative valence and 1 = positive valence.
8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech...
data.nexdata.ai
Updated Aug 3, 2024
Cite
Nexdata (2024). 8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech Recognition Data| Machine Learning (ML) Data [Dataset]. https://data.nexdata.ai/products/nexdata-multilingual-conversational-speech-data-8khz-tele-nexdata
Dataset updated
Aug 3, 2024
Dataset authored and provided by
Nexdata
Area covered
Egypt, Bangladesh, Serbia, Slovenia, Australia, Spain, Taiwan, New Zealand, Saudi Arabia, Austria
Description
Nexdata offers 15,000 hours of off-the-shelf 8 kHz conversational speech data for machine learning (ML), collected in 100+ countries and covering languages including English, German, French, Spanish, Italian, Portuguese, Korean, Japanese, Hindi, and Russian, among others.
Pause study EP audio files
figshare.com
wav
Updated Oct 13, 2022
Cite
Plinio Barbosa (2022). Pause study EP audio files [Dataset]. http://doi.org/10.6084/m9.figshare.21325371.v1
Available download formats: wav
Unique identifier
https://doi.org/10.6084/m9.figshare.21325371.v1
Dataset updated
Oct 13, 2022
Dataset provided by
figshare
Authors
Plinio Barbosa
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Audio files of poem declamation coded as such:

XYEPAPn or XYEPCAn

where X = gender (F=female, M=male),

XY = participant (e.g., F1 = first female participant),

EP = European Portuguese.

AP= poem of Adélia Prado

CA = poem of Alberto Caeiro

n = number of poems by AP or CA, where 2 = negative valence and 1 = positive valence.
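
The coding scheme above lends itself to automatic parsing, for example to group recordings by poet or valence. The following is a minimal sketch, assuming the code is the file name without its extension, that the participant index is written as decimal digits, and that n is always 1 or 2; parse_code and PoemRecording are hypothetical names, and the BP variety code is included because the companion Brazilian Portuguese listing uses the same scheme.

```python
import re
from typing import NamedTuple

# Pattern follows the listing: gender, participant index, variety, poet, n.
CODE = re.compile(r"([FM])(\d+)(BP|EP)(AP|CA)([12])")

GENDERS = {"F": "female", "M": "male"}
VARIETIES = {"BP": "Brazilian Portuguese", "EP": "European Portuguese"}
POETS = {"AP": "Adélia Prado", "CA": "Alberto Caeiro"}
VALENCES = {"1": "positive", "2": "negative"}


class PoemRecording(NamedTuple):
    participant: str  # e.g. "F1" = first female participant
    gender: str
    variety: str
    poet: str
    valence: str


def parse_code(code):
    """Decode a recording code such as 'F1EPAP2' into its labelled parts."""
    match = CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid recording code: {code!r}")
    gender, index, variety, poet, n = match.groups()
    return PoemRecording(
        participant=gender + index,
        gender=GENDERS[gender],
        variety=VARIETIES[variety],
        poet=POETS[poet],
        valence=VALENCES[n],
    )
```

For instance, parse_code("F1EPAP2") describes the first female participant reading the negative-valence poem by Adélia Prado in European Portuguese.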
Fundamental Portuguese Corpus
live.european-language-grid.eu
catalogue.elra.info
audio format
Cite
Fundamental Portuguese Corpus [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/2118
Available download formats: audio format
License
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
The Fundamental Portuguese Corpus is a corpus of spoken language, collected between 1970 and 1974, composed of 1800 recordings (500 hours) made in Continental Portugal and the Islands. Of these 1800 conversations, a sample was selected and transcribed.
The corpus consists of audio files in .wav format, aligned transcriptions in XML Exmaralda format and transcriptions in plain text. The plain text files also have automatically assigned POS-tag information. The transcriptions of the corpus are also available in html format. The characters have been encoded in UTF-8.
Annotated file for: Documentation of Malaccan Portuguese Creole
researchdata.um.edu.my
bin
Updated Oct 18, 2023
+ more versions
Cite
Stefanie Shamila Pillai (2023). Annotated file for: Documentation of Malaccan Portuguese Creole [Dataset]. http://doi.org/10.22452/RD/DLBPNI
Available download formats: bin(40591), bin(66884), bin(62005), bin(155150), bin(19883), bin(126932), bin(42199), bin(55044), bin(27828), bin(36288), bin(110959), bin(29254), bin(19976), bin(72168), bin(41704), bin(221894), bin(98170), bin(23206), bin(25896), bin(64004), bin(232936), bin(72267), bin(17292)
Unique identifier
https://doi.org/10.22452/RD/DLBPNI
Dataset updated
Oct 18, 2023
Dataset provided by
Universiti Malaya Research Data Repository
Authors
Stefanie Shamila Pillai
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Malacca
Description
The project will build a corpus of Malaccan Portuguese Creole, which is spoken by about 1000 people in the Portuguese Settlement in Melaka, Malaysia. The purpose of this project is to create a database of video and audio recordings comprising a variety of speaking contexts. The recordings will be paired with time-aligned orthographic transcriptions and annotations. The annotations will allow further linguistic analysis to be carried out while the corpus will serve as a digital resource for the community.
