100+ datasets found

Audio Visual Speech Dataset: American English
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
Dataset Content
This visual speech dataset contains 1000 US English videos, each paired with a corresponding high-fidelity audio track. In each video, the participant answers a specific question in an unscripted, spontaneous manner.
•Participant Diversity:
•Speakers: The dataset includes visual speech data from more than 200 participants from different states of the United States of America.
•Regions: Ensures a balanced representation of US accents, dialects, and demographics.
•Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.

Video Data
Each video was recorded under extensive guidelines to maintain quality and diversity.
•Recording Details:
•File Duration: Average duration of 30 seconds to 3 minutes per video.
•Formats: Videos are available in MP4 or MOV format.
•Resolution: Videos are recorded in ultra-high-definition resolution at 30 fps or above.
•Device: Both the latest Android and iOS devices are used in this collection.
•Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
•Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
•Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
•Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
•Face Orientation: Contains straight face and tilted face angles.
•Participant Positions: Records participants in both standing and seated positions.
•Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
•Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
•Focus: In each video, the participant's face remains in focus and within the frame for the full duration.
•Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions. The dataset contains videos expressing the following human emotions:
•Happy
•Sad
•Excited
•Angry
•Annoyed
•Normal
•Question Diversity: For each emotion, the participant answered a specific question while expressing that emotion.

Metadata
The dataset provides comprehensive metadata for each video recording and participant:
Customer Service Audio Dataset [Raw Call Recordings, Multi-Industry, U.S.]
datarade.ai
.wav
Updated Dec 8, 2023
Cite
WiserBrand.com (2023). Customer Service Audio Dataset [Raw Call Recordings, Multi-Industry, U.S.] [Dataset]. https://datarade.ai/data-products/customer-service-audio-dataset-raw-call-recordings-multi-in-wiserbrand-com
Explore at:
Available download formats: .wav
Dataset updated
Dec 8, 2023
Dataset provided by
WiserBrand.com
Area covered
United States of America
Description
This dataset contains thousands of authentic audio recordings of customer calls to service teams across key U.S. industries. Captured from inbound support channels, these files reflect natural speech in real service contexts, with varied speaker accents, background noise, and emotion levels. Each recording involves only a customer and a customer service agent, preserving a realistic two-party call structure.

Dataset includes:
- Thousands of customer service call recordings (WAV/MP3)
- English language, native and accented speech
- Real-world acoustic conditions (noise, silence, overlapping speech)
- Dataset language: English (other languages on request)

Use this dataset to:
- Train speech-to-text engines on real-world, noisy support audio
- Build speaker diarization and audio segmentation models
- Simulate customer-agent voice interactions for LLM fine-tuning
- Test multilingual or accent-robust audio pipelines
- Develop acoustic models for call quality enhancement

This audio-first dataset is ideal for ASR developers, call center AI builders, and speech researchers looking for real-life, labeled customer service calls.
US English Singing Audio Dataset
shaip.com
Updated Jun 12, 2023
Cite
Shaip (2023). US English Singing Audio Dataset [Dataset]. https://www.shaip.com/offerings/speech-data-catalog/us-english-singing-audio-dataset/
Explore at:
Dataset updated
Jun 12, 2023
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
High-Quality US English Singing Audio Dataset for AI & Speech Models. Overview: Title: US English Singing Audio Dataset. Dataset Type: Singing Audio. Description: Singing audio collection & transcription. Audio categories:…
In-Car Speech Dataset: English (US)
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). In-Car Speech Dataset: English (US) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-english-us
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.
Speech Data
This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.
•Participant Diversity:
•Speakers: 50+ native English speakers from the FutureBeeAI Community.
•Regions: Ensures a balanced representation of United States of America accents, dialects, and demographics.
•Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.
•Recording Nature: Scripted wake-word and command-type audio recordings.
•Duration: Average duration of 5 to 20 seconds per audio recording.
•Formats: WAV format, mono channel, 16-bit depth. The dataset contains recordings at both 16kHz and 48kHz.
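A buyer might want to confirm that delivered files match the stated format before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the constants are illustrative assumptions, not part of the FutureBeeAI delivery.

```python
import wave

# Values taken from the listing: mono, 16-bit, 16 kHz or 48 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits
EXPECTED_SAMPLE_RATES = {16000, 48000}


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the stated format.

    An empty list means the header matches the listing. The function name
    and return shape are hypothetical choices made for this sketch.
    """
    problems = []
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav_file.getnchannels()}")
        if wav_file.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample_width_bytes={wav_file.getsampwidth()}")
        if wav_file.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"sample_rate={wav_file.getframerate()}")
    return problems
```

Only the WAV header is inspected here, so the check is cheap enough to run over every file in a delivery.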

Dataset Diversity
Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.
•Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.
•Different Cars: Data collection was carried out in different types and models of cars.
•Different Types of Voice Commands:
•Navigational Voice Commands
•Mobile Control Voice Commands
•Car Control Voice Commands
•Multimedia & Entertainment Commands
•General, Question Answer, Search Commands
•Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.
•Morning
•Afternoon
•Evening
•Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:
•Noise Level: Silent, Low Noise, Moderate Noise, High Noise
•Parking Location: Indoor, Outdoor
•Car Windows: Open, Closed
•Car AC: On, Off
•Car Engine: On, Off
•Car Movement: Stationary, Moving

Metadata
The dataset provides comprehensive metadata for each audio recording and participant:
•Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.
Silver-Audio-Dataset
huggingface.co
Updated Dec 15, 2024
Cite
Silver Avocado (2024). Silver-Audio-Dataset [Dataset]. https://huggingface.co/datasets/SilverAvocado/Silver-Audio-Dataset
Explore at:
Dataset updated
Dec 15, 2024
Dataset authored and provided by
Silver Avocado
Description
Dataset Overview

The dataset is a curated collection of .npy files containing MFCC features extracted from raw audio recordings. It has been specifically designed for training and evaluating machine learning models in the context of real-world emergency sound detection and classification tasks. The dataset captures diverse audio scenarios, making it a robust resource for developing safety-focused AI systems, such as the SilverAssistant project.

Dataset Descriptions… See the full description on the dataset page: https://huggingface.co/datasets/SilverAvocado/Silver-Audio-Dataset.
Industry revenue of “audio and video equipment manufacturing“ in the U.S....
statista.com
Updated Jul 8, 2025
Cite
Statista (2025). Industry revenue of “audio and video equipment manufacturing“ in the U.S. 2012-2024 [Dataset]. https://www.statista.com/forecasts/310987/audio-and-video-equipment-manufacturing-revenue-in-the-us
Explore at:
Dataset updated
Jul 8, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2012 - 2017
Area covered
United States
Description
This statistic shows the revenue of the industry “audio and video equipment manufacturing“ in the U.S. from 2012 to 2017, with a forecast to 2024. It is projected that the revenue of audio and video equipment manufacturing in the U.S. will amount to approximately ******* million U.S. Dollars by 2024.
North America Audio Distribution Systems Market is Growing at Compound...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Cite
Cognitive Market Research, North America Audio Distribution Systems Market is Growing at Compound Annual Growth Rate of 4.4% from 2023 to 2030. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/north-america-audio-distribution-systems-market-report
Explore at:
Available download formats: pdf, excel, csv, ppt
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
North America, Region
Description
North America Audio Distribution Systems held the major market share, more than 40% of global revenue, with a market size of USD XX million in 2023, and will grow at a compound annual growth rate (CAGR) of 4.4% from 2023 to 2030.
US English Pese Audio Dataset
sm.shaip.com
Updated Dec 6, 2024
Cite
Shaip (2024). US English Pese Audio Dataset [Dataset]. https://sm.shaip.com/offerings/speech-data-catalog/us-english-singing-audio-dataset/
Explore at:
Dataset updated
Dec 6, 2024
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
High-Quality US English Singing Audio Dataset for AI & Speech Models. Overview: Title: US English Singing Audio Dataset. Dataset Type: Singing Audio. Description: Singing audio collection & transcription. Audio categories:…
North America Audio Amplifiers market size will be USD 1478.20 million in...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Jun 13, 2025
Cite
Cognitive Market Research (2025). North America Audio Amplifiers market size will be USD 1478.20 million in 2024. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/north-america-audio-amplifiers-market-report
Explore at:
Available download formats: pdf, excel, csv, ppt
Dataset updated
Jun 13, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Region, North America
Description
North America Audio Amplifiers market size will be USD 1478.20 million in 2024 and will grow at a compound annual growth rate (CAGR) of 5.2% from 2024 to 2031. North America has emerged as a prominent participant, and its sales revenue is estimated to reach USD 2290.6 million by 2031. This growth is mainly attributed to the region's growing advancement in the automotive industry.
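For readers who want to sanity-check projections like this one, compound growth can be computed directly. The sketch below is only the standard CAGR arithmetic, with hypothetical helper names `project_value` and `implied_cagr`. The vendor may compound from a different base year or use unrounded inputs, so the output need not match the listing's stated 2031 figure.

```python
def project_value(base_value, annual_rate, years):
    """Compound base_value forward at a fixed annual growth rate."""
    return base_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value over `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the listing (USD million); the 7-year span assumes
# compounding from 2024 through 2031, which is an assumption of this sketch.
base_2024 = 1478.20
quoted_rate = 0.052
span_years = 2031 - 2024

projected_2031 = project_value(base_2024, quoted_rate, span_years)
rate_from_endpoints = implied_cagr(base_2024, 2290.6, span_years)

print(f"Projected 2031 value at the quoted rate: {projected_2031:.1f}")
print(f"Rate implied by the quoted endpoints: {rate_from_endpoints:.2%}")
```

Comparing the two printed values shows how sensitive such forecasts are to the chosen base year and rate.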
American English General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
•Regions: Representing various states of the United States of America to ensure dialectal diversity and demographic balance.
•Demographics: Gender split of 60% male and 40% female, with participant ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
•Duration: Each conversation ranges from 15 to 60 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
•Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
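Word error rate (WER) is conventionally the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The listing does not say how its figure was measured, so the following is a generic sketch of the standard metric with a hypothetical function name, not the vendor's QA procedure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count.

    Tokenization is plain whitespace splitting, which is an assumption of
    this sketch; real pipelines usually normalize case and punctuation first.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)
```

A result below 0.05 would correspond to the "WER < 5%" threshold quoted above, under the same tokenization.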
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple English speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for US English.
•Voice Assistants: Build smart assistants capable of understanding natural American conversations.
Latin America's Audio Amplifiers market will be USD 184.78 million in 2024...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Jul 15, 2025
Cite
Cognitive Market Research (2025). Latin America's Audio Amplifiers market will be USD 184.78 million in 2024 and is estimated to grow at a compound annual growth rate (CAGR) of 6.4% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/south-america-audio-amplifiers-market-report
Explore at:
Available download formats: pdf, excel, csv, ppt
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Region, Latin America
Description
Latin America's Audio Amplifiers market will be USD 184.78 million in 2024 and is estimated to grow at a compound annual growth rate (CAGR) of 6.4% from 2024 to 2031. The market is foreseen to reach USD 314.5 million by 2031 due to increasing demand from the home appliances sector.
I-US English Singing Audio Dataset
zu.shaip.com
Updated Jun 24, 2023
Cite
Shaip (2023). I-US English Singing Audio Dataset [Dataset]. https://zu.shaip.com/offerings/speech-data-catalog/us-english-singing-audio-dataset/
Explore at:
Dataset updated
Jun 24, 2023
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
High-Quality US English Singing Audio Dataset for AI & Speech Models. Overview: Title: US English Singing Audio Dataset. Dataset Type: Singing Audio. Description: Singing audio collection & transcription. Audio categories:...
Expenditures: Audio and Visual Equipment and Services by Race: White, Asian,...
fred.stlouisfed.org
json
Updated Sep 25, 2024
Cite
(2024). Expenditures: Audio and Visual Equipment and Services by Race: White, Asian, and All Other Races, Not Including Black or African American [Dataset]. https://fred.stlouisfed.org/series/CXUTVAUDIOLB0902M
Explore at:
Available download formats: json
Dataset updated
Sep 25, 2024
License
https://fred.stlouisfed.org/legal/#copyright-public-domain
Area covered
United States
Description
Graph and download economic data for Expenditures: Audio and Visual Equipment and Services by Race: White, Asian, and All Other Races, Not Including Black or African American (CXUTVAUDIOLB0902M) from 1984 to 2023 about audio-visual, asian, equipment, white, expenditures, services, and USA.
Global Call Center & Conversational Audio Dataset — Multilingual, Validated,...
datarade.ai
.mp3, .wav
Updated Jul 21, 2025
Cite
FileMarket (2025). Global Call Center & Conversational Audio Dataset — Multilingual, Validated, with Demographics + Custom Collection Available [Dataset]. https://datarade.ai/data-products/global-call-center-conversational-audio-dataset-multiling-filemarket
Explore at:
Available download formats: .mp3, .wav
Dataset updated
Jul 21, 2025
Dataset authored and provided by
FileMarket
Area covered
Croatia, Taiwan, Lesotho, New Caledonia, Comoros, Gabon, Burundi, Nigeria, Gibraltar, Namibia
Description
We provide a wide range of off-the-shelf multilingual audio datasets, featuring real-world call center dialogues and general conversational recordings from regions across Africa, Central America, South America, and Asia.

Our datasets include multiple languages, local dialects, and authentic conversational flows — designed for AI training, contact center automation, and conversational AI development. All samples are human-validated and come with complete metadata.

Each Dataset Includes:

Unique Participant ID

Gender (Male/Female)

Country & City of Origin

Speaker Age (18-60 years)

Language (English + Multiple Local Languages)

Audio Length: ~30 minutes per participant

Validation Status: 100% Human-Checked

Why Work With Us:

✅ Large library of ready-to-use multilingual datasets

✅ Authentic call center, customer service, and natural conversation recordings

✅ Global coverage with diverse speaker demographics

✅ Custom data collection service: we can source or record datasets tailored to your language, region, or domain needs

Best For:

Speech Recognition & Multilingual NLP

Voicebots & Contact Center AI Solutions

Dialect & Accent Recognition Training

Conversational AI & Multilingual Assistants

Customer Support & Quality Analytics

Whether you need off-the-shelf datasets or unique, project-specific collections — we’ve got you covered.

http://filemarket.ai
Latin America Audio Amplifier Market (2025-2031) | Size & Revenue
6wresearch.com
excel, pdf,ppt,csv
Cite
6Wresearch, Latin America Audio Amplifier Market (2025-2031) | Size & Revenue [Dataset]. https://6wresearch.com/industry-report/latin-america-audio-amplifier-market-2021-2027
Explore at:
Available download formats: excel, pdf, ppt, csv
Dataset authored and provided by
6Wresearch
License
https://www.6wresearch.com/privacy-policy
Area covered
United States
Variables measured
By Countries (Brazil, Mexico, Argentina, Rest of Latin America), By Channel Type (Mono Channel, Two Channel, Four Channel, Six Channel, Others), By End-User (Consumer Electronics, Automotive, Entertainment) and Competitive Landscape, By Device (Smartphones, Television Sets, Tablets, Desktops & Laptops, Home Audio Systems, Automotive Infotainment Systems, Professional Audio Systems)
Description
Latin America Audio Amplifier Market is expected to grow during 2025-2031
North America Audio Codec Market Size, Share & Industry Analysis Report By...
kbvresearch.com
Updated Aug 11, 2025
Cite
KBV Research (2025). North America Audio Codec Market Size, Share & Industry Analysis Report By Function, By Component Type, By Application (Smartphones & Tablets, Headphone, Head Sets & Wearable devices, Automobile, Television Sets, Desktop & Laptops, Music & Media Devices & Home Theatres, Gaming Consoles, and Other Application), By Country and Growth Forecast, 2025 - 2032 [Dataset]. https://www.kbvresearch.com/north-america-audio-codec-market/
Explore at:
Dataset updated
Aug 11, 2025
Dataset authored and provided by
KBV Research
License
https://www.kbvresearch.com/privacy-policy/
Time period covered
2025 - 2032
Area covered
North America
Description
The North America Audio Codec Market would witness market growth of 5.9% CAGR during the forecast period (2025-2032). The US market dominated the North America Audio Codec Market by Country in 2024, and would continue to be a dominant market till 2032; thereby, achieving a market value of USD 2,361
American English Call Center Data for Retail & E-Commerce AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Representing multiple states across the United States of America to ensure coverage of various accents and dialects.
•Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.
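Dual-channel recordings are often split into one mono file per channel before diarization or per-speaker ASR. The sketch below does this for 16-bit stereo WAV using only the standard `wave` module. The function name is hypothetical, and which channel carries the agent or the customer is not stated in the listing, so the outputs are simply labeled left and right.

```python
import wave


def split_stereo_wav(stereo_path, left_path, right_path):
    """Write the two channels of a 16-bit stereo WAV to separate mono files.

    Returns the sample rate. Assumes 16-bit PCM samples, as the listing states.
    """
    with wave.open(stereo_path, "rb") as source:
        if source.getnchannels() != 2 or source.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        sample_rate = source.getframerate()
        frames = source.readframes(source.getnframes())

    # Each stereo frame is 4 bytes: a 2-byte left sample, then a 2-byte right one.
    left = b"".join(frames[i:i + 2] for i in range(0, len(frames), 4))
    right = b"".join(frames[i + 2:i + 4] for i in range(0, len(frames), 4))

    for path, samples in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as target:
            target.setnchannels(1)
            target.setsampwidth(2)
            target.setframerate(sample_rate)
            target.writeframes(samples)
    return sample_rate
```

The whole file is read into memory, which is acceptable for calls of 5 to 15 minutes at these sample rates; longer audio would call for chunked reads.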

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
•Inbound Calls:
•Product Inquiries
•Order Cancellations
•Refund & Exchange Requests
•Subscription Queries, and more
•Outbound Calls:
•Order Confirmations
•Upselling & Promotions
•Account Updates
•Loyalty Program Offers
•Customer Verifications, and others
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, cough)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making model training faster and more accurate.
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.
•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
Usage and Applications
This dataset is ideal for a range of voice AI and NLP applications:
•Automatic Speech Recognition (ASR): Fine-tune English speech-to-text systems.
Digital audio ad spend in Latin America 2020-2026
statista.com
Updated Jul 9, 2025
Cite
Statista (2025). Digital audio ad spend in Latin America 2020-2026 [Dataset]. https://www.statista.com/statistics/1272904/latin-america-audio-ad-spend/
Explore at:
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
LAC, Latin America
Description
In 2020, digital audio (streaming radio and podcast) advertising spending in Latin America’s largest markets stood at **** million U.S. dollars. This figure is expected to nearly quadruple by 2026, reaching an estimated ** million dollars that year.

Digital advertising in Latin America

In line with rising internet adoption rates, digital advertising spending in Latin America has rapidly increased over the past few years. In 2020, digital ad spend in the region amounted to approximately **** billion U.S. dollars, marking a boost of almost ** percent compared to the previous year. This spike was arguably fueled by the outbreak of the coronavirus (COVID-19) pandemic, which spurred online usage like never before. Taking a closer look at the different countries, Brazil expectedly stands out as the leading digital ad market in Latin America, with nearly *** billion U.S. dollars in digital ad investments in 2020.

The digital audio landscape is constantly expanding

The digital audio market is also slowly gaining momentum in Latin America. For example, the average daily time spent listening to online radio in Brazil increased from *** minutes in 2017 to *** minutes in 2020, signaling a mounting interest in digital audio options. In addition to radio, audiences also embrace music streaming content more vividly than ever, as the rising number of Spotify users in Latin America continues to demonstrate. Meanwhile, the region is also rapidly growing its podcast listener base every year. Knowing that Spanish is poised to become the second universal language for podcasting worldwide and podcasts such as “La Corneta” boast more than *** million downloads per week, audio advertisers in Latin America have their work cut out for them.
common-voice-english-audio
huggingface.co
Updated Apr 11, 2025
Cite
Freddy Boulton (2025). common-voice-english-audio [Dataset]. https://huggingface.co/datasets/freddyaboulton/common-voice-english-audio
Explore at:
Dataset updated
Apr 11, 2025
Authors
Freddy Boulton
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
freddyaboulton/common-voice-english-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
South America Audio Distribution Systems Market is Growing at Compound...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Jul 15, 2025
Cite
Cognitive Market Research (2025). South America Audio Distribution Systems Market is Growing at Compound Annual Growth Rate of 5.6% from 2023 to 2030. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/south-america-audio-distribution-systems-market-report
Available download formats: pdf, excel, csv, ppt
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
South America, Region
Description
The South America Audio Distribution Systems market accounted for more than 5% of global revenue, with a market size of USD XX million in 2023, and will grow at a compound annual growth rate (CAGR) of 5.6% from 2023 to 2030.

Cite

FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset

Audio Visual Speech Dataset: American English

American English Audio Visual Speech Dataset

Available download formats: wav

Dataset updated

Aug 1, 2022

Dataset provided by

FutureBeeAI

Authors

FutureBee AI

License

https://www.futurebeeai.com/policies/ai-data-license-agreement

Area covered

United States

Dataset funded by

FutureBeeAI

Description

Introduction

Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

Dataset Content

This visual speech dataset contains 1000 videos in US English, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.

• Participant Diversity:
• Speakers: The dataset includes visual speech data from more than 200 participants from different states of the United States of America.
• Regions: Ensures a balanced representation of Skip 3 accents, dialects, and demographics.
• Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

Video Data

Each video is recorded under extensive guidelines to maintain quality and diversity.

• Recording Details:
• File Duration: Average duration of 30 seconds to 3 minutes per video.
• Formats: Videos are available in MP4 or MOV format.
• Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
• Device: Both the latest Android and iOS devices are used in this collection.

• Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
• Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
• Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
• Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
• Face Orientation: Contains straight face and tilted face angles.
• Participant Positions: Records participants in both standing and seated positions.
• Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
• Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
• Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.

• Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
• Happy
• Sad
• Excited
• Angry
• Annoyed
• Normal
• Question Diversity: For each emotion, the participant answered a specific question expressing that emotion.

Metadata

The dataset provides comprehensive metadata for each video recording and participant:
