100+ datasets found
  1. Audio Visual Speech Dataset: American English

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

    Dataset Content

    This visual speech dataset contains 1,000 US English videos, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.

    Participant Diversity:
    Speakers: The dataset includes visual speech data from more than 200 participants from different states of the United States of America.
    Regions: Ensures a balanced representation of accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Video Data

    Each video was recorded under detailed guidelines to maintain quality and diversity.

    Recording Details:
    File Duration: Typically 30 seconds to 3 minutes per video.
    Formats: Videos are available in MP4 or MOV format.
    Resolution: Videos are recorded in ultra-high-definition resolution at 30 fps or above.
    Device: Recent Android and iOS devices were used in this collection.
    Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
    Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
    Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
    Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
    Face Orientation: Contains straight face and tilted face angles.
    Participant Positions: Records participants in both standing and seated positions.
    Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
    Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
    Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
    Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
    Happy
    Sad
    Excited
    Angry
    Annoyed
    Normal
    Question Diversity: For each emotion, participants answered a specific question designed to elicit that emotion.

    Metadata

    The dataset provides comprehensive metadata for each video recording and participant:

  2. Customer Service Audio Dataset [Raw Call Recordings, Multi-Industry, U.S.]

    • datarade.ai
    .wav
    Updated Dec 8, 2023
    Cite
    WiserBrand.com (2023). Customer Service Audio Dataset [Raw Call Recordings, Multi-Industry, U.S.] [Dataset]. https://datarade.ai/data-products/customer-service-audio-dataset-raw-call-recordings-multi-in-wiserbrand-com
    Available download formats: .wav
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    WiserBrand.com
    Area covered
    United States of America
    Description

    This dataset contains thousands of authentic audio recordings of customer calls to service teams across key U.S. industries. Captured from inbound support channels, these files reflect natural speech in real service contexts, with varied speaker accents, background noise, and emotion levels. Each recording involves only a customer and a customer service agent, preserving a realistic two-party call structure.

    Dataset includes:
    - Thousands of customer service call recordings (WAV/MP3)
    - English language, native and accented speech
    - Real-world acoustic conditions (noise, silence, overlapping speech)
    - Dataset language: English (other languages on request)

    Use this dataset to:
    - Train speech-to-text engines on real-world, noisy support audio
    - Build speaker diarization and audio segmentation models
    - Simulate customer-agent voice interactions for LLM fine-tuning
    - Test multilingual or accent-robust audio pipelines
    - Develop acoustic models for call quality enhancement

    This audio-first dataset is ideal for ASR developers, call center AI builders, and speech researchers looking for real-life, labeled customer service calls.

  3. US English Singing Audio Dataset

    • shaip.com
    Updated Jun 12, 2023
    + more versions
    Cite
    Shaip (2023). US English Singing Audio Dataset [Dataset]. https://www.shaip.com/offerings/speech-data-catalog/us-english-singing-audio-dataset/
    Dataset updated
    Jun 12, 2023
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    High-Quality US English Singing Audio Dataset for AI & Speech Models. Title: US English Singing Audio Dataset. Dataset type: Singing audio. Description: Singing audio collection and transcription. Audio categories:…

  4. In-Car Speech Dataset: English (US)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). In-Car Speech Dataset: English (US) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-english-us
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.

    Speech Data

    This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.

    Participant Diversity:
    Speakers: 50+ native English speakers from the FutureBeeAI Community.
    Regions: Ensures a balanced representation of accents, dialects, and demographics across the United States of America.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Nature: Scripted wake-word and command-type audio recordings.
    Duration: Typically 5 to 20 seconds per audio recording.
    Formats: Mono WAV files at 16-bit depth; recordings are provided at 16 kHz and 48 kHz sample rates.
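
A buyer may want to confirm that delivered files match the format stated above. The following is a minimal sketch using Python's standard `wave` module; the helper names (`describe_wav`, `matches_listing`, `make_silent_wav`) are hypothetical and not part of any FutureBeeAI tooling, and the silent test file stands in for a real recording.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, bit_depth, sample_rate_hz) read from a WAV header."""
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches_listing(source, allowed_rates=(16000, 48000)):
    """Check a file against the listed spec: mono, 16-bit, 16 kHz or 48 kHz."""
    channels, bit_depth, rate = describe_wav(source)
    return channels == 1 and bit_depth == 16 and rate in allowed_rates


def make_silent_wav(channels, rate, seconds=0.1):
    """Build an in-memory 16-bit WAV of silence (demo input only, an assumption)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(describe_wav(make_silent_wav(1, 16000)))
    print(matches_listing(make_silent_wav(2, 16000)))
```

`describe_wav` also accepts a file path, so the same check can be run over a downloaded directory of recordings.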

    Dataset Diversity

    Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.

    Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.
    Different Cars: Data collection was carried out in different types and models of cars.
    Different Types of Voice Commands:
    Navigational Voice Commands
    Mobile Control Voice Commands
    Car Control Voice Commands
    Multimedia & Entertainment Commands
    General, Question Answer, Search Commands
    Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.
    Morning
    Afternoon
    Evening
    Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:
    Noise Level: Silent, Low Noise, Moderate Noise, High Noise
    Parking Location: Indoor, Outdoor
    Car Windows: Open, Closed
    Car AC: On, Off
    Car Engine: On, Off
    Car Movement: Stationary, Moving

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

  5. Silver-Audio-Dataset

    • huggingface.co
    Updated Dec 15, 2024
    Cite
    Silver Avocado (2024). Silver-Audio-Dataset [Dataset]. https://huggingface.co/datasets/SilverAvocado/Silver-Audio-Dataset
    Dataset updated
    Dec 15, 2024
    Dataset authored and provided by
    Silver Avocado
    Description

    Dataset Overview

    The dataset is a curated collection of .npy files containing MFCC features extracted from raw audio recordings. It has been specifically designed for training and evaluating machine learning models in the context of real-world emergency sound detection and classification tasks. The dataset captures diverse audio scenarios, making it a robust resource for developing safety-focused AI systems, such as the SilverAssistant project.

      Dataset Descriptions… See the full description on the dataset page: https://huggingface.co/datasets/SilverAvocado/Silver-Audio-Dataset.
    
  6. Industry revenue of “audio and video equipment manufacturing” in the U.S....

    • statista.com
    Updated Jul 8, 2025
    + more versions
    Cite
    Statista (2025). Industry revenue of “audio and video equipment manufacturing” in the U.S. 2012-2024 [Dataset]. https://www.statista.com/forecasts/310987/audio-and-video-equipment-manufacturing-revenue-in-the-us
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2012 - 2017
    Area covered
    United States
    Description

    This statistic shows the revenue of the industry “audio and video equipment manufacturing” in the U.S. from 2012 to 2017, with a forecast to 2024. It is projected that the revenue of audio and video equipment manufacturing in the U.S. will amount to approximately ******* million U.S. Dollars by 2024.

  7. North America Audio Distribution Systems Market is Growing at Compound...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Cite
    Cognitive Market Research, North America Audio Distribution Systems Market is Growing at Compound Annual Growth Rate of 4.4% from 2023 to 2030. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/north-america-audio-distribution-systems-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    North America, Region
    Description

    North America's audio distribution systems market held the largest share, more than 40% of global revenue, with a market size of USD XX million in 2023, and is projected to grow at a compound annual growth rate (CAGR) of 4.4% from 2023 to 2030.

  8. US English Pese Audio Dataset

    • sm.shaip.com
    Updated Dec 6, 2024
    Cite
    Shaip (2024). US English Pese Audio Dataset [Dataset]. https://sm.shaip.com/offerings/speech-data-catalog/us-english-singing-audio-dataset/
    Dataset updated
    Dec 6, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    High-Quality US English Singing Audio Dataset for AI & Speech Models. Title: US English Singing Audio Dataset. Dataset type: Singing audio. Description: Singing audio collection and transcription. Audio categories:…

  9. North America Audio Amplifiers market size will be USD 1478.20 million in...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jun 13, 2025
    Cite
    Cognitive Market Research (2025). North America Audio Amplifiers market size will be USD 1478.20 million in 2024. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/north-america-audio-amplifiers-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 13, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Region, North America
    Description

    North America's audio amplifiers market size will be USD 1478.20 million in 2024 and will grow at a compound annual growth rate (CAGR) of 5.2% from 2024 to 2031. North America has emerged as a prominent participant, and its sales revenue is estimated to reach USD 2290.6 million by 2031. This growth is mainly attributed to advances in the region's automotive industry.
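
For readers who want to work with the growth figures quoted above, the following sketch shows the standard compound-growth arithmetic behind a CAGR. The function names are hypothetical. The listing does not state the base year or rounding behind its 2031 figure, so a projection computed this way need not reproduce it.

```python
def project_value(base_value, cagr, years):
    """Compound a starting value forward: base * (1 + rate) ** years."""
    return base_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Recover the annual growth rate that links two values over a period."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Inputs copied from the description above (USD million, 2024 to 2031).
    projected = project_value(1478.20, 0.052, 2031 - 2024)
    rate = implied_cagr(1478.20, 2290.6, 2031 - 2024)
    print(f"projected 2031 value: {projected:.1f}")
    print(f"rate implied by the two quoted endpoints: {rate:.3%}")
```

Comparing the two printed numbers is a quick way to check whether a report's headline rate and its endpoint values use the same assumptions.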

  10. American English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various states of the United States of America to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%
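
The listing does not say how its word error rate was measured. As a point of reference, the sketch below computes WER under its usual definition: substitutions, deletions, and insertions divided by the number of reference words, found with a word-level edit distance. The function name and the whitespace tokenization are assumptions for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Real evaluations usually normalize case and punctuation before scoring; that step is omitted here.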

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for US English.
    Voice Assistants: Build smart assistants capable of understanding natural American conversations.

  11. Latin America's Audio Amplifiers market will be USD 184.78 million in 2024...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jul 15, 2025
    Cite
    Cognitive Market Research (2025). Latin America's Audio Amplifiers market will be USD 184.78 million in 2024 and is estimated to grow at a compound annual growth rate (CAGR) of 6.4% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/south-america-audio-amplifiers-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Region, Latin America
    Description

    Latin America's audio amplifiers market will be USD 184.78 million in 2024 and is estimated to grow at a compound annual growth rate (CAGR) of 6.4% from 2024 to 2031. The market is expected to reach USD 314.5 million by 2031 due to increasing demand from the home appliances sector.

  12. I-US English Singing Audio Dataset

    • zu.shaip.com
    Updated Jun 24, 2023
    Cite
    Shaip (2023). I-US English Singing Audio Dataset [Dataset]. https://zu.shaip.com/offerings/speech-data-catalog/us-english-singing-audio-dataset/
    Dataset updated
    Jun 24, 2023
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    High-Quality US English Singing Audio Dataset for AI & Speech Models. Title: US English Singing Audio Dataset. Dataset type: Singing audio. Description: Singing audio collection and transcription. Audio categories:...

  13. Expenditures: Audio and Visual Equipment and Services by Race: White, Asian,...

    • fred.stlouisfed.org
    json
    Updated Sep 25, 2024
    + more versions
    Cite
    (2024). Expenditures: Audio and Visual Equipment and Services by Race: White, Asian, and All Other Races, Not Including Black or African American [Dataset]. https://fred.stlouisfed.org/series/CXUTVAUDIOLB0902M
    Available download formats: json
    Dataset updated
    Sep 25, 2024
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Area covered
    United States
    Description

    Graph and download economic data for Expenditures: Audio and Visual Equipment and Services by Race: White, Asian, and All Other Races, Not Including Black or African American (CXUTVAUDIOLB0902M) from 1984 to 2023 about audio-visual, asian, equipment, white, expenditures, services, and USA.

  14. Global Call Center & Conversational Audio Dataset — Multilingual, Validated,...

    • datarade.ai
    .mp3, .wav
    Updated Jul 21, 2025
    Cite
    FileMarket (2025). Global Call Center & Conversational Audio Dataset — Multilingual, Validated, with Demographics + Custom Collection Available [Dataset]. https://datarade.ai/data-products/global-call-center-conversational-audio-dataset-multiling-filemarket
    Available download formats: .mp3, .wav
    Dataset updated
    Jul 21, 2025
    Dataset authored and provided by
    FileMarket
    Area covered
    Croatia, Taiwan, Lesotho, New Caledonia, Comoros, Gabon, Burundi, Nigeria, Gibraltar, Namibia
    Description

    We provide a wide range of off-the-shelf multilingual audio datasets, featuring real-world call center dialogues and general conversational recordings from regions across Africa, Central America, South America, and Asia.

    Our datasets include multiple languages, local dialects, and authentic conversational flows — designed for AI training, contact center automation, and conversational AI development. All samples are human-validated and come with complete metadata.

    Each Dataset Includes:

    Unique Participant ID

    Gender (Male/Female)

    Country & City of Origin

    Speaker Age (18-60 years)

    Language (English + Multiple Local Languages)

    Audio Length: ~30 minutes per participant

    Validation Status: 100% Human-Checked

    Why Work With Us:

    ✅ Large library of ready-to-use multilingual datasets

    ✅ Authentic call center, customer service, and natural conversation recordings

    ✅ Global coverage with diverse speaker demographics

    ✅ Custom data collection service: we can source or record datasets tailored to your language, region, or domain needs

    Best For:

    Speech Recognition & Multilingual NLP

    Voicebots & Contact Center AI Solutions

    Dialect & Accent Recognition Training

    Conversational AI & Multilingual Assistants

    Customer Support & Quality Analytics

    Whether you need off-the-shelf datasets or unique, project-specific collections — we’ve got you covered.

    http://filemarket.ai

  15. Latin America Audio Amplifier Market (2025-2031) | Size & Revenue

    • 6wresearch.com
    excel, pdf,ppt,csv
    Cite
    6Wresearch, Latin America Audio Amplifier Market (2025-2031) | Size & Revenue [Dataset]. https://6wresearch.com/industry-report/latin-america-audio-amplifier-market-2021-2027
    Available download formats: excel, pdf, ppt, csv
    Dataset authored and provided by
    6Wresearch
    License

    https://www.6wresearch.com/privacy-policy

    Area covered
    United States
    Variables measured
    By Countries (Brazil, Mexico, Argentina, Rest of Latin America), By Channel Type (Mono Channel, Two Channel, Four Channel, Six Channel, Others), By End-User (Consumer Electronics, Automotive, Entertainment) and Competitive Landscape, By Device (Smartphones, Television Sets, Tablets, Desktops & Laptops, Home Audio Systems, Automotive Infotainment Systems, Professional Audio Systems)
    Description

    Latin America Audio Amplifier Market is expected to grow during 2025-2031

  16. North America Audio Codec Market Size, Share & Industry Analysis Report By...

    • kbvresearch.com
    Updated Aug 11, 2025
    Cite
    KBV Research (2025). North America Audio Codec Market Size, Share & Industry Analysis Report By Function, By Component Type, By Application (Smartphones & Tablets, Headphone, Head Sets & Wearable devices, Automobile, Television Sets, Desktop & Laptops, Music & Media Devices & Home Theatres, Gaming Consoles, and Other Application), By Country and Growth Forecast, 2025 - 2032 [Dataset]. https://www.kbvresearch.com/north-america-audio-codec-market/
    Dataset updated
    Aug 11, 2025
    Dataset authored and provided by
    KBV Research
    License

    https://www.kbvresearch.com/privacy-policy/

    Time period covered
    2025 - 2032
    Area covered
    North America
    Description

    The North America Audio Codec Market would witness market growth of 5.9% CAGR during the forecast period (2025-2032). The US market dominated the North America Audio Codec Market by Country in 2024, and would continue to be a dominant market till 2032; thereby, achieving a market value of USD 2,361

  17. American English Call Center Data for Retail & E-Commerce AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). American English Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-english-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US English Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.

    Participant Diversity:
    Speakers: 60 native US English speakers from our verified contributor pool.
    Regions: Representing multiple states across the United States of America to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.
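
Because the recordings are described as dual-channel stereo WAV files, a common first step is to separate the two channels so each speaker can be processed alone. The sketch below does this with Python's standard library. The function names are hypothetical, and the listing does not say which channel carries the agent and which the customer, so the code only labels them left and right.

```python
import io
import struct
import wave


def split_stereo_frames(frames, sample_width=2):
    """De-interleave raw stereo PCM bytes into (left, right) mono byte strings."""
    frame_size = 2 * sample_width
    left, right = bytearray(), bytearray()
    for offset in range(0, len(frames) - frame_size + 1, frame_size):
        left += frames[offset:offset + sample_width]
        right += frames[offset + sample_width:offset + frame_size]
    return bytes(left), bytes(right)


def read_channels(source):
    """Read a stereo WAV and return (left, right, sample_rate_hz)."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a two-channel (stereo) WAV file")
        frames = wav.readframes(wav.getnframes())
        left, right = split_stereo_frames(frames, wav.getsampwidth())
        return left, right, wav.getframerate()


def make_demo_wav(left_samples, right_samples, rate=8000):
    """Build an in-memory 16-bit stereo WAV (demo input only, an assumption)."""
    interleaved = [s for pair in zip(left_samples, right_samples) for s in pair]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"{len(interleaved)}h", *interleaved))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    left, right, rate = read_channels(make_demo_wav([1, 2, 3], [-1, -2, -3]))
    print(len(left), len(right), rate)
```

Each returned byte string can be written back out as a mono WAV with `wave.open(..., "wb")` and `setnchannels(1)`.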

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.

    Inbound Calls:
    Product Inquiries
    Order Cancellations
    Refund & Exchange Requests
    Subscription Queries, and more
    Outbound Calls:
    Order Confirmations
    Upselling & Promotions
    Account Updates
    Loyalty Program Offers
    Customer Verifications, and others

    Such variety enhances your model’s ability to generalize across retail-specific voice interactions.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, cough)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.

    Usage and Applications

    This dataset is ideal for a range of voice AI and NLP applications:

    Automatic Speech Recognition (ASR): Fine-tune English speech-to-text systems.

  18. Digital audio ad spend in Latin America 2020-2026

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Digital audio ad spend in Latin America 2020-2026 [Dataset]. https://www.statista.com/statistics/1272904/latin-america-audio-ad-spend/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    LAC, Latin America
    Description

    In 2020, digital audio (streaming radio and podcast) advertising spending in Latin America’s largest markets stood at **** million U.S. dollars. This figure is expected to nearly quadruple by 2026, reaching an estimated ** million dollars that year.

    Digital advertising in Latin America

    In line with rising internet adoption rates, digital advertising spending in Latin America has rapidly increased over the past few years. In 2020, digital ad spend in the region amounted to approximately **** billion U.S. dollars, marking a boost of almost ** percent compared to the previous year. This spike was arguably fueled by the outbreak of the coronavirus (COVID-19) pandemic, which spurred online usage like never before. Taking a closer look at the different countries, Brazil expectedly stands out as the leading digital ad market in Latin America, with nearly *** billion U.S. dollars in digital ad investments in 2020.

    The digital audio landscape is constantly expanding

    The digital audio market is also slowly gaining momentum in Latin America. For example, the average daily time spent listening to online radio in Brazil increased from *** minutes in 2017 to *** minutes in 2020, signaling a mounting interest in digital audio options. In addition to radio, audiences are also embracing music streaming more than ever, as the rising number of Spotify users in Latin America continues to demonstrate. Meanwhile, the region's podcast listener base is also growing rapidly every year. Knowing that Spanish is poised to become the second universal language for podcasting worldwide and podcasts such as “La Corneta” boast more than *** million downloads per week, audio advertisers in Latin America have their work cut out for them.

  19. h

    common-voice-english-audio

    • huggingface.co
    Updated Apr 11, 2025
    Cite
    Freddy Boulton (2025). common-voice-english-audio [Dataset]. https://huggingface.co/datasets/freddyaboulton/common-voice-english-audio
    Explore at:
    Dataset updated
    Apr 11, 2025
    Authors
    Freddy Boulton
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    freddyaboulton/common-voice-english-audio dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. c

    South America Audio Distribution Systems Market is Growing at Compound...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jul 15, 2025
    Cite
    Cognitive Market Research (2025). South America Audio Distribution Systems Market is Growing at Compound Annual Growth Rate of 5.6% from 2023 to 2030. [Dataset]. https://www.cognitivemarketresearch.com/regional-analysis/south-america-audio-distribution-systems-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    South America, Region
    Description

    The South America Audio Distribution Systems market accounted for more than 5% of global revenue, with a market size of USD XX million in 2023, and will grow at a compound annual growth rate (CAGR) of 5.6% from 2023 to 2030.
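
The 2023 market size is masked in the listing, so the 5.6% CAGR can only be read as a growth multiple. As a rough sketch using the standard compound-growth formula, the helpers below show what that rate implies over 2023 to 2030. The function names are illustrative, and no market figure is assumed.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Factor by which a value grows after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    years = 2030 - 2023  # seven compounding periods
    multiple = growth_multiple(0.056, years)
    print(f"Growth multiple over {years} years: {multiple:.3f}")
```

Multiplying the (undisclosed) 2023 market size by this factor would give the 2030 size implied by the stated rate.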

Cite
FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset

Audio Visual Speech Dataset: American English

American English Audio Visual Speech Dataset

Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License

https://www.futurebeeai.com/policies/ai-data-license-agreement

Area covered
United States
Dataset funded by
FutureBeeAI
Description

Introduction

Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

Dataset Content

This visual speech dataset contains 1000 videos in US English, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.

Participant Diversity:
Speakers: The dataset includes visual speech data from more than 200 participants from different states of the United States of America.
Regions: Ensures a balanced representation of accents, dialects, and demographics.
Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

Video Data

Each video is recorded under extensive guidelines to maintain quality and diversity.

Recording Details:
File Duration: Average duration of 30 seconds to 3 minutes per video.
Formats: Videos are available in MP4 or MOV format.
Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
Device: Both the latest Android and iOS devices are used in this collection.
Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
Face Orientation: Contains straight face and tilted face angles.
Participant Positions: Records participants in both standing and seated positions.
Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
Happy
Sad
Excited
Angry
Annoyed
Normal
Question Diversity: For each human emotion, the participant answered a specific question expressing that particular emotion.

Metadata

The dataset provides comprehensive metadata for each video recording and participant:
