4 datasets found
  1. n

    388 Hours - Spanish Speaking English Speech Data by Mobile Phone

    • nexdata.ai
    Updated Oct 31, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nexdata (2023). 388 Hours - Spanish Speaking English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/990
    Explore at:
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Nexdata
    nexdata technology inc
    Authors
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
    Description

    English(Spain) Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers and other domains. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(891 people in total), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.

  2. 338 Hours-Spanish Speech Data by Mobile Phone

    • m.nexdata.ai
    • nexdata.ai
    Updated Nov 20, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nexdata (2024). 338 Hours-Spanish Speech Data by Mobile Phone [Dataset]. https://m.nexdata.ai/datasets/speechrecog/245?source=Github
    Explore at:
    Dataset updated
    Nov 20, 2024
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
    Description

    Spanish Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering News, comments, encyclopedia, economy, science and law domains, with balanced gender distribution. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(800 people from Spain, Mexico, Argentina, etc.), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.

  3. F

    Spanish (Spain) Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Spanish (Spain) Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-spanish-spain
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Spain
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Spanish Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Spanish -speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native Spanish speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native Spanish contributors from our verified pool.
    Regions: Covering multiple Spain provinces to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    High transcription accuracy by dual-layered transcription review ensures word error rate under 5%.

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train Spanish speech-to-text engines for travel platforms.
    <div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex;

  4. t

    HISPANIC OR LATINO AND RACE - DP05_PIN_T - Dataset - CKAN

    • portal.tad3.org
    Updated Nov 17, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). HISPANIC OR LATINO AND RACE - DP05_PIN_T - Dataset - CKAN [Dataset]. https://portal.tad3.org/dataset/hispanic-or-latino-and-race-dp05_pin_t
    Explore at:
    Dataset updated
    Nov 17, 2024
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    ACS DEMOGRAPHIC AND HOUSING ESTIMATES HISPANIC OR LATINO AND RACE - DP05 Universe - Total population Survey-Program - American Community Survey 5-year estimates Years - 2020, 2021, 2022 The terms “Hispanic,” “Latino,” and “Spanish” are used interchangeably. Some respondents identify with all three terms while others may identify with only one of these three specific terms. People who identify with the terms “Hispanic,” “Latino,” or “Spanish” are those who classify themselves in one of the specific Hispanic, Latino, or Spanish categories listed on the questionnaire (“Mexican, Mexican Am., or Chicano,” “Puerto Rican,” or “Cuban”) as well as those who indicate that they are “another Hispanic, Latino, or Spanish origin.” People who do not identify with one of the specific origins listed on the questionnaire but indicate that they are “another Hispanic, Latino, or Spanish origin” are those whose origins are from Spain, the Spanish-speaking countries of Central or South America, or another Spanish culture or origin. Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before their arrival in the UnitedStates. People who identify their origin as Hispanic, Latino, or Spanish may be of any race.

  5. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Nexdata (2023). 388 Hours - Spanish Speaking English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/990

388 Hours - Spanish Speaking English Speech Data by Mobile Phone

Explore at:
Dataset updated
Oct 31, 2023
Dataset provided by
Nexdata
nexdata technology inc
Authors
Nexdata
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description

English(Spain) Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers and other domains. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(891 people in total), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.

Search
Clear search
Close search
Google apps
Main menu