4 datasets found

n
388 Hours - Spanish Speaking English Speech Data by Mobile Phone
nexdata.ai
Updated Oct 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2023). 388 Hours - Spanish Speaking English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/990
Explore at:
Dataset updated
Oct 31, 2023
Dataset provided by
Nexdata
nexdata technology inc
Authors
Nexdata
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
English(Spain) Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers and other domains. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(891 people in total), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
338 Hours-Spanish Speech Data by Mobile Phone
m.nexdata.ai
nexdata.ai
Updated Nov 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2024). 338 Hours-Spanish Speech Data by Mobile Phone [Dataset]. https://m.nexdata.ai/datasets/speechrecog/245?source=Github
Explore at:
Dataset updated
Nov 20, 2024
Dataset authored and provided by
Nexdata
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
Spanish Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering News, comments, encyclopedia, economy, science and law domains, with balanced gender distribution. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(800 people from Spain, Mexico, Argentina, etc.), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
F
Spanish (Spain) Call Center Data for Travel AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Spanish (Spain) Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-spanish-spain
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Spain
Dataset funded by
FutureBeeAI
Description
Introduction
This Spanish Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Spanish -speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
Speech Data
The dataset includes 30 hours of dual-channel audio recordings between native Spanish speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
•Participant Diversity:
•
Speakers: 60 native Spanish contributors from our verified pool.

•
Regions: Covering multiple Spain provinces to capture accent and dialectal variation.

•
Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).

•Recording Details:
•
Conversation Nature: Naturally flowing, spontaneous customer-agent calls.

•
Call Duration: Between 5 and 15 minutes per session.

•
Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.

•
Recording Environment: Captured in controlled, noise-free, echo-free settings.

Topic Diversity
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
•Inbound Calls:
•Booking Assistance
•Destination Information
•Flight Delays or Cancellations
•Support for Disabled Passengers
•Health and Safety Travel Inquiries
•Lost or Delayed Luggage, and more
•Outbound Calls:
•Promotional Travel Offers
•Customer Feedback Surveys
•Booking Confirmations
•Flight Rescheduling Alerts
•Visa Expiry Notifications, and others
These scenarios help models understand and respond to diverse traveler needs in real-time.
Transcription
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-Stamped Segments
•Non-speech Markers (e.g., pauses, coughs)
•High transcription accuracy by dual-layered transcription review ensures word error rate under 5%.
Metadata
Extensive metadata enriches each call and speaker for better filtering and AI training:
•
Participant Metadata: ID, age, gender, region, accent, and dialect.

•
Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

Usage and Applications
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
•
ASR Systems: Train Spanish speech-to-text engines for travel platforms.

<div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex;
t
HISPANIC OR LATINO AND RACE - DP05_PIN_T - Dataset - CKAN
portal.tad3.org
Updated Nov 17, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). HISPANIC OR LATINO AND RACE - DP05_PIN_T - Dataset - CKAN [Dataset]. https://portal.tad3.org/dataset/hispanic-or-latino-and-race-dp05_pin_t
Explore at:
Dataset updated
Nov 17, 2024
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
ACS DEMOGRAPHIC AND HOUSING ESTIMATES HISPANIC OR LATINO AND RACE - DP05 Universe - Total population Survey-Program - American Community Survey 5-year estimates Years - 2020, 2021, 2022 The terms “Hispanic,” “Latino,” and “Spanish” are used interchangeably. Some respondents identify with all three terms while others may identify with only one of these three specific terms. People who identify with the terms “Hispanic,” “Latino,” or “Spanish” are those who classify themselves in one of the specific Hispanic, Latino, or Spanish categories listed on the questionnaire (“Mexican, Mexican Am., or Chicano,” “Puerto Rican,” or “Cuban”) as well as those who indicate that they are “another Hispanic, Latino, or Spanish origin.” People who do not identify with one of the specific origins listed on the questionnaire but indicate that they are “another Hispanic, Latino, or Spanish origin” are those whose origins are from Spain, the Spanish-speaking countries of Central or South America, or another Spanish culture or origin. Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before their arrival in the UnitedStates. People who identify their origin as Hispanic, Latino, or Spanish may be of any race.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Nexdata (2023). 388 Hours - Spanish Speaking English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/990

388 Hours - Spanish Speaking English Speech Data by Mobile Phone

Explore at:

Dataset updated

Oct 31, 2023

Dataset provided by

Nexdata
nexdata technology inc

Authors

Nexdata

Variables measured

Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation

Description

English(Spain) Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers and other domains. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(891 people in total), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.

Clear search

Close search

Google apps

Main menu

388 Hours - Spanish Speaking English Speech Data by Mobile Phone

338 Hours-Spanish Speech Data by Mobile Phone

Spanish (Spain) Call Center Data for Travel AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

HISPANIC OR LATINO AND RACE - DP05_PIN_T - Dataset - CKAN

388 Hours - Spanish Speaking English Speech Data by Mobile PhoneSee More Versions

388 Hours - Spanish Speaking English Speech Data by Mobile Phone