97 datasets found
  1. Number of native Spanish speakers worldwide 2024, by country

    • statista.com
    Updated Jan 15, 2025
    Cite
    Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million. Spain came third, with 48 million, and Argentina fourth, with 46 million.

    Spanish, a world language

    As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, most of them on the American continent, but it is also one of the official languages of Equatorial Guinea in Africa. It also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which are included in the list of non-Hispanic countries with the highest number of Spanish speakers.

    The second most spoken language in the U.S.

    In the most recent data, Spanish ranked as the language other than English with the highest number of speakers in the United States, with 12 times more speakers than the second-place language. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of naturalized people in the U.S. were Spanish-speaking countries.

  2. Spanish speakers in countries where Spanish is not an official language 2024...

    • statista.com
    Updated Jan 15, 2025
    Cite
    Statista (2025). Spanish speakers in countries where Spanish is not an official language 2024 [Dataset]. https://www.statista.com/statistics/1276290/number-spanish-speakers-non-hispanic-countries-worldwide/
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    The United States is the non-Hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people with a native command of the language in 2024. However, the European Union had the largest group of non-native speakers with limited proficiency in Spanish, at around 28 million people. Mexico is the country with the largest number of native Spanish speakers in the world as of 2024.

  3. Hispanic population U.S. 2023, by state

    • statista.com
    Updated Oct 18, 2024
    Cite
    Statista (2024). Hispanic population U.S. 2023, by state [Dataset]. https://www.statista.com/statistics/259850/hispanic-population-of-the-us-by-state/
    Explore at:
    Dataset updated
    Oct 18, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, California had the largest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents in that year.

    History of Hispanic people

    Hispanic people are those whose heritage stems from a former Spanish colony. Spanish colonization of the Americas began at the end of the 15th century, when Christopher Columbus arrived in 1492, and the Spanish Empire went on to expand its territory throughout Central America and South America. Its colonization of what is now the United States did not include the northeastern part of the country. Although the number of Hispanic people living in the United States has increased, the median income of Hispanic households has fluctuated slightly since 1990.

    Hispanic population in the United States

    Hispanic people are the second-largest ethnic group in the United States, making Spanish the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States made between 50,000 and 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990, but has been on the decline since 2010, with the exception of 2020 and 2021, due to the impact of the coronavirus (COVID-19) pandemic.

  4. Largest countries in Latin America, by land area

    • statista.com
    Updated Aug 16, 2024
    Cite
    Largest countries in Latin America, by land area [Dataset]. https://www.statista.com/statistics/990519/largest-countries-area-latin-america/
    Explore at:
    Dataset updated
    Aug 16, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    LAC, Latin America
    Description

    Based on land area, Brazil is the largest country in Latin America by far, with a total area of over 8.5 million square kilometers. Argentina follows with almost 2.8 million square kilometers. Cuba, whose surface area extends over almost 111,000 square kilometers, is the Caribbean country with the largest territory.

    Brazil: a country with a lot to offer

    Brazil's borders reach nearly half of the South American subcontinent, making it the fifth-largest country in the world and the third-largest country in the Western Hemisphere. Along with its landmass, Brazil also boasts the largest population and economy in the region. Although Brasília is the capital, the most significant portion of the country's population is concentrated along its coastline in the cities of São Paulo and Rio de Janeiro.

    South America: a region of extreme geographic variation

    With the Andes mountain range in the west, the Amazon Rainforest in the east, the Equator in the north, and Cape Horn as the southernmost continental tip, South America has some of the most diverse climatic and ecological terrains in the world. Its biodiversity can largely be attributed to the Amazon, the world's largest tropical rainforest, and the Amazon River, the world's largest river. However, with this incredible wealth of ecology also comes great responsibility. In the past decade, roughly 80,000 square kilometers of the Brazilian Amazon were destroyed, and, as of late 2019, there were at least 1,000 threatened species in Brazil alone.

  5. The most spoken languages worldwide 2023

    • statista.com
    Updated Jan 23, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2023 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    World
    Description

    In 2023, there were around 1.5 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.1 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which over 41 million people spoke at home in 2021. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.7 million Tagalog speakers, and 1.5 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 44 percent of the population spoke a language other than English at home in 2021.

  6. Spanish-language e-book revenue worldwide 2020-2023, by country

    • flwrdeptvarieties.store
    • statista.com
    Updated Dec 18, 2023
    Cite
    Amy Watson (2023). Spanish-language e-book revenue worldwide 2020-2023, by country [Dataset]. https://flwrdeptvarieties.store/?_=%2Ftopics%2F1474%2Fe-books%2F%23zUpilBfjadnZ6q5i9BcSHcxNYoVKuimb
    Explore at:
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Amy Watson
    Description

    In 2023, Spanish-language e-books sold in Spain made up 55.7 percent of the global Spanish-language e-book sales revenue. Mexico was the second largest market with over 20 percent of the global sales. The United States ranked third.

  7. Number of students learning Spanish worldwide 2024, by country

    • statista.com
    Updated Jan 22, 2025
    Cite
    Statista (2025). Number of students learning Spanish worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/1276319/number-spanish-language-students-country-worldwide/
    Explore at:
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide, Spain
    Description

    The United States is the country with the largest number of Spanish language students, at approximately 8.59 million people in 2024. Brazil is second, with around 4.05 million students of the Spanish language. The United States is also the non-Hispanic country with the largest number of native Spanish speakers in the world.

  8. Travel Call Center Speech Data: Spanish (Mexico)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Travel Call Center Speech Data: Spanish (Mexico) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-spanish-mexico
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish Call Center Speech Dataset for the Travel domain, designed to enhance the development of call center speech recognition models specifically for the Travel industry. This dataset is curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data:

    This training dataset comprises 30 Hours of call center audio recordings covering various topics and scenarios related to the Travel domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native Mexican Spanish speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of Mexico, ensuring a balanced representation of Mexican accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.
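
    The recording details above (stereo, 16-bit, 8 kHz or 16 kHz) can be checked against a downloaded file's header. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav` and the constants are hypothetical and are not part of the dataset or of any FutureBeeAI tooling.

```python
import wave

# Expected values are taken from the listing; the names are illustrative only.
EXPECTED_CHANNELS = 2                  # stereo
EXPECTED_SAMPLE_WIDTH_BYTES = 2        # 16-bit samples
EXPECTED_SAMPLE_RATES = {8000, 16000}  # 8 kHz or 16 kHz


def check_wav(path):
    """Return a list of mismatches between a WAV header and the listed format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels: {wav.getnchannels()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width (bytes): {wav.getsampwidth()}")
        if wav.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"sample rate: {wav.getframerate()}")
    return problems
```

    An empty list means the header matches the listed format; it says nothing about background noise or echo, which would need a separate audio-quality check.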

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Booking inquiries and assistance
    Destination information and recommendations
    Assistance with flight delays or cancellations
    Special assistance for passengers with disabilities
    Travel-related health and safety inquiry
    Assistance with lost or delayed baggage, and many more
    Outbound Calls:
    Promotional offers and package deals
    Customer satisfaction surveys
    Booking confirmations and updates
    Flight schedule changes and notifications
    Customer feedback collection
    Reminders for passport or visa expiration date, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Travel domain call center conversational AI and ASR models for the Mexican Spanish language.
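
    The listing does not say how the quoted word error rate is measured. As a point of reference, the sketch below uses the conventional definition: word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. The function name and the plain whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

    Under this definition, a four-word reference transcribed with one missing word scores 0.25, so a rate below 5% corresponds to fewer than one word error per twenty reference words.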

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

  9. Hispanic population in the U.S. 2023, by origin

    • statista.com
    Updated Oct 21, 2024
    Cite
    Statista (2024). Hispanic population in the U.S. 2023, by origin [Dataset]. https://www.statista.com/statistics/234852/us-hispanic-population/
    Explore at:
    Dataset updated
    Oct 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    As of 2023, around 37.99 million people of Mexican descent were living in the United States - the largest of any Hispanic group. Puerto Ricans, Salvadorans, Cubans, and Dominicans rounded out the top five Hispanic groups living in the U.S. in that year.

  10. Real Estate Call Center Speech Data: Spanish (Mexico)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Real Estate Call Center Speech Data: Spanish (Mexico) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-spanish-mexico
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish Call Center Speech Dataset for the Real Estate domain, designed to enhance the development of call center speech recognition models specifically for the Real Estate industry. This dataset is curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data:

    This training dataset comprises 30 Hours of call center audio recordings covering various topics and scenarios related to the Real Estate domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native Mexican Spanish speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of Mexico, ensuring a balanced representation of Mexican accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Property Inquiry
    Rental Property Search & Availability
    Renovation Inquiries
    Property Features & Amenities Inquiry
    Investment Property Analysis & Advice
    Property History & Ownership Details, and many more
    Outbound Calls:
    New Property Listing Update
    Post Purchase Follow-ups
    Investment Opportunities & Property Recommendations
    Property Value Updates
    Customer Satisfaction Surveys, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Real Estate domain call center conversational AI and ASR models for the Mexican Spanish language.

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Mexican Spanish call center speech recognition models.


  11. General Domain Scripted Monologue Speech Data: Spanish (Spain)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    General Domain Scripted Monologue Speech Data: Spanish (Spain) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/general-scripted-speech-monologues-spanish-spain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Spain
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Spanish Scripted Monologue Speech Dataset for the General Domain. This meticulously curated dataset is designed to advance the development of General domain Spanish language speech recognition models.

    Speech Data

    This training dataset comprises over 6,000 high-quality scripted prompt recordings in Spanish. These recordings cover various General domain topics and scenarios, designed to build robust and accurate speech technology.

    Participant Diversity:
    Speakers: 60 native Spanish speakers from different regions of Spain.
    Regions: Ensures a balanced representation of Spanish accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Recording Nature: Audio recordings of scripted prompts/monologues.
    Audio Duration: Average duration of 5 to 30 seconds per recording.
    Formats: WAV format with mono channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Recordings are conducted in quiet settings without background noise and echo.
    Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios from the General domain. Topics include:
    Daily Conversations
    Topic Specific Conversation
    General Information and Advice
    Idioms and Sayings
    Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in general interactions:
    Names: Region-specific names of males and females in various formats.
    Addresses: Region-specific addresses in different spoken formats.
    Dates & Times: Inclusion of date and time in various contexts.
    Organization Names: Names of different types of organizations.
    Numbers & Currencies: Various numbers and currencies in domain-specific interactions.

    Each scripted prompt is crafted to reflect real-life scenarios encountered in the General domain, ensuring applicability in training robust natural language processing and speech recognition models.

    Transcription Data

    In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.

    Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
    Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.
    Quality: All transcriptions are verified for accuracy and consistency by native Spanish transcribers.
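
    Because transcript files are named to match their audio files, the two can be paired by base name. The sketch below assumes a local copy in which each `.wav` file has a `.txt` file with the same stem somewhere under one root directory; the directory layout and the function name are assumptions, since the listing does not describe the archive structure.

```python
from pathlib import Path


def pair_audio_with_transcripts(root):
    """Pair each .wav file with the .txt file that shares its base name.

    Returns (pairs, missing): pairs is a list of (audio, transcript) paths,
    and missing lists the audio files for which no transcript was found.
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    transcripts = {p.stem: p for p in files if p.suffix.lower() == ".txt"}

    pairs, missing = [], []
    for audio in sorted(p for p in files if p.suffix.lower() == ".wav"):
        transcript = transcripts.get(audio.stem)
        if transcript is None:
            missing.append(audio)
        else:
            pairs.append((audio, transcript))
    return pairs, missing
```

    Matching on the stem is case-sensitive, while the extension match is not, so `clip01.wav` pairs with `clip01.TXT` but not with `Clip01.txt`.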

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, and dialect.
    Other Metadata:

  12. Healthcare Call Center Speech Data: Spanish (Colombia)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Healthcare Call Center Speech Data: Spanish (Colombia) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-spanish-colombia
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Colombia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Colombian Spanish Call Center Speech Dataset for the Healthcare domain, designed to enhance the development of call center speech recognition models specifically for the Healthcare industry. This dataset is curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data

    This training dataset comprises 30 Hours of call center audio recordings covering various topics and scenarios related to the Healthcare domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native Colombian Spanish speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of Colombia, ensuring a balanced representation of Colombian accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgery Consultation
    Consultation regarding Diet, and many more
    Outbound Calls:
    Appointment Reminder
    Health and Wellness Subscription Programs
    Lab Tests Results
    Health Risk Assessments
    Preventive Care Reminders, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Healthcare domain call center conversational AI and ASR models for the Colombian Spanish language.

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Colombian Spanish call center speech recognition models.

    Usage and Applications

    This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Healthcare domain. Potential use cases include:


  13. Travel Scripted Monologue Speech Data: Spanish (Spain)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Travel Scripted Monologue Speech Data: Spanish (Spain) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/travel-scripted-speech-monologues-spanish-spain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Spain
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Spanish Scripted Monologue Speech Dataset for the Travel Domain. This meticulously curated dataset is designed to advance the development of Spanish language speech recognition models, particularly for the Travel industry.

    Speech Data

    This training dataset comprises over 6,000 high-quality scripted prompt recordings in Spanish. These recordings cover various topics and scenarios relevant to the Travel domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 native Spanish speakers from different regions of Spain.
    Regions: Ensures a balanced representation of Spanish accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Recording Nature: Audio recordings of scripted prompts/monologues.
    Audio Duration: Average duration of 5 to 30 seconds per recording.
    Formats: WAV format with mono channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Recordings are conducted in quiet settings without background noise and echo.
    Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios to ensure comprehensive coverage of the Travel sector. Topics include:
    Customer Service Interactions
    Booking and Reservations
    Travel Inquiries
    Technical Support
    General Information and Advice
    Promotional and Sales Events
    Domain Specific Statements
    Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in Travel interactions:
    Names: Region-specific names of males and females in various formats.
    Addresses: Region-specific addresses in different spoken formats.
    Dates & Times: Inclusion of date and time in various travel contexts, such as booking dates, departure and arrival times.
    Destinations: Specific names of cities, countries, and tourist attractions relevant to the travel sector.
    Numbers & Prices: Various numbers and prices related to ticket costs, hotel rates, and transaction amounts.
    Booking IDs and Confirmation Numbers: Inclusion of booking identification and confirmation details for realistic customer service scenarios.

    Each scripted prompt is crafted to reflect real-life scenarios encountered in the Travel domain, ensuring applicability in training robust natural language processing and speech recognition models.

    Transcription Data

    In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.

    Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
    Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.

  14. Languages in Mexico 2020

    • statista.com
    Updated Jul 30, 2019
    Cite
    Statista (2019). Languages in Mexico 2020 [Dataset]. https://www.statista.com/statistics/275440/languages-in-mexico/
    Dataset updated
    Jul 30, 2019
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Mexico
    Description

    In 2020, about 93.8 percent of the Mexican population was monolingual in Spanish. Around five percent spoke a combination of Spanish and indigenous languages. Spanish is the third-most spoken native language worldwide, after Mandarin Chinese and Hindi.

    Mexican Spanish

    Spanish was first used in Mexico in the 16th century, at the time of Spanish colonization during the Conquest campaigns of what is now Mexico and the Caribbean. As of 2018, Mexico is the country with the largest number of native Spanish speakers worldwide. Mexican Spanish is influenced by English and Nahuatl, and has about 120 million users. The Mexican government uses Spanish in the majority of its proceedings; however, it recognizes 68 national languages, 63 of which are indigenous.

    Indigenous languages spoken

    Of the indigenous languages spoken, two of the most widely used are Nahuatl and Maya. Due to a history of marginalization of indigenous groups, most indigenous languages are endangered, and many linguists warn they might cease to be used after a span of just a few decades. In recent years, legislative attempts such as the San Andrés Accords have been made to protect indigenous groups, who make up about 25 million of Mexico’s 125 million total inhabitants, though the efficacy of such measures is yet to be seen.

  15. Telecom Call Center Speech Data: Spanish (USA)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    Telecom Call Center Speech Data: Spanish (USA) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-spanish-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US Spanish Call Center Speech Dataset for the Telecom domain, designed to enhance the development of call center speech recognition models specifically for the Telecom industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data

    This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Telecom domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native US Spanish speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of USA, ensuring a balanced representation of US accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.
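
One way to confirm that a downloaded recording matches the format stated above is to read its WAV header. The following is a minimal sketch using Python's standard `wave` module; the helper names `describe_wav` and `matches_listing` are hypothetical and are not part of the dataset or of any FutureBeeAI tooling.

```python
import wave


def describe_wav(path):
    """Read channel count, bit depth, and sample rate from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            # getsampwidth() returns bytes per sample, so multiply by 8.
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
        }


def matches_listing(info):
    """Check a header against the listing: stereo, 16-bit, 8 kHz or 16 kHz."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 16
        and info["sample_rate_hz"] in (8000, 16000)
    )
```

Calling `matches_listing(describe_wav(path))` on each file is a cheap sanity check before training; which files are delivered at 8 kHz versus 16 kHz is not stated in the listing, so both rates are accepted here.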

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refunds and Billing Adjustments
    Emergency Service Access, and many more
    Outbound Calls:
    Welcome Calls / Onboarding Process
    Payment Reminders
    Customer Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Call, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Telecom domain call center conversational AI and ASR models for the US Spanish language.
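
The listing does not say how its word error rate was measured. As a point of reference, the sketch below computes WER under the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a transcript with one wrong word in a four-word reference has a WER of 0.25, so a rate below 5% corresponds to fewer than one error per twenty reference words.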

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.

  16. General Domain Scripted Monologue Speech Data: Spanish (Argentina)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). General Domain Scripted Monologue Speech Data: Spanish (Argentina) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/general-scripted-speech-monologues-spanish-argentina
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Argentina
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Argentina Spanish Scripted Monologue Speech Dataset for the General Domain. This meticulously curated dataset is designed to advance the development of General domain Spanish language speech recognition models.

    Speech Data

    This training dataset comprises over 6,000 high-quality scripted prompt recordings in Argentina Spanish. These recordings cover various General domain topics and scenarios, designed to build robust and accurate speech technology.

    Participant Diversity:
    Speakers: 60 native Spanish speakers from different regions of Argentina.
    Regions: Ensures a balanced representation of Argentina Spanish accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Recording Nature: Audio recordings of scripted prompts/monologues.
    Audio Duration: Average duration of 5 to 30 seconds per recording.
    Formats: WAV format with mono channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Recordings are conducted in quiet settings without background noise and echo.
    Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios from the General domain. Topics include:
    Daily Conversations
    Topic Specific Conversation
    General Information and Advice
    Idioms and Sayings
    Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in general interactions:
    Names: Region-specific names of males and females in various formats.
    Addresses: Region-specific addresses in different spoken formats.
    Dates & Times: Inclusion of date and time in various contexts.
    Organization Names: Names of different types of organizations.
    Numbers & Currencies: Various numbers and currencies in domain-specific interactions.

    Each scripted prompt is crafted to reflect real-life scenarios encountered in the General domain, ensuring applicability in training robust natural language processing and speech recognition models.

    Transcription Data

    In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.

    Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
    Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.
    Quality: All transcriptions are verified for accuracy and consistency by native Spanish transcribers.
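
Because the transcription files are named to match their audio files, the two can be paired by file stem. The sketch below illustrates that idea; the function name `pair_audio_with_transcripts`, the directory layout, and the UTF-8 encoding are assumptions, since the listing does not describe how the delivered files are organized or encoded.

```python
from pathlib import Path


def pair_audio_with_transcripts(root):
    """Pair each .wav file under root with the .txt file sharing its stem.

    Returns (pairs, missing): pairs is a list of (audio_path, prompt_text),
    and missing lists audio files that have no matching transcript.
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    transcripts = {p.stem: p for p in files if p.suffix.lower() == ".txt"}

    pairs, missing = [], []
    for audio in sorted(p for p in files if p.suffix.lower() == ".wav"):
        transcript = transcripts.get(audio.stem)
        if transcript is None:
            missing.append(audio)
        else:
            # Assumption: transcripts are UTF-8 encoded plain text.
            text = transcript.read_text(encoding="utf-8").strip()
            pairs.append((audio, text))
    return pairs, missing
```

Reporting unmatched audio files separately, rather than silently skipping them, makes it easier to spot an incomplete download before training.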

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, and dialect.

  17. Retail & E-commerce Call Center Speech Data: Spanish (Colombia)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Retail & E-commerce Call Center Speech Data: Spanish (Colombia) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-spanish-colombia
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Colombia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Colombian Spanish Call Center Speech Dataset for the Retail domain, designed to enhance the development of call center speech recognition models specifically for the Retail industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data

    This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Retail domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native Colombian Spanish speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of Colombia, ensuring a balanced representation of Colombian accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Product Inquiry
    Return/Exchange Request
    Order Cancellation
    Refund Request
    Membership/Subscriptions Enquiry
    Order Cancellations, and many more
    Outbound Calls:
    Order Confirmation
    Cross-selling and Upselling
    Account Updates
    Loyalty Program offers
    Special Offers and Promotions
    Customer Verification, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Retail domain call center conversational AI and ASR models for the Colombian Spanish language.

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of

  18. Delivery & Logistics Call Center Speech Data: Spanish (Colombia)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Delivery & Logistics Call Center Speech Data: Spanish (Colombia) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-spanish-colombia
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Colombia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Colombian Spanish Call Center Speech Dataset for the Delivery and Logistics domain, designed to enhance the development of call center speech recognition models specifically for the Delivery and Logistics industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data

    This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Delivery and Logistics domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native Colombian Spanish speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of Colombia, ensuring a balanced representation of Colombian accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Order Tracking
    Delivery Complaint
    Undeliverable Address
    Delivery Method Selection
    Return Process Enquiry
    Order Modification, and many more
    Outbound Calls:
    Delivery Confirmation
    Delivery Subscription
    Incorrect Address
    Missed Delivery Attempt
    Delivery Feedback
    Out-of-Stock Notification
    Delivery Satisfaction Survey, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Delivery and Logistics domain call center conversational AI and ASR models for the Colombian Spanish language.

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type,

  19. Latin America: level of English proficiency 2023, by country

    • statista.com
    • flwrdeptvarieties.store
    Updated Dec 3, 2024
    Cite
    Statista (2024). Latin America: level of English proficiency 2023, by country [Dataset]. https://www.statista.com/statistics/1053066/english-proficiency-latin-america/
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    LAC, Latin America
    Description

    Argentina scored 562 out of a maximum of 800 points in the English Proficiency Index 2023. That was the highest score among all Latin American countries included in the survey. The Argentine capital, Buenos Aires, also received the highest English proficiency score among all the Latin American cities analyzed. Mexico and Haiti received the lowest scores in the region.

  20. Using event-related potentials to track morphosyntactic development in...

    • figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    José Alemán Bañón; Robert Fiorentino; Alison Gabriele (2023). Using event-related potentials to track morphosyntactic development in second language learners: The processing of number and gender agreement in Spanish [Dataset]. http://doi.org/10.1371/journal.pone.0200791
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    José Alemán Bañón; Robert Fiorentino; Alison Gabriele
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used event-related potentials to investigate morphosyntactic development in 78 adult English-speaking learners of Spanish as a second language (L2) across the proficiency spectrum. We examined how development is modulated by the similarity between the native language (L1) and the L2, by comparing number (a feature present in English) and gender agreement (novel feature). We also investigated how development is impacted by structural distance, manipulating the distance between the agreeing elements by probing both within-phrase (fruta muy jugosa “fruit-FEM-SG very juicy-FEM-SG”) and across-phrase agreement (fresa es ácida “strawberry-FEM-SG is tart-FEM-SG”). Regression analyses revealed that the learners’ overall proficiency, as measured by a standardized test, predicted their accuracy with the target properties in the grammaticality judgment task (GJT), but did not predict P600 magnitude to the violations. However, a relationship emerged between immersion in Spanish-speaking countries and P600 magnitude for gender. Our results also revealed a correlation between accuracy in the GJT and P600 magnitude, suggesting that behavioral sensitivity to the target property predicts neurophysiological sensitivity. Subsequent group analyses revealed that the highest-proficiency learners showed equally robust P600 effects for number and gender. This group also elicited more positive waveforms for within- than across-phrase agreement overall, similar to the native controls. The lowest-proficiency learners showed a P600 for number overall, but no effects for gender. Unlike the highest-proficiency learners, they also showed no sensitivity to structural distance, suggesting that sensitivity to such linguistic factors develops over time. Overall, these results suggest an important role for proficiency in morphosyntactic development, although differences emerged between behavioral and electrophysiological measures. 
    While L2 proficiency predicted behavioral sensitivity to agreement, development with respect to the neurocognitive mechanisms recruited in processing only emerged when comparing the two extremes of the proficiency spectrum. Importantly, while both L1-L2 similarity and hierarchical structure impact development, they do not constrain it.

