Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.
Spanish, a world language
As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, most of them in the Americas; it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which are among the non-Hispanic countries with the highest number of Spanish speakers.
The second most spoken language in the U.S.
In the most recent data, Spanish was the most spoken language in the United States other than English, with 12 times as many speakers as the second-ranked language. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of people naturalized in the U.S. were Spanish-speaking countries.
In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, 998,179 people spoke Russian at home during the same year.
Language spoken at home and ability to speak English for the population age 5 and over, as reported in the U.S. Census Bureau's American Community Survey (ACS) 5-year estimates, table C16001.
Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.
In 2025, around 1.53 billion people worldwide spoke English either natively or as a second language, more than the 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.
Languages in the United States
The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.
Different languages at home
The percentage of people in the United States who speak a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020, the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

In 2016, changes were made to the languages and language categories presented in tables B16001, C16001, and B16002. For more information, see: 2016 Language Data User note.

Geographical restrictions have been applied to Table B16001 - LANGUAGE SPOKEN AT HOME BY ABILITY TO SPEAK ENGLISH FOR THE POPULATION 5 YEARS AND OVER for the 5-year data estimates. These restrictions are in place to protect data privacy for the speakers of smaller languages. Geographic areas published for the 5-year B16001 table include: Nation (010), States (040), Metropolitan Statistical Area-Metropolitan Divisions (314), Combined Statistical Areas (330), Congressional Districts (500), and Public Use Microdata Sample Areas (PUMAs) (795). For more information on these geographical delineations, see the Metropolitan Statistical Area Reference Files. County and tract-level data are no longer available for table B16001; for specific language data for these smaller geographies, please use table C16001. Additional languages are also available in the Public Use Microdata Sample (PUMS), at the State and Public Use Microdata Sample Area (PUMA) levels of geography.

The 2016-2020 American Community Survey (ACS) data generally reflect the September 2018 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances, the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineation lists due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:
- : The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution.
N : The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X) : The estimate or margin of error is not applicable or not available.
median- : The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+ : The median falls in the highest interval of an open-ended distribution (for example "250,000+").
** : The margin of error could not be computed because there were an insufficient number of sample observations.
*** : The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
***** : A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
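The margin-of-error interpretation in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not Census Bureau code: the function names and the sample figures (12,000 and 450) are hypothetical, and the conversion to a standard error assumes the usual 1.645 z-value for a 90 percent interval.

```python
# z-value for a 90 percent confidence level (assumption: normal approximation)
Z_90 = 1.645


def confidence_bounds(estimate, moe):
    """Return (lower, upper): the estimate minus and plus its margin of error."""
    return estimate - moe, estimate + moe


def standard_error(moe):
    """Convert a 90 percent margin of error into an approximate standard error."""
    return moe / Z_90


# Hypothetical tract-level figures, used only to show the arithmetic.
lower, upper = confidence_bounds(12_000, 450)
print(lower, upper)
```

Under these assumptions the interval runs from the estimate minus 450 to the estimate plus 450; it captures sampling variability only, not the nonsampling error mentioned above.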
Abstract
Introduction
Hispanic-English Database contains approximately 30 hours of English and Spanish conversational and read speech with transcripts (24 hours) and metadata collected from 22 non-native English speakers between 1996 and 1998. The corpus was developed by Entropic Research Laboratory, Inc., a developer of speech recognition and speech synthesis software toolkits that was acquired by Microsoft in 1999. Participants were adult native speakers of Spanish as spoken in Central America and South America who resided in the Palo Alto, California area, had lived in the United States for at least one year and demonstrated a basic ability to understand, read and speak English. They read a total of 2200 sentences, 50 each in Spanish and English per speaker. The Spanish sentence prompts were a subset of the materials in LATINO-40 Spanish Read News, and the English sentence prompts were taken from the TIMIT database. Conversations were task-oriented, drawing on exercises similar to those used in English second language instruction and designed to engage the speakers in collaborative, problem-solving activities.
Data
Read speech was recorded on two wideband channels with a Shure SM10A head-mounted microphone in a quiet laboratory environment. The conversational speech was simultaneously recorded on four channels, two of which were used to place phone calls to each subject in two separate offices and to record the incoming speech of the two channels into separate files. The audio was originally saved under the Entropic Audio (ESPS) format using a 16kHz sampling rate and 16 bit samples. Audio files were converted to flac compressed .wav files from the ESPS format. ESPS headers were removed and are presented in this release as *.hdr files that include demographic and technical data. Transcripts were developed with the Entropic Annotator tool and are time-aligned with speaker turns. The transcription conventions were based on those used in the LDC Switchboard and CALLHOME collections. Transcript files are denoted with a .lab extension. Data files and their corresponding label files are stored in subdirectories named using a speaker-pair id and session number. The first three letters identify the speaker on channel A. The last three letters identify the speaker on channel B. Wideband audio files contain *.wb.flac in their file name, and narrow band audio files are denoted with a *.nb.flac in the file name.
This map shows the predominant non-English language spoken at home by the population age 5+. The pattern is seen at state, county, and tract geographies. The data shown is current-year American Community Survey (ACS) data from the US Census. The data is updated each year when the ACS releases its new 5-year estimates.
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
Dataset Card for People's Speech
Dataset Summary
The People's Speech Dataset is among the world's largest English speech recognition corpora licensed for academic and commercial usage under CC-BY-SA and CC-BY 4.0. It includes 30,000+ hours of transcribed English speech from a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and, crucially, is available under a permissive license.
Supported Tasks… See the full description on the dataset page: https://huggingface.co/datasets/MLCommons/peoples_speech.
U.S. Government Works: https://www.usa.gov/government-works
*Asian/Pacific Islander
Language questions were only asked of persons 5 years and older. The language question is about current use of a non-English language at home, not about ability to speak another language or the use of such a language in the past. People who speak a language other than English outside of the home are not reported as speaking a language other than English. Similarly, people whose mother tongue is a non-English language but who do not currently use the language at home do not report the language.
Source: U.S. Census Bureau; 2013-2017 American Community Survey 5-Year Estimates, Table DP02.
This map shows the predominant language(s) spoken by people who have limited English speaking ability. This is shown using American Community Survey data from the US Census Bureau by state, county, and tract.
There are 12 different language/language groupings:
Spanish
French, Haitian, or Cajun
Korean
Chinese (including Mandarin and Cantonese)
Vietnamese
Tagalog (including Filipino)
Arabic
German or other West Germanic
Russian, Polish, or other Slavic
Other Indo-European (such as Italian or Portuguese)
Other Asian and Pacific Island (such as Japanese or Hmong)
Other and unspecified (such as Navajo or Hebrew)
This map also uses a feature effect to identify the counties with either 10,000 or 5% of the population having limited English ability. According to the Voting Rights Act, "localities where there are more than 10,000 or over 5 percent of the total voting age citizens in a single political subdivision (usually a county, but a township or municipality in some states) who are members of a single language minority group, have depressed literacy rates, and do not speak English very well" are required to "provide [voting materials] in the language of the applicable minority group as well as in the English language".
This map uses hosted feature layers containing the most recent American Community Survey data. These layers are part of ArcGIS Living Atlas, and are updated every year when the American Community Survey releases new estimates, so values in the map always reflect the newest data available.
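The county filter described above (more than 10,000 people, or over 5 percent of the population, with limited English ability) can be sketched as a simple predicate. This is only an illustration of the map's highlighting rule under assumed inputs; the function and parameter names are hypothetical, and it does not reproduce the full Voting Rights Act determination, which also depends on voting-age citizenship, a single language minority group, and literacy rates.

```python
def meets_lep_threshold(lep_count, total_population,
                        min_count=10_000, min_share=0.05):
    """Return True when an area exceeds either the count or the share threshold."""
    if total_population <= 0:
        return False  # no population, so no meaningful share
    share = lep_count / total_population
    return lep_count > min_count or share > min_share


# Hypothetical counties: (limited-English residents, total population)
print(meets_lep_threshold(12_000, 1_000_000))  # large count, small share
print(meets_lep_threshold(600, 10_000))        # small count, 6 percent share
print(meets_lep_threshold(500, 100_000))       # neither threshold exceeded
```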
This map shows the predominant language spoken at home by the US population aged 5+. This is shown by Census Tract and County centroids. The data values are from the 2012-2016 American Community Survey 5-year estimates in the S1601 Table for Language Spoken at Home. The popup in the map provides a breakdown of the population age 5+ by the language spoken at home. Data values for other age groups are also available within the data's table. The color of the symbols represents the most common language spoken at home. This predominance map style compares the count of people age 5+ based on what language is spoken at home, and returns the value with the highest count. The census breaks down the population 5+ by the following language options:
English Only
Non-English - Spanish
Non-English - Asian and Pacific Islander Languages
Non-English - Indo European Languages
Non-English - Other
The size of the symbols represents how many people are 5 years or older, which helps highlight the number of people living within an area who were sampled for this language categorization. The strength of the color represents how predominant a language is within an area. If the symbol is a strong color, that language makes up a larger portion of the population. This map is designed for a dark basemap such as the Human Geography Basemap or the Dark Gray Canvas Basemap. See the web map to see the pattern at both the county and tract level. This map helps to show the most common language spoken at home at both a regional and local level. The tract pattern shows how distinct neighborhoods are clustered by which language they speak. The county pattern shows how language is used throughout the country. This pattern is shown by census tracts at large scales, and counties at smaller scales.
This data was downloaded from the United States Census Bureau American Fact Finder on January 16, 2018. It was then joined with 2016 vintage centroid points and hosted to ArcGIS Online and into the Living Atlas.
Nationally, the breakdown of language spoken at home for the population 5+ is as follows:

                                     Total estimate   Margin of error   Percent estimate   Margin of error
Population 5 years and over          298,691,202      +/-3,594          (X)                (X)
Speak only English                   235,519,143      +/-154,409        78.90%             +/-0.1
Spanish                              39,145,066       +/-94,571         13.10%             +/-0.1
Asian and Pacific Island languages   10,172,370       +/-22,561         3.40%              +/-0.1
Other Indo-European languages        10,827,536       +/-46,335         3.60%              +/-0.1
Other languages                      3,027,087        +/-23,302         1.00%              +/-0.1
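The predominance logic described for this map (compare the counts per language category and return the one with the highest count, with color strength reflecting its share) might look like the following sketch. It uses the national estimates quoted above as sample input; the function name and dictionary layout are assumptions for illustration, not the map's actual implementation.

```python
def predominant(counts):
    """Return (category with the highest count, its share of the total)."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


# National 2012-2016 ACS estimates for the population 5 years and over.
national = {
    "Speak only English": 235_519_143,
    "Spanish": 39_145_066,
    "Asian and Pacific Island languages": 10_172_370,
    "Other Indo-European languages": 10_827_536,
    "Other languages": 3_027_087,
}

label, share = predominant(national)
print(label, f"{share:.1%}")

# Dropping the English-only category gives the predominant non-English language.
non_english = {k: v for k, v in national.items() if k != "Speak only English"}
print(predominant(non_english)[0])
```

The same function could be applied per tract or per county; only the input counts change.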
The Delaware Valley Regional Planning Commission (DVRPC) is committed to upholding the principles and intentions of the 1964 Civil Rights Act and related nondiscrimination statutes in all of the Commission’s work, including publications, products, communications, public input, and decision-making processes. Language barriers may prevent people with limited English proficiency (also known as LEP persons) from obtaining services or information, or from participating in public planning processes. To better identify LEP populations and thoroughly evaluate the Commission’s efforts to provide meaningful access, DVRPC has produced this Limited-English Proficiency Plan. This is the data that was used to make the maps for the upcoming plan. The data come from the 2019-2023 American Community Survey (ACS) 5-Year Estimates, Table C16001: Language Spoken at Home by Ability to Speak English for the Population 5 Years and Over. ACS data are derived from a survey and are subject to sampling variability.
Vietnamese
Source of tract boundaries: US Census Bureau, TIGER/Line Files.
Please refer to U:_OngoingProjects\LEP\ACS_5YR_C16001_LEP_metadata.xlsx for the full attribute lookup and the fields used in making the DVRPC LEP Map Series. Please contact Chris Pollard (cpollard@dvrpc.org) should you have any questions about this dataset.
The Equity Focus Area dataset identifies the census tracts that have concentrations of equity populations above the regional average in King, Kitsap, Pierce, and Snohomish counties. The 2011-2016 U.S. Census Bureau’s 5-year American Community Survey data was analyzed for this dataset. Equity focus populations include people of color, people with low incomes (below 200% of federal poverty level), youth (5-17), older adults (65+), people with disabilities, and people with limited English proficiency (who don't speak English very well). This dataset was used for the 2022-2050 Regional Transportation Plan analysis.
In 2019, about 12.08 million children in the United States spoke a language other than English at home. This number is fairly consistent with the previous year, when 12.13 million children spoke another language at home.
Abstract
Introduction
CALLFRIEND American English-Southern Dialect Second Edition was developed by LDC and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Southern dialects of American English. This second edition updates the audio files to wav format, simplifies the directory structure and adds documentation and metadata. The first edition is available as CALLFRIEND American English-Southern Dialect (LDC96S47). The CALLFRIEND series is a collection of telephone conversations in several languages conducted by LDC in support of language identification technology development. Languages covered in the collection include American English, Canadian French, Egyptian Arabic, Farsi, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil and Vietnamese.
Data
All data was collected before July 1997. Participants could speak with a person of their choice on any topic; most called family members and friends. All calls originated in North America. The recorded conversations last up to 30 minutes. The data was recorded as 8kHz u-law SPH encoded stereo files, with one end of the phone call on each channel. In this release, files were converted to WAV format, and information from the original SPH headers is described in the documentation. SPH files are not included in this second edition. The audio files were originally split into train, dev and test folders of 20 recordings each, but they are combined in this release. Completed calls passed through a human auditing process to verify that the target language was spoken by the participants, to check the quality of the recordings, and to record information about dialect, noise and distortion.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Recommended citation for this dataset: Garnett, Vicky, & Lucek, Stephen. (2020). The Dublin Language Garden Perceptual Dialectology of Irish English Collection (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4247829
About this Dataset The field of Perceptual Dialectology is an area of sociolinguistic study that investigates how non-linguists view different varieties of language. It often includes hand-drawn map exercises in which participants indicate where they believe various varieties are spoken, and their attitudes towards them.
In 2015, as part of a public linguistics outreach event (the Dublin Language Garden) held at Trinity College Dublin, the authors created an activity for members of the public and collected hand-drawn maps from them that gave responses to the following tasks:
a. Indicate where you come from on the map (using a red dot sticker)
b. Draw where you think the Dublin dialect occurs
c. Draw the boundaries of any other dialects you believe occur in Ireland
d. Tell us what you think are the features of those dialects
e. Tell us what you think are the characteristics of the people who speak those dialects.
Participants of all ages were encouraged to take part, but only data from those over 18 were retained after the event and used in this data collection. Participants were all given information on how the data was to be anonymised, processed and published on a clearly displayed poster to read before they were given a map to complete the 5 tasks (listed above). No additional information about the participants, aside from that acquired through Task a, was collected.
File List:
_READ_ME - Dublin Language Garden Perceptual Dialectology of Irish English data.txt Contains a detailed description of this dataset.
DLG_PDIE_KML_data_by_location.zip This zipped folder contains the .kml data of multiple hand-drawn maps organised into folders by their location
DLG_PDIE_KML_data_by_part.zip This zipped folder contains the .kml data of multiple hand-drawn maps organised into folders according to the participants.
These folders have been organised in this way to make the data easier to discover. Some users may wish to analyse the data only by the locations of the varieties identified by the participants. Others may be interested only in the data given by specific participants, and the folder organised by participant may be of more use to them. Both folders, however, contain the same data; they differ only in how the data is organised.
Garnett and Lucek DLG_PD_IE Qualitative Data (Nov 2020).xlsx Spreadsheet featuring tabulated qualitative data taken from all maps
Sample Hand-drawn Maps.zip Folder containing 2 sample hand-drawn maps from the participants to help contextualise the data presented here.
Any questions? Any enquiries regarding this dataset should be directed to either Vicky Garnett (garnetv@tcd.ie) or Stephen Lucek (stephen.lucek@ucd.ie).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Purpose: Best practices recommend promoting the use of the home language and allowing caregivers to choose the language(s) that they want to use with their child who is deaf or hard of hearing (DHH). We examined whether Spanish-speaking caregivers of children who are DHH receive professional recommendations on oral bilingualism that follow best practices. We also assessed whether professional recommendations, caregiver beliefs, and language practices had an impact on child language(s) proficiency.
Method: Sixty caregivers completed a questionnaire on demographic questions, language(s) use and recommendations, beliefs on bilingualism, and child language proficiency measures in English, Spanish, and American Sign Language (ASL). Professional recommendations on oral bilingualism were reported descriptively, and linear regression was used to identify the predictors of child language(s) proficiency.
Results: We found that only 23.3% of the caregivers were actively encouraged to raise their child orally bilingual. Language practices predicted child proficiency in each language (English, Spanish, and ASL), but professional recommendations and caregiver beliefs did not.
Conclusions: Our results revealed that most caregivers received recommendations that do not follow current best practices. Professional training is still needed to promote bilingualism and increase cultural competence when providing services to caregivers who speak languages different from English.
Supplemental Material S1. Survey items and response scoring.
Benítez-Barrera, C., Reiss, L., Majid, M., Chau, T., Wilson, J., Rico, E. F., Bunta, F., Raphael, R. M., & de Diego-Lázaro, B. (2023). Caregiver experiences with oral bilingualism in children who are deaf and hard of hearing in the United States: Impact on child language proficiency. Language, Speech, and Hearing Services in Schools, 54(1), 224–240. https://doi.org/10.1044/2022_LSHSS-22-00095
CALLFRIEND American English-Non-Southern Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 26 hours of unscripted telephone conversations between native speakers of non-Southern dialects of American English. This second edition updates the audio files to wav format, simplifies the directory structure and adds documentation and metadata. The first edition is available as CALLFRIEND American English-Non-Southern Dialect (LDC96S46). The CALLFRIEND series is a collection of telephone conversations in several languages conducted by LDC in support of language identification technology development. Languages covered in the collection include American English, Canadian French, Egyptian Arabic, Farsi, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil and Vietnamese.
Data
All data was collected before July 1997. Participants could speak with a person of their choice on any topic; most called family members and friends. All calls originated in North America. The recorded conversations last up to 30 minutes. The data was recorded as 8kHz u-law SPH encoded stereo files, with one end of the phone call on each channel. In this release, files were converted to WAV format, and information from the original SPH headers is described in the documentation. SPH files are not included in this second edition. The audio files were originally split into train, dev and test folders of 20 recordings each, but they are combined in this release. Completed calls passed through a human auditing process to verify that the target language was spoken by the participants, to check the quality of the recordings, and to record information about dialect, noise and distortion.