100+ datasets found

d
Population of the Limited English Proficient (LEP) Speakers by Community...
datasets.ai
data.cityofnewyork.us
+1more
23, 40, 55, 8
Updated Aug 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of New York (2024). Population of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://datasets.ai/datasets/population-of-the-limited-english-proficient-lep-speakers-by-community-district
Explore at:
55, 23, 8, 40Available download formats
Dataset updated
Aug 6, 2024
Dataset authored and provided by
City of New York
Description
Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.
The most spoken languages worldwide 2025
statista.com
Updated Apr 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
Explore at:
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2025
Area covered
World
Description
In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year. Languages in the United States The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which over than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese),1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year. Different languages at home The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.
Share of U.S. population speaking a language besides English at home 2023,...
statista.com
Updated Jun 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Share of U.S. population speaking a language besides English at home 2023, by state [Dataset]. https://www.statista.com/statistics/312940/share-of-us-population-speaking-a-language-other-than-english-at-home-by-state/
Explore at:
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2023
Area covered
United States
Description
As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.
F
American English General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.
Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•
Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.

•
Regions: Representing various provinces of United States of America to ensure dialectal diversity and demographic balance.

•
Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.

•Recording Details:
•
Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.

•
Duration: Each conversation ranges from 15 to 60 minutes.

•
Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.

•
Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•
Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.

•
Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple English speech and language AI applications:
•
ASR Development: Train accurate speech-to-text systems for US English.

•
Voice Assistants: Build smart assistants capable of understanding natural American conversations.

<div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex; gap: 16px; align-items:
Ranking of languages spoken at home in the U.S. 2023
statista.com
Updated Apr 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
Explore at:
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2023
Area covered
United States
Description
In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people were speaking Russian at home during the same year. The distribution of the U.S. population by ethnicity can be accessed here. A ranking of the most spoken languages across the world can be accessed here.
Language Spoken at Home 2018-2022 - STATES
mce-data-uscensus.hub.arcgis.com
arc-gis-hub-home-arcgishub.hub.arcgis.com
+1more
Updated Feb 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
US Census Bureau (2024). Language Spoken at Home 2018-2022 - STATES [Dataset]. https://mce-data-uscensus.hub.arcgis.com/maps/d89bebf3729d4540856fb3176c9d32f8
Explore at:
Dataset updated
Feb 5, 2024
Dataset provided by
United States Census Bureauhttp://census.gov/
Authors
US Census Bureau
Description
This layer shows Language Spoken at Home. This is shown by state and county boundaries. This service contains the 2018-2022 release of data from the American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. This layer is symbolized to show the percentage of households with Limited English Speaking Status. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right. Current Vintage: 2018-2022ACS Table(s): B16004, DP02, S1601, S1602Data downloaded from: CensusBureau's API for American Community Survey Date of API call: January 18, 2024National Figures: data.census.govThe United States Census Bureau's American Community Survey (ACS):About the SurveyGeography & ACSTechnical DocumentationNews & UpdatesThis ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.Data Note from the Census:Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.Data Processing Notes:Boundaries come from the Cartographic Boundaries via US Census TIGER geodatabases. Boundaries are updated at the same time as the data updates, and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines clipped for cartographic purposes. For state and county boundaries, the water and coastlines are derived from the coastlines of the 500k TIGER Cartographic Boundary Shapefiles. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters). The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico. The Counties (and equivalent) layer contains 3221 records - all counties and equivalent, Washington D.C., and Puerto Rico municipios. See Areas Published. Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.Field alias names were created based on the Table Shells.Margin of error (MOE) values of -555555555 in the API (or "*****" (five asterisks) on data.census.gov) are displayed as 0 in this dataset. The estimates associated with these MOEs have been controlled to independent counts in the ACS weighting and have zero sampling error. So, the MOEs are effectively zeroes, and are treated as zeroes in MOE calculations. Other negative values on the API, such as -222222222, -666666666, -888888888, and -999999999, all represent estimates or MOEs that can't be calculated or can't be published, usually due to small sample sizes. All of these are rendered in this dataset as null (blank) values.
MCB_languages_county
kaggle.com
Updated Oct 1, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Marisol Brewster (2019). MCB_languages_county [Dataset]. https://www.kaggle.com/mcbrewster/mcb-languages-county/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 1, 2019
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Marisol Brewster
Description
Context

This is a dataset I found online through the Google Dataset Search portal.

Content

The American Community Survey (ACS) 2009-2013 multi-year data are used to list all languages spoken in the United States that were reported during the sample period. These tables provide detailed counts of many more languages than the 39 languages and language groups that are published annually as a part of the routine ACS data release. This is the second tabulation beyond 39 languages since ACS began.

The tables include all languages that were reported in each geography during the 2009 to 2013 sampling period. For the purpose of tabulation, reported languages are classified in one of 380 possible languages or language groups. Because the data are a sample of the total population, there may be languages spoken that are not reported, either because the ACS did not sample the households where those languages are spoken, or because the person filling out the survey did not report the language or reported another language instead.

The tables also provide information about self-reported English-speaking ability. Respondents who reported speaking a language other than English were asked to indicate their ability to speak English in one of the following categories: "Very well," "Well," "Not well," or "Not at all." The data on ability to speak English represent the person’s own perception about his or her own ability or, because ACS questionnaires are usually completed by one household member, the responses may represent the perception of another household member.

These tables are also available through the Census Bureau's application programming interface (API). Please see the developers page for additional details on how to use the API to access these data.

Acknowledgements

Sources:

Google Dataset Search: https://toolbox.google.com/datasetsearch

2009-2013 American Community Survey

Original dataset: https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html

Downloaded From: https://data.world/kvaughn/languages-county

Banner and thumbnail photo by Farzad Mohsenvand on Unsplash
D
Languages and English Ability - Seattle Neighborhoods
data.seattle.gov
gimi9.com
+3more
application/rdfxml +5
Updated Oct 22, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://data.seattle.gov/dataset/Languages-and-English-Ability-Seattle-Neighborhood/d2c7-tkpy
Explore at:
json, csv, tsv, xml, application/rssxml, application/rdfxmlAvailable download formats
Dataset updated
Oct 22, 2024
Area covered
Seattle
Description
Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.

Table created for and used in the Neighborhood Profiles application.

Vintages: 2023
ACS Table(s): B16004, C16002
Data downloaded from: Census Bureau's Explore Census Data

The United States Census Bureau's American Community Survey (ACS):
About the Survey
Geography & ACS
Technical Documentation
News & Updates
This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

Data Note from the Census:
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

Data Processing Notes:
Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb(year)a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico
Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications <a href='https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf' style='color:rgb(0, 121, 193); text-decoration-line:none; font-family:inherit;' target='_blank' rel='nofollow ugc
Number of native Spanish speakers worldwide 2024, by country
statista.com
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
Explore at:
Dataset updated
Jan 15, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
World
Description
Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia was the nation with the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million. Spanish, a world language As of 2023, Spanish ranked as the fourth most spoken language in the world, only behind English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, the majority on the American continent, nonetheless, it's also one of the official languages of Equatorial Guinea in Africa. Other countries have a strong influence, like the United States, Morocco, or Brazil, countries included in the list of non-Hispanic countries with the highest number of Spanish speakers. The second most spoken language in the U.S. In the most recent data, Spanish ranked as the language, other than English, with the highest number of speakers, with 12 times more speakers as the second place. Which comes to no surprise following the long history of migrations from Latin American countries to the Northern country. Moreover, only during the fiscal year 2022. 5 out of the top 10 countries of origin of naturalized people in the U.S. came from Spanish-speaking countries.
2013 American Community Survey - Table Packages: Detailed Language Spoken in...
catalog.data.gov
Updated Jul 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
Explore at:
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Area covered
United States
Description
This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
n
Data from: Language Spoken at Home
linc.osbm.nc.gov
ncosbm.opendatasoft.com
csv, excel, geojson +1
Updated Oct 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Language Spoken at Home [Dataset]. https://linc.osbm.nc.gov/explore/dataset/language-spoken-at-home/
Explore at:
geojson, csv, json, excelAvailable download formats
Dataset updated
Oct 3, 2024
Description
Language spoken at home and the ability to speak English for the population age 5 and over as reported by the US Census Bureau's, American Community Survey (ACS) 5-year estimates table C16001.
n
215 Hours - American English Speech Data by Mobile Phone_Reading
m.nexdata.ai
Updated May 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2024). 215 Hours - American English Speech Data by Mobile Phone_Reading [Dataset]. https://m.nexdata.ai/datasets/speechrecog/78?source=Github
Explore at:
Dataset updated
May 4, 2024
Dataset provided by
Nexdata
nexdata technology inc
Authors
Nexdata
Area covered
United States
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
Description
English(the United States) Scripted Monologue Smartphone speech dataset, collected from monologue based on given scripts, covering economy, entertainment, news, informal language, numbers, alphabet domains. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(349 speakers), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
English Language Ability by Zip Code Tabulation Area Geographic Data
johnsnowlabs.com
csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Snow Labs, English Language Ability by Zip Code Tabulation Area Geographic Data [Dataset]. https://www.johnsnowlabs.com/marketplace/english-language-ability-by-zip-code-tabulation-area-geographic-data/
Explore at:
csvAvailable download formats
Dataset authored and provided by
John Snow Labs
Time period covered
Jan 1, 2010 - Dec 31, 2014
Area covered
United States
Description
This dataset identifies the language spoken at home by zip code tabulation areas within the United States. This dataset resulted from the American Community Survey (ACS) conducted from 2010 through 2014. The dataset also shows how well the language is spoken (well, very well, less than very well). JSL enriched this dataset with Latitude and Longitude information and with the map information about the land and water area of zip code tabulation areas.
Common languages used for web content 2025, by share of websites
statista.com
ai-chatbox.pro
Updated Feb 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
Explore at:
Dataset updated
Feb 11, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Feb 2025
Area covered
Worldwide
Description
As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while the content in the German language followed, with 5.6 percent. English as the leading online language United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most of the online information being created in English. Consequently, even those who are not native speakers may use it for convenience. Global internet usage by regions As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of its populations accessing the internet.
h
english_dialects
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
english_dialects [Dataset]. https://huggingface.co/datasets/ylacombe/english_dialects
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
Yoach Lacombe
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for "english_dialects"

Dataset Summary

This dataset consists of 31 hours of transcribed high-quality audio of English sentences recorded by 120 volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as use for speech technologies. The speakers self-identified as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English. The recording scripts… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/english_dialects.
O
2017 San Diego County Demographics - Language Spoken at Home for the...
data.sandiegocounty.gov
application/rdfxml +5
Updated Feb 22, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
County of San Diego (2020). 2017 San Diego County Demographics - Language Spoken at Home for the Population 5 Years and Ability to Speak English (Detailed) [Dataset]. https://data.sandiegocounty.gov/Demographics/2017-San-Diego-County-Demographics-Language-Spoken/b7iq-x9dz
Explore at:
csv, xml, application/rdfxml, application/rssxml, tsv, jsonAvailable download formats
Dataset updated
Feb 22, 2020
Dataset authored and provided by
County of San Diego
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Area covered
San Diego County
Description
Language questions were only asked of persons 5 years and older. The language question is about current use of a non-English language at home, not about ability to speak another language or the use of such a language in the past or elsewhere. People who speak a language other than English outside of the home are not reported as speaking a language other than English. Respondents that spoke a language other than English at home, where also asked whether they could speak English "very well" or less than "very well. See how the Census Bureau measures Language Use for more information at https://www.census.gov/topics/population/language-use/about.html.

Source: U.S. Census Bureau; 2013-2017 American Community Survey 5-Year Estimates, Table C16001.
English Conversation and Monologue speech dataset
kaggle.com
Updated Jun 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Frank Wong (2024). English Conversation and Monologue speech dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/english-real-world-speech-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 7, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Frank Wong
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
English(America) Real-world Casual Conversation and Monologue speech dataset

Description

English(America) Real-world Casual Conversation and Monologue speech dataset, covers self-media, conversation, live, lecture, variety-show, etc, mirrors real-world interactions. Transcribed with text content, speaker's ID, gender, and other attributes. Our dataset was collected from extensive and diversify speakers, geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1115?source=Kaggle

Format

16kHz, 16 bit, wav, mono channel;

Content category

Including self-media, conversation, live, lecture, variety-show, etc;

Recording environment

Low background noise;

Country

America(USA);

Language(Region) Code

en-US;

Language

English;

Features of annotation

Transcription text, timestamp, speaker ID, gender.

Accuracy Rate

Sentence Accuracy Rate (SAR) 95%

Licensing Information

Commercial License
d
LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED...
catalog.data.gov
Updated Jan 31, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Seattle ArcGIS Online (2025). LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS (B16003) [Dataset]. https://catalog.data.gov/dataset/language-spoken-at-home-for-the-population-5-years-and-over-in-limited-english-speaking-ho
Explore at:
Dataset updated
Jan 31, 2025
Dataset provided by
City of Seattle ArcGIS Online
Description
Table from the American Community Survey (ACS) B16003 of age by language spoken at home for the population 5 years and over in limited English-speaking households. These are multiple, nonoverlapping vintages of the 5-year ACS estimates of population and housing attributes starting in 2010 shown by the corresponding census tract vintage. Also includes the most recent release annually.King County, Washington census tracts with nonoverlapping vintages of the 5-year American Community Survey (ACS) estimates starting in 2010. Vintage identified in the "ACS Vintage" field.The census tract boundaries match the vintage of the ACS data (currently 2010 and 2020) so please note the geographic changes between the decades. Tracts have been coded as being within the City of Seattle as well as assigned to neighborhood groups called "Community Reporting Areas". These areas were created after the 2000 census to provide geographically consistent neighborhoods through time for reporting U.S. Census Bureau data. This is not an attempt to identify neighborhood boundaries as defined by neighborhoods themselves.Vintages: 2010, 2015, 2020, 2021, 2022, 2023ACS Table(s): B16003Data downloaded from: <a href='https://data.c
344 People - English(the United States) Scripted Monologue Smartphone speech...
m.nexdata.ai
Updated Sep 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2023). 344 People - English(the United States) Scripted Monologue Smartphone speech dataset_Guiding [Dataset]. https://m.nexdata.ai/datasets/speechrecog/79
Explore at:
Dataset updated
Sep 27, 2023
Dataset authored and provided by
Nexdata
Area covered
United States
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
Description
English(the United States) Scripted Monologue Smartphone speech dataset_Guiding, collected from monologue based on given prompts, covering smart car, smart home, voice assistant domains. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(344 speakers), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
2015 American Community Survey: B16003 | AGE BY LANGUAGE SPOKEN AT HOME FOR...
data.census.gov
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ACS, 2015 American Community Survey: B16003 | AGE BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS (ACS 5-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT5Y2015.B16003?q=Language+Spoken+at+Home&g=160XX00US1901855&y=2015
Explore at:
Dataset provided by
United States Census Bureauhttp://census.gov/
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2015
Description
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section...Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section..Tell us what you think. Provide feedback to help make American Community Survey data more useful for you..Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau''s Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties..Explanation of Symbols:An ''**'' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate..An ''-'' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution..An ''-'' following a median estimate means the median falls in the lowest interval of an open-ended distribution..An ''+'' following a median estimate means the median falls in the upper interval of an open-ended distribution..An ''***'' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate..An ''*****'' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate. .An ''N'' entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small..An ''(X)'' means that the estimate is not applicable or not available..Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization..While the 2011-2015 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities..Methodological changes to data collection in 2013 may have affected language data for 2013. Users should be aware of these changes when using multi-year data containing data from 2013. For more information, see: Language User Note..A "limited English speaking household" is one in which no member 14 years old and over (1) speaks only English or (2) speaks a non-English language and speaks English "very well." In other words, all members 14 years old and over have at least some difficulty with English. By definition, English-only households cannot belong to this group. Previous Census Bureau data products have referred to these households as "linguistically isolated" and "Households in which no one 14 and over speaks English only or speaks a language other than English at home and speaks English 'very well'." This table is directly comparable to tables from earlier years that used these labels..Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables..Source: U.S. Census Bureau, 2011-2015 American Community Survey 5-Year Estimates

Facebook

Twitter

Click to copy link

Link copied

Cite

City of New York (2024). Population of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://datasets.ai/datasets/population-of-the-limited-english-proficient-lep-speakers-by-community-district

Population of the Limited English Proficient (LEP) Speakers by Community District

Explore at:

55, 23, 8, 40Available download formats

Dataset updated

Aug 6, 2024

Dataset authored and provided by

City of New York

Description

Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.

Clear search

Close search

Google apps

Main menu

Population of the Limited English Proficient (LEP) Speakers by Community...

The most spoken languages worldwide 2025

Share of U.S. population speaking a language besides English at home 2023,...

American English General Conversation Speech Dataset for ASR

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Ranking of languages spoken at home in the U.S. 2023

Language Spoken at Home 2018-2022 - STATES

MCB_languages_county

Context

Content

Acknowledgements

Languages and English Ability - Seattle Neighborhoods

Number of native Spanish speakers worldwide 2024, by country

2013 American Community Survey - Table Packages: Detailed Language Spoken in...

Data from: Language Spoken at Home

215 Hours - American English Speech Data by Mobile Phone_Reading

English Language Ability by Zip Code Tabulation Area Geographic Data

Common languages used for web content 2025, by share of websites

english_dialects

2017 San Diego County Demographics - Language Spoken at Home for the...

English Conversation and Monologue speech dataset

English(America) Real-world Casual Conversation and Monologue speech dataset

Description

Format

Content category

Recording environment

Country

Language(Region) Code

Language

Features of annotation

Accuracy Rate

Licensing Information

LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED...

344 People - English(the United States) Scripted Monologue Smartphone speech...

2015 American Community Survey: B16003 | AGE BY LANGUAGE SPOKEN AT HOME FOR...

Population of the Limited English Proficient (LEP) Speakers by Community DistrictSee More Versions

Population of the Limited English Proficient (LEP) Speakers by Community District