100+ datasets found
  1. Percent of Population with Limited Ability to Speak English

    • data.amerigeoss.org
    • coronavirus-resources.esri.com
    • +2more
    esri rest, html
    Updated Jul 24, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ESRI (2019). Percent of Population with Limited Ability to Speak English [Dataset]. https://data.amerigeoss.org/dataset/percent-of-population-with-limited-ability-to-speak-english
    Explore at:
    esri rest, htmlAvailable download formats
    Dataset updated
    Jul 24, 2019
    Dataset provided by
    Esrihttp://esri.com/
    Description
    This map shows the percent of population with a limited ability to speak English by census tract. Search to your community and investigate the top language needs in nearby census tracts.


    *DATA AS OF 2011-2015*
    Data Source: U.S. Census Bureau's American Community Survey 5-year estimates, 2011-2015, Table B16001.

    Complete list of all languages available in this data set (29):
    Spanish or Spanish Creole; French (including Patois, Cajun); French Creole; Italian; Portuguese; German; Yiddish; Greek; Russian; Polish; Serbo-Croatian; Armenian; Persian; Gujarati; Hindi; Urdu; Chinese; Japanese; Korean; Mon-Khmer, Cambodian; Hmong; Thai; Laotian; Vietnamese; Tagalog; Navajo; Hungarian; Arabic; Hebrew. Those who have limited English ability and speak other languages are included in the percentage depicted in the map, but other languages will not appear in the ranked list or in the table.

    Accompanying feature layer and viewing app are also available.
  2. N

    Population and Languages of the Limited English Proficient (LEP) Speakers by...

    • data.cityofnewyork.us
    • catalog.data.gov
    application/rdfxml +5
    Updated Apr 25, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Civic Engagement Commission (CEC) (2022). Population and Languages of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://data.cityofnewyork.us/City-Government/Population-and-Languages-of-the-Limited-English-Pr/ajin-gkbp
    Explore at:
    application/rssxml, xml, csv, tsv, application/rdfxml, jsonAvailable download formats
    Dataset updated
    Apr 25, 2022
    Dataset authored and provided by
    Civic Engagement Commission (CEC)
    Description

    Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.

  3. 2013 American Community Survey - Table Packages: Detailed Language Spoken in...

    • catalog.data.gov
    Updated Jul 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    United States Census Bureauhttp://census.gov/
    Area covered
    United States
    Description

    This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.

  4. MCB_languages_county

    • kaggle.com
    Updated Oct 1, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Marisol Brewster (2019). MCB_languages_county [Dataset]. https://www.kaggle.com/mcbrewster/mcb-languages-county/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 1, 2019
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Marisol Brewster
    Description

    Context

    This is a dataset I found online through the Google Dataset Search portal.

    Content

    The American Community Survey (ACS) 2009-2013 multi-year data are used to list all languages spoken in the United States that were reported during the sample period. These tables provide detailed counts of many more languages than the 39 languages and language groups that are published annually as a part of the routine ACS data release. This is the second tabulation beyond 39 languages since ACS began.

    The tables include all languages that were reported in each geography during the 2009 to 2013 sampling period. For the purpose of tabulation, reported languages are classified in one of 380 possible languages or language groups. Because the data are a sample of the total population, there may be languages spoken that are not reported, either because the ACS did not sample the households where those languages are spoken, or because the person filling out the survey did not report the language or reported another language instead.

    The tables also provide information about self-reported English-speaking ability. Respondents who reported speaking a language other than English were asked to indicate their ability to speak English in one of the following categories: "Very well," "Well," "Not well," or "Not at all." The data on ability to speak English represent the person’s own perception about his or her own ability or, because ACS questionnaires are usually completed by one household member, the responses may represent the perception of another household member.

    These tables are also available through the Census Bureau's application programming interface (API). Please see the developers page for additional details on how to use the API to access these data.

    Acknowledgements

    Sources:

    Google Dataset Search: https://toolbox.google.com/datasetsearch

    2009-2013 American Community Survey

    Original dataset: https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html

    Downloaded From: https://data.world/kvaughn/languages-county

    Banner and thumbnail photo by Farzad Mohsenvand on Unsplash

  5. Share of U.S. population speaking a language besides English at home 2023,...

    • statista.com
    Updated Jun 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Share of U.S. population speaking a language besides English at home 2023, by state [Dataset]. https://www.statista.com/statistics/312940/share-of-us-population-speaking-a-language-other-than-english-at-home-by-state/
    Explore at:
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2023
    Area covered
    United States
    Description

    As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.

  6. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year. Languages in the United States The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which over than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese),1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year. Different languages at home The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.

  7. n

    Data from: Language Spoken at Home

    • linc.osbm.nc.gov
    • ncosbm.opendatasoft.com
    csv, excel, geojson +1
    Updated Oct 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Language Spoken at Home [Dataset]. https://linc.osbm.nc.gov/explore/dataset/language-spoken-at-home/
    Explore at:
    geojson, csv, json, excelAvailable download formats
    Dataset updated
    Oct 3, 2024
    Description

    Language spoken at home and the ability to speak English for the population age 5 and over as reported by the US Census Bureau's, American Community Survey (ACS) 5-year estimates table C16001.

  8. Ranking of languages spoken at home in the U.S. 2023

    • statista.com
    Updated Apr 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people were speaking Russian at home during the same year. The distribution of the U.S. population by ethnicity can be accessed here. A ranking of the most spoken languages across the world can be accessed here.

  9. N

    Population of the Limited English Proficient (LEP) Speakers by Community...

    • data.cityofnewyork.us
    • datasets.ai
    • +1more
    application/rdfxml +5
    Updated Apr 25, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Civic Engagement Commission (CEC) (2022). Population of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://data.cityofnewyork.us/City-Government/Population-of-the-Limited-English-Proficient-LEP-S/9ji4-nien
    Explore at:
    csv, tsv, application/rdfxml, application/rssxml, xml, jsonAvailable download formats
    Dataset updated
    Apr 25, 2022
    Dataset authored and provided by
    Civic Engagement Commission (CEC)
    Description

    Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.

  10. a

    American and Alaska Native population that speaks English and Native...

    • catalog.epscor.alaska.edu
    Updated Dec 17, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2019). American and Alaska Native population that speaks English and Native language 2010-2014 [Dataset]. https://catalog.epscor.alaska.edu/dataset/american-and-alaska-native-population-that-speaks-english-and-native-language-2010-2014
    Explore at:
    Dataset updated
    Dec 17, 2019
    Area covered
    Alaska
    Description

    This data was made as part of the Alaska Experimental Program to Stimulate Competitive Research (EPSCoR) Northern Test Case. The data can be used to look at language skills and retention over time. This data is the percent of the American and Alaska Native population that speaks only Other. Other languages include: Navajo, Other Native American languages, Hungarian, Arabic, Hebrew, African languages, All other languages. We chose only Natives because our interest is Alaska Natives. However, data for places like Anchorage might have a large other Native presence which should be examined. Source: American Community Survey (ACS) Extent: Data is for all communities in Alaska. Notes: We chose only Natives because our interest is Alaska Natives. However, data for places like Anchorage might have a large other Native presence which should be examined.

  11. V

    Virginia Population by Language Spoken at Home by Ability to Speak English...

    • data.virginia.gov
    csv
    Updated Jan 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Office of INTERMODAL Planning and Investment (2025). Virginia Population by Language Spoken at Home by Ability to Speak English by Census Block Group (ACS 5-Year) [Dataset]. https://data.virginia.gov/dataset/virginia-population-by-language-spoken-at-home-by-ability-to-speak-english-by-census-block-group
    Explore at:
    csv(28410756)Available download formats
    Dataset updated
    Jan 3, 2025
    Dataset authored and provided by
    Office of INTERMODAL Planning and Investment
    License

    Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    Virginia
    Description

    2013-2023 Virginia Population by Age by Language Spoken at Home by Ability to Speak English for the Population 5 years and over by Census Block Group. Contains estimates and margins of error.

    U.S. Census Bureau; American Community Survey, American Community Survey 5-Year Estimates, Table B16004 Data accessed from: Census Bureau's API for American Community Survey (https://www.census.gov/data/developers/data-sets.html)

    The United States Census Bureau's American Community Survey (ACS): -What is the American Community Survey? (https://www.census.gov/programs-surveys/acs/about.html) -Geography & ACS (https://www.census.gov/programs-surveys/acs/geography-acs.html) -Technical Documentation (https://www.census.gov/programs-surveys/acs/technical-documentation.html)

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. (https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html)

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section. (https://www.census.gov/acs/www/methodology/sample_size_and_data_quality/)

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation https://www.census.gov/programs-surveys/acs/technical-documentation.html). The effect of nonsampling error is not represented in these tables.

  12. F

    American English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.

    Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of United States of America to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for US English.
    Voice Assistants: Build smart assistants capable of understanding natural American conversations.
    <div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex; gap: 16px; align-items:

  13. d

    LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED...

    • catalog.data.gov
    Updated Jan 31, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of Seattle ArcGIS Online (2025). LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS (B16003) [Dataset]. https://catalog.data.gov/dataset/language-spoken-at-home-for-the-population-5-years-and-over-in-limited-english-speaking-ho
    Explore at:
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    City of Seattle ArcGIS Online
    Description

    Table from the American Community Survey (ACS) B16003 of age by language spoken at home for the population 5 years and over in limited English-speaking households. These are multiple, nonoverlapping vintages of the 5-year ACS estimates of population and housing attributes starting in 2010 shown by the corresponding census tract vintage. Also includes the most recent release annually.King County, Washington census tracts with nonoverlapping vintages of the 5-year American Community Survey (ACS) estimates starting in 2010. Vintage identified in the "ACS Vintage" field.The census tract boundaries match the vintage of the ACS data (currently 2010 and 2020) so please note the geographic changes between the decades. Tracts have been coded as being within the City of Seattle as well as assigned to neighborhood groups called "Community Reporting Areas". These areas were created after the 2000 census to provide geographically consistent neighborhoods through time for reporting U.S. Census Bureau data. This is not an attempt to identify neighborhood boundaries as defined by neighborhoods themselves.Vintages: 2010, 2015, 2020, 2021, 2022, 2023ACS Table(s): B16003Data downloaded from: <a href='https://data.c

  14. O

    2017 San Diego County Demographics - Language Spoken at Home for the...

    • data.sandiegocounty.gov
    application/rdfxml +5
    Updated Feb 22, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    County of San Diego (2020). 2017 San Diego County Demographics - Language Spoken at Home for the Population 5 Years and Ability to Speak English (Detailed) [Dataset]. https://data.sandiegocounty.gov/Demographics/2017-San-Diego-County-Demographics-Language-Spoken/b7iq-x9dz
    Explore at:
    csv, xml, application/rdfxml, application/rssxml, tsv, jsonAvailable download formats
    Dataset updated
    Feb 22, 2020
    Dataset authored and provided by
    County of San Diego
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    San Diego County
    Description

    Language questions were only asked of persons 5 years and older. The language question is about current use of a non-English language at home, not about ability to speak another language or the use of such a language in the past or elsewhere. People who speak a language other than English outside of the home are not reported as speaking a language other than English. Respondents that spoke a language other than English at home, where also asked whether they could speak English "very well" or less than "very well. See how the Census Bureau measures Language Use for more information at https://www.census.gov/topics/population/language-use/about.html.

    Source: U.S. Census Bureau; 2013-2017 American Community Survey 5-Year Estimates, Table C16001.

  15. D

    Languages and English Ability - Seattle Neighborhoods

    • data.seattle.gov
    • gimi9.com
    • +3more
    application/rdfxml +5
    Updated Oct 22, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://data.seattle.gov/dataset/Languages-and-English-Ability-Seattle-Neighborhood/d2c7-tkpy
    Explore at:
    json, csv, tsv, xml, application/rssxml, application/rdfxmlAvailable download formats
    Dataset updated
    Oct 22, 2024
    Area covered
    Seattle
    Description

    Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.


    Table created for and used in the Neighborhood Profiles application.

    Vintages: 2023
    ACS Table(s): B16004, C16002


    The United States Census Bureau's American Community Survey (ACS):
    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:
    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:
    • Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb(year)a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
    • The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico
    • Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
    • Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications <a href='https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf' style='color:rgb(0, 121, 193); text-decoration-line:none; font-family:inherit;' target='_blank' rel='nofollow ugc

  16. F

    Audio Visual Speech Dataset: American English

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

    Dataset Content

    This visual speech dataset contains 1000 videos in US English language each paired with a corresponding high-fidelity audio track. Each participant is answering a specific question in a video in an unscripted and spontaneous nature.

    Participant Diversity:
    Speakers: The dataset includes visual speech data from more than 200 participants from different states/provinces of United States of America.
    Regions: Ensures a balanced representation of Skip 3 accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Video Data

    While recording each video extensive guidelines are kept in mind to maintain the quality and diversity.

    Recording Details:
    File Duration: Average duration of 30 seconds to 3 minutes per video.
    Formats: Videos are available in MP4 or MOV format.
    Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
    Device: Both the latest Android and iOS devices are used in this collection.
    Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
    Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
    Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
    Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
    Face Orientation: Contains straight face and tilted face angles.
    Participant Positions: Records participants in both standing and seated positions.
    Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
    Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
    Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
    Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contain videos expressing following human emotions:
    Happy
    Sad
    Excited
    Angry
    Annoyed
    Normal
    Question Diversity: For each human emotion participant answered a specific question expressing that particular emotion.

    Metadata

    The dataset provides comprehensive metadata for each video recording and participant:

  17. E

    Latin American Speaking English Speech Data by Mobile Phone - 117 Hours

    • catalog.elra.info
    Updated Oct 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2022). Latin American Speaking English Speech Data by Mobile Phone - 117 Hours [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-S0430/
    Explore at:
    Dataset updated
    Oct 6, 2022
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdfhttps://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    281 Latin American people recorded in a relatively quiet environment in authentic English. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches with mainstream Android and Apple system phones. Format:16kHz, 16bit, uncompressed wav, mono channel.Recording environment:quiet indoor environment, low background noise, without echo.Recording content (read speech):generic category; human-machine interaction category; smart home command and control category; in-car command and control category; numbers.Demographics:281 speakers totally, with 48% males and 52% females; and 56% speakers of all are in the age group of 16-25,42% speakers of all are in the age group of 26-45, 2% speakers of all are in the age group of 46-60.Device:Android mobile phone, iPhone.Language:EnglishApplication scenarios:speech recognition; voiceprint recognition.

  18. ACS English Ability and Linguistic Isolation Variables - Boundaries

    • covid-hub.gio.georgia.gov
    • hub.arcgis.com
    • +4more
    Updated Nov 14, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Esri (2019). ACS English Ability and Linguistic Isolation Variables - Boundaries [Dataset]. https://covid-hub.gio.georgia.gov/maps/0c4d1027de6b4d6eb896d95f1240e1aa
    Explore at:
    Dataset updated
    Nov 14, 2019
    Dataset authored and provided by
    Esrihttp://esri.com/
    Area covered
    Description

    This layer shows English ability and linguistic isolation by age group. This is shown by tract, county, and state boundaries. This service is updated annually to contain the most currently released American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. Linguistically isolated households are households in which no one 14 and over speak English only or speaks a language other than English at home and speaks English very well. This layer is symbolized to show the percent of adult (18+) population who have limited English ability. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right. Current Vintage: 2019-2023ACS Table(s): B16003, B16004 (Not all lines of ACS table B16004 are available in this feature layer.)Data downloaded from: Census Bureau's API for American Community Survey Date of API call: December 12, 2024National Figures: data.census.govThe United States Census Bureau's American Community Survey (ACS):About the SurveyGeography & ACSTechnical DocumentationNews & UpdatesThis ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.Data Note from the Census:Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.Data Processing Notes:This layer is updated automatically when the most current vintage of ACS data is released each year, usually in December. The layer always contains the latest available ACS 5-year estimates. It is updated annually within days of the Census Bureau's release schedule. Click here to learn more about ACS data releases.Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2023 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).The States layer contains 52 records - all US states, Washington D.C., and Puerto RicoCensus tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.The estimate is controlled. A statistical test for sampling variability is not appropriate.The data for this geographic area cannot be displayed because the number of sample cases is too small.

  19. O

    2017 San Diego County Demographics - Language Spoken at Home for the...

    • data.sandiegocounty.gov
    application/rdfxml +5
    Updated Feb 22, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    County of San Diego (2020). 2017 San Diego County Demographics - Language Spoken at Home for the Population 5 Years and Ability to Speak English [Dataset]. https://data.sandiegocounty.gov/Demographics/2017-San-Diego-County-Demographics-Language-Spoken/69ct-7r4j
    Explore at:
    application/rssxml, csv, xml, application/rdfxml, json, tsvAvailable download formats
    Dataset updated
    Feb 22, 2020
    Dataset authored and provided by
    County of San Diego
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    San Diego County
    Description

    *Asian/Pacific Islander

    Language questions were only asked of persons 5 years and older. The language question is about current use of a non-English language at home, not about ability to speak another language or the use of such a language in the past. People who speak a language other than English outside of the home are not reported as speaking a language other than English. Similarly, people whose mother tongue is a non-English language but who do not currently use the language at home do not report the language.

    Source: U.S. Census Bureau; 2013-2017 American Community Survey 5-Year Estimates, Table DP02.

  20. l

    Census 21 - English proficiency MSOA

    • data.leicester.gov.uk
    csv, excel, geojson +1
    Updated Aug 22, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). Census 21 - English proficiency MSOA [Dataset]. https://data.leicester.gov.uk/explore/dataset/census-21-english-proficiency-msoa/
    Explore at:
    csv, geojson, excel, jsonAvailable download formats
    Dataset updated
    Aug 22, 2023
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March of 2021.The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. There is also a dashboard published showcasing various datasets from the census allowing users to view data for all MSOAs and compare this with Leicester overall statistics.Further information about the census and full datasets can be found on the ONS website - https://www.ons.gov.uk/census/aboutcensus/censusproductsProficiency in EnglishThis dataset provides Census 2021 estimates that classify usual residents in England and Wales by their proficiency in English. The estimates are as at Census Day, 21 March 2021.Definition: How well people whose main language is not English (English or Welsh in Wales) speak English.This dataset provides details for the MSOAs of Leicester city.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
ESRI (2019). Percent of Population with Limited Ability to Speak English [Dataset]. https://data.amerigeoss.org/dataset/percent-of-population-with-limited-ability-to-speak-english
Organization logo

Percent of Population with Limited Ability to Speak English

Explore at:
esri rest, htmlAvailable download formats
Dataset updated
Jul 24, 2019
Dataset provided by
Esrihttp://esri.com/
Description
This map shows the percent of population with a limited ability to speak English by census tract. Search to your community and investigate the top language needs in nearby census tracts.


*DATA AS OF 2011-2015*
Data Source: U.S. Census Bureau's American Community Survey 5-year estimates, 2011-2015, Table B16001.

Complete list of all languages available in this data set (29):
Spanish or Spanish Creole; French (including Patois, Cajun); French Creole; Italian; Portuguese; German; Yiddish; Greek; Russian; Polish; Serbo-Croatian; Armenian; Persian; Gujarati; Hindi; Urdu; Chinese; Japanese; Korean; Mon-Khmer, Cambodian; Hmong; Thai; Laotian; Vietnamese; Tagalog; Navajo; Hungarian; Arabic; Hebrew. Those who have limited English ability and speak other languages are included in the percentage depicted in the map, but other languages will not appear in the ranked list or in the table.

Accompanying feature layer and viewing app are also available.
Search
Clear search
Close search
Google apps
Main menu