21 datasets found

Top Languages Spoken in the United States
kaggle.com
Updated Oct 22, 2022
Cite
The Devastator (2022). Top Languages Spoken in the United States [Dataset]. https://www.kaggle.com/datasets/thedevastator/top-languages-spoken-in-the-united-states/code
Dataset updated
Oct 22, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
Area covered
United States
Description
Top Languages Spoken in the United States

The Impact of linguistics on Community and Business in America

About this dataset

Languages are an important part of daily life in the USA. Here is a table that shows the most common languages spoken in the USA, as well as a big spreadsheet which shows each CBSA (Core-Based Statistical Area, or urban area).

Language usage varies widely throughout the United States. According to the latest census data, over 350 different languages are represented in homes across the country. The following table and spreadsheet provide more detailed information on language usage throughout the various states and cities in the US:

Columns:
- index: Index column for dataframe
- Table with column headers in row 5 and row headers in column A: Contains language data for each CBSA (Core Based Statistical Area)
- Unnamed: 1: Rank of CBSA by total number of speakers of all languages
- Unnamed: 2: Name of CBSA
- Unnamed: 3: Population of CBSA
- Unnamed: 4: Percent of population that speaks English very well
- Unnamed: 5 through Unnamed: 58: Languages spoken by at least 0.1% of the population, with corresponding percentages
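Because the column list says the real headers sit in row 5 (which is why the other columns load as "Unnamed: N"), a loader has to skip the leading rows. The following is a minimal sketch using only the standard library. The function name `read_cbsa_table` and the miniature sample text are hypothetical and stand in for the real file, whose exact layout is not shown here.

```python
import csv
import io

HEADER_ROW = 5  # 1-based row holding the column headers, per the column notes


def read_cbsa_table(text, header_row=HEADER_ROW):
    """Return (headers, records) from CSV text whose headers sit on header_row."""
    rows = list(csv.reader(io.StringIO(text)))
    headers = [h.strip() for h in rows[header_row - 1]]
    records = []
    for row in rows[header_row:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank spacer rows
        records.append(dict(zip(headers, row)))
    return headers, records


# Hypothetical miniature file: four title/blank rows, then headers, then data.
sample = """Languages Spoken at Home,,,
,,,
Illustrative values only,,,
,,,
Rank,CBSA,Population,Spanish %
1,Example Metro A,1000000,12.5
2,Example Metro B,500000,3.0
"""

headers, records = read_cbsa_table(sample)
print(headers)
print(records[0]["CBSA"])
```

With the real file, the same idea applies: pass the file contents to the function and adjust `header_row` if the layout differs from the description.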

How to use the dataset

This dataset can be used to understand the linguistic diversity of the United States, and to compare languages spoken across different states and cities.

This data can also be used to explore trends in language usage over time.

Businesses can use this dataset to identify which languages are most commonly spoken in the areas in which they operate and tailor their marketing or customer service accordingly.

Schools could use this dataset to plan language-learning programs based on the needs of their community.

Policymakers could use this data to better understand linguistic diversity in the United States and design programs to support bilingualism or multilingualism.

Research Ideas

Businesses can use this dataset to identify which languages are most commonly spoken in the areas in which they operate and cater their marketing or customer service accordingly.

Schools could use this data to plan language-learning programs based on the needs of their community.

Policymakers could use this dataset to better understand linguistic diversity in the United States and design programs to support bilingualism or multilingualism.

Acknowledgements

This dataset was created by Gary Hoover. The data was sourced from https://www.kaggle.com/garyhoov/us-languages

License

Unknown License - Please check the dataset description for more information.

Columns

File: Languages Spoken at Home by Urban Area = CBSA.csv

File: US Languages Spoken at Home 2014.csv

| Column name | Description |
|:-------------------------------------------------------------------|:--------------|
| Table with column headers in row 5 and row headers in column A | |
Data from: Language Spoken at Home
linc.osbm.nc.gov
ncosbm.opendatasoft.com
csv, excel, geojson +1
Updated Oct 3, 2024
Cite
(2024). Language Spoken at Home [Dataset]. https://linc.osbm.nc.gov/explore/dataset/language-spoken-at-home/
Available download formats: geojson, csv, json, excel
Dataset updated
Oct 3, 2024
Description
Language spoken at home and the ability to speak English for the population age 5 and over, as reported in the US Census Bureau's American Community Survey (ACS) 5-year estimates, table C16001.
Population and Languages of the Limited English Proficient (LEP) Speakers by...
data.cityofnewyork.us
catalog.data.gov
application/rdfxml +5
Updated Apr 25, 2022
+ more versions
Cite
Civic Engagement Commission (CEC) (2022). Population and Languages of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://data.cityofnewyork.us/City-Government/Population-and-Languages-of-the-Limited-English-Pr/ajin-gkbp
Available download formats: application/rssxml, xml, csv, tsv, application/rdfxml, json
Dataset updated
Apr 25, 2022
Dataset authored and provided by
Civic Engagement Commission (CEC)
Description
Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.
Audio Visual Speech Dataset: American English
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
Dataset Content
This visual speech dataset contains 1000 videos in US English, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.
• Participant Diversity:
  • Speakers: The dataset includes visual speech data from more than 200 participants from different states/provinces of the United States of America.
  • Regions: Ensures a balanced representation of accents, dialects, and demographics.
  • Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

Video Data
While recording each video, extensive guidelines were followed to maintain quality and diversity.
• Recording Details:
  • File Duration: Average duration of 30 seconds to 3 minutes per video.
  • Formats: Videos are available in MP4 or MOV format.
  • Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
  • Device: Both the latest Android and iOS devices are used in this collection.
• Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
  • Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
  • Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
  • Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
  • Face Orientation: Contains straight face and tilted face angles.
  • Participant Positions: Records participants in both standing and seated positions.
  • Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
  • Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
  • Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
• Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
  • Happy
  • Sad
  • Excited
  • Angry
  • Annoyed
  • Normal
• Question Diversity: For each human emotion, the participant answered a specific question expressing that particular emotion.

Metadata
The dataset provides comprehensive metadata for each video recording and participant:
english_dialects
huggingface.co
Cite
Yoach Lacombe, english_dialects [Dataset]. https://huggingface.co/datasets/ylacombe/english_dialects
Authors
Yoach Lacombe
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for "english_dialects"

Dataset Summary

This dataset consists of 31 hours of transcribed high-quality audio of English sentences recorded by 120 volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as use for speech technologies. The speakers self-identified as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English. The recording scripts… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/english_dialects.
117 Hours - Latin American Speaking English Speech Data by Mobile Phone
nexdata.ai
m.nexdata.ai
Updated Feb 2, 2024
+ more versions
Cite
Nexdata (2024). 117 Hours - Latin American Speaking English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1021
Dataset updated
Feb 2, 2024
Dataset provided by
nexdata technology inc
Nexdata
Authors
Nexdata
Area covered
Latin America
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
English (Latin America) scripted monologue smartphone speech dataset, collected from monologues based on given scripts, covering the generic domain, human-machine interaction, smart home command and control, in-car command and control, numbers, and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (281 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant.
peoples_speech
huggingface.co
+ more versions
Cite
MLCommons, peoples_speech [Dataset]. https://huggingface.co/datasets/MLCommons/peoples_speech
Dataset authored and provided by
MLCommons
License
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
Description
Dataset Card for People's Speech

Dataset Summary

The People's Speech Dataset is among the world's largest English speech recognition corpora licensed for academic and commercial usage under CC-BY-SA and CC-BY 4.0. It includes 30,000+ hours of transcribed speech in English with a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and, crucially, is available with a permissive license.

Supported Tasks… See the full description on the dataset page: https://huggingface.co/datasets/MLCommons/peoples_speech.
PHIDU - Birthplace - Non-English Speaking Residents (PHN) 2016 - Dataset -...
data.aurin.org.au
Updated Mar 6, 2025
+ more versions
Cite
(2025). PHIDU - Birthplace - Non-English Speaking Residents (PHN) 2016 - Dataset - AURIN [Dataset]. https://data.aurin.org.au/dataset/tua-phidu-phidu-birthplace-nes-residents-phn-2016-phn2017
Dataset updated
Mar 6, 2025
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
This dataset, released August 2017, contains the Australian resident population by birthplace, divided into English speaking (ES) and non-English speaking (NES) countries, 2016. The following countries are designated as ES: Canada, Ireland, New Zealand, South Africa, United Kingdom and the United States of America; the remaining countries are designated as NES. The dataset also includes the population of people born overseas who report poor proficiency in English. The data is by Primary Health Network (PHN) 2017 geographic boundaries based on the 2016 Australian Statistical Geography Standard (ASGS). There are 31 PHNs set up by the Australian Government. Each network is controlled by a board of medical professionals and advised by a clinical council and community advisory committee. The boundaries of the PHNs closely align with the Local Hospital Networks where possible. For more information please see the data source notes on the data. Source: Compiled by PHIDU based on the ABS Census of Population and Housing, August 2016. AURIN has spatially enabled the original data. Data that was not shown/not applicable/not published/not available for the specific area ('#', '..', '^', 'np', 'n.a.', 'n.y.a.' in original PHIDU data) was removed and replaced by blank cells. For other keys and abbreviations refer to PHIDU Keys.
Census of Population and Housing, 2000 [United States]: Summary File 4, Iowa...
search.gesis.org
Updated Feb 16, 2021
+ more versions
Cite
United States Department of Commerce. Bureau of the Census (2021). Census of Population and Housing, 2000 [United States]: Summary File 4, Iowa - Version 1 [Dataset]. http://doi.org/10.3886/ICPSR13527.v1
Unique identifier
https://doi.org/10.3886/ICPSR13527.v1
Dataset updated
Feb 16, 2021
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
GESIS search
Authors
United States Department of Commerce. Bureau of the Census
License
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de457443
Area covered
Iowa, United States
Description
Abstract (en): Summary File 4 (SF 4) from the United States 2000 Census contains the sample data, which is the information compiled from the questions asked of a sample of all people and housing units. Population items include basic population totals: urban and rural, households and families, marital status, grandparents as caregivers, language and ability to speak English, ancestry, place of birth, citizenship status, year of entry, migration, place of work, journey to work (commuting), school enrollment and educational attainment, veteran status, disability, employment status, industry, occupation, class of worker, income, and poverty status. Housing items include basic housing totals: urban and rural, number of rooms, number of bedrooms, year moved into unit, household size and occupants per room, units in structure, year structure built, heating fuel, telephone service, plumbing and kitchen facilities, vehicles available, value of home, monthly rent, and shelter costs. In Summary File 4, the sample data are presented in 213 population tables (matrices) and 110 housing tables, identified with "PCT" and "HCT" respectively. Each table is iterated for 336 population groups: the total population, 132 race groups, 78 American Indian and Alaska Native tribe categories (reflecting 39 individual tribes), 39 Hispanic or Latino groups, and 86 ancestry groups. The presentation of SF4 tables for any of the 336 population groups is subject to a population threshold. That is, if there are fewer than 100 people (100-percent count) in a specific population group in a specific geographic area, and there are fewer than 50 unweighted cases, their population and housing characteristics data are not available for that geographic area in SF4. For the ancestry iterations, only the 50 unweighted cases test can be performed. See Appendix H: Characteristic Iterations, for a complete list of characteristic iterations. 
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Created variable labels and/or value labels. All persons in housing units in Iowa in 2000.
2013-05-25 Multiple Census data file segments were repackaged for distribution into a single zip archive per dataset. No changes were made to the data or documentation.
2006-01-12 All files were removed from dataset 342 and flagged as study-level files, so that they will accompany all downloads.
2006-01-12 All files were removed from dataset 341 and flagged as study-level files, so that they will accompany all downloads.
2006-01-12 All files were removed from dataset 340 and flagged as study-level files, so that they will accompany all downloads.
2006-01-12 All files were removed from dataset 339 and flagged as study-level files, so that they will accompany all downloads.
2006-01-12 All files were removed from dataset 338 and flagged as study-level files, so that they will accompany all downloads.
Because of the number of files per state in Summary File 4, ICPSR has given each state its own ICPSR study number in the range ICPSR 13512-13563. The study number for the national file is 13570. Data for each state are being released as they become available. The data are provided in 38 segments (files) per iteration.
These segments are PCT1-PCT4, PCT5-PCT16, PCT17-PCT34, PCT35-PCT37, PCT38-PCT45, PCT46-PCT49, PCT50-PCT61, PCT62-PCT67, PCT68-PCT71, PCT72-PCT76, PCT77-PCT78, PCT79-PCT81, PCT82-PCT84, PCT85-PCT86 (partial), PCT86 (partial), PCT87-PCT103, PCT104-PCT120, PCT121-PCT131, PCT132-PCT137, PCT138-PCT143, PCT144, PCT145-PCT150, PCT151-PCT156, PCT157-PCT162, PCT163-PCT208, PCT209-PCT213, HCT1-HCT9, HCT10-HCT18, HCT19-HCT22, HCT23-HCT25, HCT26-HCT29, HCT30-HCT39, HCT40-HCT55, HCT56-HCT61, HCT62-HCT70, HCT71-HCT81, HCT82-HCT86, and HCT87-HCT110. The iterations are Parts 1-336, the Geographic Header File is Part 337. The Geographic Header File is in fixed-format ASCII and the table files are in comma-delimited ASCII format. A merged iteration will have 7,963 variables. For Parts 251-336, the part names contain numbers within parentheses that refer to the Ancestry Code List (page G1 of the codebook).
Hispanic-English Database
abacus.library.ubc.ca
iso, txt
Updated Nov 30, 2022
Cite
Abacus Data Network (2022). Hispanic-English Database [Dataset]. https://abacus.library.ubc.ca/dataset.xhtml;jsessionid=719087385d798ea8ac94b0e3997f?persistentId=hdl%3A11272.1%2FAB2%2FIIJZCH&version=&q=&fileTypeGroupFacet=%22Text%22&fileAccess=
Available download formats: txt(1308), iso(3087785984)
Dataset updated
Nov 30, 2022
Dataset provided by
Abacus Data Network
Description
Abstract
Introduction
Hispanic-English Database contains approximately 30 hours of English and Spanish conversational and read speech with transcripts (24 hours) and metadata collected from 22 non-native English speakers between 1996 and 1998. The corpus was developed by Entropic Research Laboratory, Inc., a developer of speech recognition and speech synthesis software toolkits that was acquired by Microsoft in 1999. Participants were adult native speakers of Spanish as spoken in Central America and South America who resided in the Palo Alto, California area, had lived in the United States for at least one year and demonstrated a basic ability to understand, read and speak English. They read a total of 2200 sentences, 50 each in Spanish and English per speaker. The Spanish sentence prompts were a subset of the materials in LATINO-40 Spanish Read News, and the English sentence prompts were taken from the TIMIT database. Conversations were task-oriented, drawing on exercises similar to those used in English second language instruction and designed to engage the speakers in collaborative, problem-solving activities.
Data
Read speech was recorded on two wideband channels with a Shure SM10A head-mounted microphone in a quiet laboratory environment. The conversational speech was simultaneously recorded on four channels, two of which were used to place phone calls to each subject in two separate offices and to record the incoming speech of the two channels into separate files. The audio was originally saved under the Entropic Audio (ESPS) format using a 16kHz sampling rate and 16 bit samples. Audio files were converted to flac compressed .wav files from the ESPS format. ESPS headers were removed and are presented in this release as *.hdr files that include demographic and technical data. Transcripts were developed with the Entropic Annotator tool and are time-aligned with speaker turns. The transcription conventions were based on those used in the LDC Switchboard and CALLHOME collections. Transcript files are denoted with a .lab extension. Data files and their corresponding label files are stored in subdirectories named using a speaker-pair id and session number. The first three letters identify the speaker on channel A. The last three letters identify the speaker on channel B. Wideband audio files contain *.wb.flac in their file name, and narrow band audio files are denoted with a *.nb.flac in the file name.
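The naming conventions described above can be sketched in a few lines of Python. This is an illustration only: it assumes the speaker-pair id is exactly six letters and that the markers appear as file-name suffixes, neither of which the description states outright. The names `split_pair_id`, `classify_file`, and the example strings are hypothetical.

```python
# Suffix-to-kind mapping taken from the description; matching on the end of
# the file name is an assumption.
SUFFIX_KINDS = [
    (".wb.flac", "wideband audio"),
    (".nb.flac", "narrowband audio"),
    (".lab", "transcript"),
    (".hdr", "header"),
]


def split_pair_id(pair_id):
    """Split a six-letter speaker-pair id into (channel A, channel B) speakers."""
    if len(pair_id) != 6 or not pair_id.isalpha():
        raise ValueError(f"expected a six-letter pair id, got {pair_id!r}")
    return pair_id[:3], pair_id[3:]


def classify_file(name):
    """Return the kind of file implied by its suffix, or 'unknown'."""
    for suffix, kind in SUFFIX_KINDS:
        if name.endswith(suffix):
            return kind
    return "unknown"


# Hypothetical names, for illustration only.
print(split_pair_id("abcxyz"))
print(classify_file("example.wb.flac"))
print(classify_file("example.lab"))
```

The session number is left unparsed because the description does not say how it is joined to the pair id in the subdirectory name.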
R2 & NE: State Level 2006-2010 ACS Language Summary
data.amerigeoss.org
datadiscoverystudio.org
tgrshp (compressed)
Updated Jul 30, 2019
+ more versions
Cite
United States[old] (2019). R2 & NE: State Level 2006-2010 ACS Language Summary [Dataset]. https://data.amerigeoss.org/ko_KR/dataset/r2-ne-state-level-2006-2010-acs-language-summary
Available download formats: tgrshp (compressed)
Dataset updated
Jul 30, 2019
Dataset provided by
United States[old]
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Area covered
Nebraska
Description
The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. States and equivalent entities are the primary governmental divisions of the United States. In addition to the fifty States, the Census Bureau treats the District of Columbia, Puerto Rico, and each of the Island Areas (American Samoa, the Commonwealth of the Northern Mariana Islands, Guam, and the U.S. Virgin Islands) as the statistical equivalents of States for the purpose of data presentation.

This table contains data on language ability and linguistic isolation from the American Community Survey 2006-2010 database for states. Linguistic isolation is defined as no one 14 and over speaks English only or speaks English "very well". The American Community Survey (ACS) is a household survey conducted by the U.S. Census Bureau that currently has an annual sample size of about 3.5 million addresses. ACS estimates provide communities with the current information they need to plan investments and services. Information from the survey generates estimates that help determine how more than $400 billion in federal and state funds are distributed annually. Each year the survey produces data that cover the periods of 1-year, 3-year, and 5-year estimates for geographic areas in the United States and Puerto Rico, ranging from neighborhoods to Congressional districts to the entire nation. This table also has a companion table (same table name with MOE suffix) with the margin of error (MOE) values for each estimated element. MOE is expressed as a measure value for each estimated element, so a value of 25 and an MOE of 5 means 25 +/- 5 (or statistical certainty between 20 and 30). There are also special cases of MOE. An MOE of -1 means the associated estimates do not have a measured error. An MOE of 0 means that error calculation is not appropriate for the associated value. An MOE of 109 is set whenever an estimate value is 0. The MOEs of aggregated elements and percentages must be calculated. This process means using standard error calculations as described in "American Community Survey Multiyear Accuracy of the Data (3-year 2008-2010 and 5-year 2006-2010)". Also, following Census guidelines, aggregated MOEs do not use more than one zero-element MOE (109) to prevent overestimation of the error. Due to the complexity of the calculations, some percentage MOEs cannot be calculated (these are set to null in the summary-level MOE tables).
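The MOE rules in the paragraph above can be illustrated with a short Python sketch. It uses the root-sum-of-squares approximation that the Census Bureau documents for the MOE of a sum of estimates, and it counts the zero-estimate MOE (109) at most once. Skipping MOEs of -1 and 0 during aggregation is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

ZERO_ESTIMATE_MOE = 109  # MOE assigned whenever an estimate is 0, per the text


def interval(estimate, moe):
    """Return the (low, high) range implied by an estimate and its MOE."""
    return estimate - moe, estimate + moe


def aggregate_moe(pairs):
    """Sum estimates and approximate the MOE of the sum.

    pairs is an iterable of (estimate, moe). The MOE of the sum is the square
    root of the sum of squared MOEs, counting the zero-estimate MOE only once.
    """
    total = 0
    squares = 0.0
    zero_moe_used = False
    for estimate, moe in pairs:
        total += estimate
        if moe <= 0:
            continue  # assumption: special-case MOEs (-1, 0) add no error
        if estimate == 0 and moe == ZERO_ESTIMATE_MOE:
            if zero_moe_used:
                continue  # use at most one zero-element MOE
            zero_moe_used = True
        squares += moe ** 2
    return total, math.sqrt(squares)


print(interval(25, 5))
print(aggregate_moe([(25, 5), (40, 12)]))
print(aggregate_moe([(0, 109), (0, 109)]))
```

Percentage MOEs need a different formula and are not covered by this sketch.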

The name for table 'ACS10LANSTMOE' was added as a prefix to all field names imported from that table. Be sure to turn off 'Show Field Aliases' to see complete field names in the Attribute Table of this feature layer. This can be done in the 'Table Options' drop-down menu in the Attribute Table or with key sequence '[CTRL]+[SHIFT]+N'. Due to database restrictions, the prefix may have been abbreviated if the field name exceeded the maximum allowed characters.
ACS 5YR Demographic Estimate Data by County
hudgis-hud.opendata.arcgis.com
data.lojic.org
+1more
Updated Aug 21, 2023
+ more versions
Cite
Department of Housing and Urban Development (2023). ACS 5YR Demographic Estimate Data by County [Dataset]. https://hudgis-hud.opendata.arcgis.com/datasets/c15a0ed006bc4bb5938e0161e7f21f33
Dataset updated
Aug 21, 2023
Dataset provided by
United States Department of Housing and Urban Development (http://www.hud.gov/)
Authors
Department of Housing and Urban Development
Description
2016-2020 ACS 5-Year estimates of demographic variables (see below) compiled at the county level. The American Community Survey (ACS) 5 Year 2016-2020 demographic information is a subset of information available for download from the U.S. Census. Tables used in the development of this dataset include:
• B01001 - Sex By Age
• B03002 - Hispanic Or Latino Origin By Race
• B11001 - Household Type (Including Living Alone)
• B11005 - Households By Presence Of People Under 18 Years By Household Type
• B11006 - Households By Presence Of People 60 Years And Over By Household Type
• B16005 - Nativity By Language Spoken At Home By Ability To Speak English For The Population 5 Years And Over
• B25010 - Average Household Size Of Occupied Housing Units By Tenure
• B15001 - Sex by Educational Attainment for the Population 18 Years and Over

To learn more about the American Community Survey (ACS) and associated datasets, visit https://www.census.gov/programs-surveys/acs. For questions about the spatial attribution of this dataset, please reach out to us at GISHelpdesk@hud.gov.

Data Dictionary: DD_ACS 5-Year Demographic Estimate Data by County
Date of Coverage: 2016-2020
ACS 5YR Demographic Estimate Data by Tract
opendata.atlantaregional.com
data.lojic.org
+2more
Updated Jan 31, 2019
Cite
Department of Housing and Urban Development (2019). ACS 5YR Demographic Estimate Data by Tract [Dataset]. https://opendata.atlantaregional.com/datasets/HUD::acs-5yr-demographic-estimate-data-by-tract/explore
Dataset updated
Jan 31, 2019
Dataset authored and provided by
Department of Housing and Urban Development
Description
The American Community Survey (ACS) 5 Year 2013-2017 demographic information is a subset of information available for download from the U.S. Census. Tables used in the development of this dataset include:
• B01001 - Sex By Age
• B03002 - Hispanic Or Latino Origin By Race
• B11001 - Household Type (Including Living Alone)
• B11005 - Households By Presence Of People Under 18 Years By Household Type
• B11006 - Households By Presence Of People 60 Years And Over By Household Type
• B16005 - Nativity By Language Spoken At Home By Ability To Speak English For The Population 5 Years And Over
• B25010 - Average Household Size Of Occupied Housing Units By Tenure
• B15001 - Sex by Educational Attainment for the Population 18 Years and Over

To learn more about the American Community Survey (ACS), and associated datasets visit: https://www.census.gov/programs-surveys/acs

Data Dictionary: DD_ACS 5-Year Demographic Estimate Data by Tract
Date of Coverage: 2013-2017
Data Updated: Biennially
Demographics
catalog.data.gov
datasets.ai
+5more
Updated Nov 22, 2024
Cite
Lake County Illinois GIS (2024). Demographics [Dataset]. https://catalog.data.gov/dataset/demographics-0be32
Dataset updated
Nov 22, 2024
Dataset provided by
Lake County Illinois GIS
Description
Lake County, Illinois Demographic Data. Explanation of field attributes:
• Total Population – The entire population of Lake County.
• White – Individuals who are of Caucasian race. This is a percent.
• African American – Individuals who are of African American race. This is a percent.
• Asian – Individuals who are of Asian race. This is a percent.
• Hispanic – Individuals who are of Hispanic ethnicity. This is a percent.
• Does not Speak English – Individuals who speak a language other than English in their household. This is a percent.
• Under 5 years of age – Individuals who are under 5 years of age. This is a percent.
• Under 18 years of age – Individuals who are under 18 years of age. This is a percent.
• 18-64 years of age – Individuals who are between 18 and 64 years of age. This is a percent.
• 65 years of age and older – Individuals who are 65 years old or older. This is a percent.
• Male – Individuals who are male in gender. This is a percent.
• Female – Individuals who are female in gender. This is a percent.
• High School Degree – Individuals who have obtained a high school degree. This is a percent.
• Associate Degree – Individuals who have obtained an associate degree. This is a percent.
• Bachelor’s Degree or Higher – Individuals who have obtained a bachelor’s degree or higher. This is a percent.
• Utilizes Food Stamps – Households receiving food stamps/part of SNAP (Supplemental Nutrition Assistance Program). This is a percent.
• Median Household Income – A median household income refers to the income level earned by a given household where half of the homes in the area earn more and half earn less. This is a dollar amount.
• No High School – Individuals who have not obtained a high school degree. This is a percent.
• Poverty – Poverty refers to families and people whose income in the past 12 months is below the poverty level. This is a percent.
Census of Population and Housing, 2000 [United States]: Summary File 4,...
search.gesis.org
Updated Feb 26, 2021
United States Department of Commerce. Bureau of the Census (2021). Census of Population and Housing, 2000 [United States]: Summary File 4, District of Columbia - Version 1 [Dataset]. http://doi.org/10.3886/ICPSR13520.v1
Unique identifier
https://doi.org/10.3886/ICPSR13520.v1
Dataset updated
Feb 26, 2021
Dataset provided by
GESIS search
ICPSR - Interuniversity Consortium for Political and Social Research
Authors
United States Department of Commerce. Bureau of the Census
License
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de457436
Area covered
Washington, United States
Description
Abstract (en): Summary File 4 (SF 4) from the United States 2000 Census contains the sample data, which is the information compiled from the questions asked of a sample of all people and housing units. Population items include basic population totals: urban and rural, households and families, marital status, grandparents as caregivers, language and ability to speak English, ancestry, place of birth, citizenship status, year of entry, migration, place of work, journey to work (commuting), school enrollment and educational attainment, veteran status, disability, employment status, industry, occupation, class of worker, income, and poverty status. Housing items include basic housing totals: urban and rural, number of rooms, number of bedrooms, year moved into unit, household size and occupants per room, units in structure, year structure built, heating fuel, telephone service, plumbing and kitchen facilities, vehicles available, value of home, monthly rent, and shelter costs. In Summary File 4, the sample data are presented in 213 population tables (matrices) and 110 housing tables, identified with "PCT" and "HCT" respectively. Each table is iterated for 336 population groups: the total population, 132 race groups, 78 American Indian and Alaska Native tribe categories (reflecting 39 individual tribes), 39 Hispanic or Latino groups, and 86 ancestry groups. The presentation of SF4 tables for any of the 336 population groups is subject to a population threshold. That is, if there are fewer than 100 people (100-percent count) in a specific population group in a specific geographic area, and there are fewer than 50 unweighted cases, their population and housing characteristics data are not available for that geographic area in SF4. For the ancestry iterations, only the 50 unweighted cases test can be performed. See Appendix H: Characteristic Iterations, for a complete list of characteristic iterations. 
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Created variable labels and/or value labels. All persons in housing units in the District of Columbia in 2000. 2013-05-25 Multiple Census data file segments were repackaged for distribution into a single zip archive per dataset. No changes were made to the data or documentation. 2006-01-12 All files were removed from dataset 342 and flagged as study-level files, so that they will accompany all downloads. 2006-01-12 All files were removed from dataset 341 and flagged as study-level files, so that they will accompany all downloads. 2006-01-12 All files were removed from dataset 340 and flagged as study-level files, so that they will accompany all downloads. 2006-01-12 All files were removed from dataset 339 and flagged as study-level files, so that they will accompany all downloads. 2006-01-12 All files were removed from dataset 338 and flagged as study-level files, so that they will accompany all downloads. Because of the number of files per state in Summary File 4, ICPSR has given each state its own ICPSR study number in the range ICPSR 13512-13563. The study number for the national file is 13570. Data for each state are being released as they become available. The data are provided in 38 segments (files) per iteration.
These segments are PCT1-PCT4, PCT5-PCT16, PCT17-PCT34, PCT35-PCT37, PCT38-PCT45, PCT46-PCT49, PCT50-PCT61, PCT62-PCT67, PCT68-PCT71, PCT72-PCT76, PCT77-PCT78, PCT79-PCT81, PCT82-PCT84, PCT85-PCT86 (partial), PCT86 (partial), PCT87-PCT103, PCT104-PCT120, PCT121-PCT131, PCT132-PCT137, PCT138-PCT143, PCT144, PCT145-PCT150, PCT151-PCT156, PCT157-PCT162, PCT163-PCT208, PCT209-PCT213, HCT1-HCT9, HCT10-HCT18, HCT19-HCT22, HCT23-HCT25, HCT26-HCT29, HCT30-HCT39, HCT40-HCT55, HCT56-HCT61, HCT62-HCT70, HCT71-HCT81, HCT82-HCT86, and HCT87-HCT110. The iterations are Parts 1-336; the Geographic Header File is Part 337. The Geographic Header File is in fixed-format ASCII and the table files are in comma-delimited ASCII format. A merged iteration will have 7,963 variables. For Parts 251-336, the part names contain numbers within parentheses that refer to the Ancestry Code List (page G1 of the codebook).
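Since each iteration is spread over 38 segment files, it can help to look up which segment holds a given table. The sketch below is only an illustration: the ranges are transcribed from the list above, but the 1-based segment positions and the function name `find_segments` are assumptions made here, not identifiers from the ICPSR documentation.

```python
import re

# Table ranges per segment, transcribed in order from the description above.
# PCT86 is split across two segments, so it is covered by two entries.
# The 1-based position in this list is an assumed numbering, used for illustration.
SEGMENTS = [
    ("PCT", 1, 4), ("PCT", 5, 16), ("PCT", 17, 34), ("PCT", 35, 37),
    ("PCT", 38, 45), ("PCT", 46, 49), ("PCT", 50, 61), ("PCT", 62, 67),
    ("PCT", 68, 71), ("PCT", 72, 76), ("PCT", 77, 78), ("PCT", 79, 81),
    ("PCT", 82, 84), ("PCT", 85, 86), ("PCT", 86, 86), ("PCT", 87, 103),
    ("PCT", 104, 120), ("PCT", 121, 131), ("PCT", 132, 137), ("PCT", 138, 143),
    ("PCT", 144, 144), ("PCT", 145, 150), ("PCT", 151, 156), ("PCT", 157, 162),
    ("PCT", 163, 208), ("PCT", 209, 213),
    ("HCT", 1, 9), ("HCT", 10, 18), ("HCT", 19, 22), ("HCT", 23, 25),
    ("HCT", 26, 29), ("HCT", 30, 39), ("HCT", 40, 55), ("HCT", 56, 61),
    ("HCT", 62, 70), ("HCT", 71, 81), ("HCT", 82, 86), ("HCT", 87, 110),
]


def find_segments(table_id):
    """Return the 1-based positions of every segment that covers table_id."""
    match = re.fullmatch(r"(PCT|HCT)(\d+)", table_id.upper())
    if match is None:
        raise ValueError(f"not a PCT/HCT table id: {table_id!r}")
    prefix, number = match.group(1), int(match.group(2))
    return [
        position
        for position, (seg_prefix, first, last) in enumerate(SEGMENTS, start=1)
        if seg_prefix == prefix and first <= number <= last
    ]


print(find_segments("PCT86"))   # the one table that spans two segments
print(find_segments("HCT110"))  # the last housing table
```

A table that returns two positions (only PCT86 here) would need both segment files read and joined before use.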
R2 & NE: Block Group Level 2006-2010 ACS Language Summary
data.wu.ac.at
tgrshp (compressed)
Updated Jan 13, 2018
U.S. Environmental Protection Agency (2018). R2 & NE: Block Group Level 2006-2010 ACS Language Summary [Dataset]. https://data.wu.ac.at/schema/data_gov/NmYwZDFhZDEtZDA1Yi00ZTUyLWJhM2YtNTJmMjQ5OGRiMTRl
Available download formats: tgrshp (compressed)
Dataset updated
Jan 13, 2018
Dataset provided by
U.S. Environmental Protection Agency
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Block Groups (BGs) are defined before tabulation block delineation and numbering, but are clusters of blocks within the same census tract that have the same first digit of their 4-digit census block number from the same decennial census. For example, Census 2000 tabulation blocks 3001, 3002, 3003, ..., 3999 within Census 2000 tract 1210.02 are also within BG 3 within that census tract. Census 2000 BGs generally contained between 600 and 3,000 people, with an optimum size of 1,500 people. Most BGs were delineated by local participants in the Census Bureau's Participant Statistical Areas Program (PSAP). The Census Bureau delineated BGs only where the PSAP participant declined to delineate BGs or where the Census Bureau could not identify any local PSAP participant. A BG usually covers a contiguous area. Each census tract contains at least one BG, and BGs are uniquely numbered within census tract. Within the standard census geographic hierarchy, BGs never cross county or census tract boundaries, but may cross the boundaries of other geographic entities like county subdivisions, places, urban areas, voting districts, congressional districts, and American Indian / Alaska Native / Native Hawaiian areas. BGs have a valid code range of 0 through 9. BGs coded 0 were intended to only include water area, no land area, and they are generally in territorial seas, coastal water, and Great Lakes water areas.
For Census 2000, rather than extending a census tract boundary into the Great Lakes or out to the U.S. nautical three-mile limit, the Census Bureau delineated some census tract boundaries along the shoreline or just offshore. The Census Bureau assigned a default census tract number of 0 and BG of 0 to these offshore, water-only areas not included in regularly numbered census tract areas.
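The relationship between a tabulation block and its block group described above can be written as a short sketch. The function name `block_group_of` is hypothetical, and the zero-padding of short inputs is an assumption made here so that block numbers stored as integers still work.

```python
def block_group_of(block_number):
    """Return the block group (0-9) for a 4-digit census block number.

    Per the description above, the block group is the first digit of the
    4-digit block number within the same census tract.
    """
    code = str(block_number).zfill(4)  # assumption: pad e.g. 12 to "0012"
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit block number, got {block_number!r}")
    return int(code[0])


# Blocks 3001 through 3999 all fall in block group 3 of their tract.
print(block_group_of("3001"), block_group_of(3999))
```

A result of 0 corresponds to the water-only block groups mentioned above.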

This table contains data on language ability and linguistic isolation from the American Community Survey 2006-2010 database for block groups. Linguistic isolation is defined as a household in which no one aged 14 and over speaks English only or speaks English "very well". The American Community Survey (ACS) is a household survey conducted by the U.S. Census Bureau that currently has an annual sample size of about 3.5 million addresses. ACS estimates provide communities with the current information they need to plan investments and services. Information from the survey generates estimates that help determine how more than $400 billion in federal and state funds are distributed annually. Each year the survey produces data that cover the periods of 1-year, 3-year, and 5-year estimates for geographic areas in the United States and Puerto Rico, ranging from neighborhoods to Congressional districts to the entire nation. This table also has a companion table (same table name with MOE suffix) with the margin of error (MOE) values for each estimated element. MOE is expressed as a measure value for each estimated element. So a value of 25 and an MOE of 5 means 25 +/- 5 (or statistical certainty between 20 and 30). There are also special cases of MOE. An MOE of -1 means the associated estimates do not have a measured error. An MOE of 0 means that error calculation is not appropriate for the associated value. An MOE of 109 is set whenever an estimate value is 0. The MOEs of aggregated elements and percentages must be calculated. This process means using standard error calculations as described in "American Community Survey Multiyear Accuracy of the Data (3-year 2008-2010 and 5-year 2006-2010)". Also, following Census guidelines, aggregated MOEs do not use more than one zero-estimate MOE (109), to prevent overestimation of the error. Due to the complexity of the calculations, some percentage MOEs cannot be calculated (these are set to null in the summary-level MOE tables).
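As a rough illustration of the MOE handling described above, the sketch below computes the interval for a single estimate and an aggregated MOE for a sum of estimates. It assumes the usual root-sum-of-squares approximation for a sum; the description only points to the ACS accuracy document, so treat the formula and the function names as assumptions rather than the exact procedure used to build the table. The special MOE codes -1 and 0 are not handled.

```python
import math

ZERO_ESTIMATE_MOE = 109  # MOE assigned when an estimate is 0, per the description


def interval(estimate, moe):
    """Return the (low, high) range implied by an estimate and its MOE."""
    return (estimate - moe, estimate + moe)


def aggregate_moe(estimates, moes):
    """Approximate the MOE of a sum of estimates.

    Assumption: root-sum-of-squares of the component MOEs, counting the
    zero-estimate MOE (109) at most once so the error is not overstated.
    """
    sum_of_squares = 0.0
    zero_moe_used = False
    for estimate, moe in zip(estimates, moes):
        if moe <= 0:
            raise ValueError("special MOE codes (-1, 0) are not handled in this sketch")
        if estimate == 0 and moe == ZERO_ESTIMATE_MOE:
            if zero_moe_used:
                continue  # skip every zero-estimate MOE after the first
            zero_moe_used = True
        sum_of_squares += moe ** 2
    return math.sqrt(sum_of_squares)


print(interval(25, 5))                              # the 25 +/- 5 example from the text
print(aggregate_moe([25, 0, 0], [5, 109, 109]))     # only one 109 contributes
```

Counting only one zero-estimate MOE matters most for small areas such as block groups, where many language categories have an estimate of 0.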

The name for table 'ACS10LANBGMOE' was added as a prefix to all field names imported from that table. Be sure to turn off 'Show Field Aliases' to see complete field names in the Attribute Table of this feature layer. This can be done in the 'Table Options' drop-down menu in the Attribute Table or with key sequence '[CTRL]+[SHIFT]+N'. Due to database restrictions, the prefix may have been abbreviated if the field name exceeded the maximum allowed characters.
first-impressions-v2
huggingface.co
Yeray, first-impressions-v2 [Dataset]. https://huggingface.co/datasets/yeray142/first-impressions-v2
Authors
Yeray
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for First Impressions V2

The First Impressions data set comprises 10,000 clips (average duration 15s) extracted from more than 3,000 different YouTube high-definition (HD) videos of people facing a camera and speaking in English. The videos are split into training, validation and test sets with a 3:1:1 ratio. People in the videos differ in gender, age, nationality, and ethnicity. Videos are labeled with personality trait variables. Amazon Mechanical Turk (AMT) was… See the full description on the dataset page: https://huggingface.co/datasets/yeray142/first-impressions-v2.
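To make the 3:1:1 split concrete, the following sketch turns a total clip count and a ratio into per-split counts. It assumes the ratio is applied exactly to the 10,000 clips stated above; the helper name `split_counts` is hypothetical and is not part of the dataset's tooling.

```python
def split_counts(total, ratio):
    """Divide `total` items according to an integer ratio such as (3, 1, 1)."""
    parts = sum(ratio)
    counts = [total * share // parts for share in ratio]
    counts[0] += total - sum(counts)  # give any rounding remainder to the first split
    return counts


train, validation, test = split_counts(10000, (3, 1, 1))
print(train, validation, test)
```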
ACS 5 Year Demographic Data by Place, 2008-2012
searchworks.stanford.edu
zip
Updated Jan 28, 2021
(2021). ACS 5 Year Demographic Data by Place, 2008-2012 [Dataset]. https://searchworks.stanford.edu/view/hp504dh6313
Available download formats: zip
Dataset updated
Jan 28, 2021
Description
This polygon shapefile contains 5-year American Community Survey (ACS) estimates of demographic variables at the place level. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The TIGER/Line Files include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a State, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the State in which they are located. The boundaries for CDPs often are defined in partnership with State, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary.
The only population/housing size requirement for CDPs for the 2010 Census is that they must contain some housing and population. The boundaries of all 2010 Census incorporated places are as of January 1, 2010 as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all 2010 Census CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP). The American Community Survey (ACS) 5 Year 2008-2012 demographic information is a subset of information available for download. Downloaded tables include: B01001 - Sex By Age, B03002 - Hispanic Or Latino Origin By Race, B11001 - Household Type (Including Living Alone), B11005 - Households By Presence Of People Under 18 Years By Household Type, B11006 - Households By Presence Of People 60 Years And Over By Household Type, B16005 - Nativity By Language Spoken At Home By Ability To Speak English For The Population 5 Years And Over, B25010 - Average Household Size Of Occupied Housing Units By Tenure and B15001 - Sex by Educational Attainment for the Population 18 Years and Over. Data is current as of 5/6/2015. Released in 2012.
Census of Population and Housing, 2000: Summary File 3, Alabama
archive.ciser.cornell.edu
Updated Jun 1, 2024
Bureau of the Census (2024). Census of Population and Housing, 2000: Summary File 3, Alabama [Dataset]. http://doi.org/10.6077/rfnf-p929
Unique identifier
https://doi.org/10.6077/rfnf-p929
Dataset updated
Jun 1, 2024
Dataset authored and provided by
Bureau of the Census
Variables measured
HousingUnit, Individual
Description
Summary File 3 contains sample data, which is the information compiled from the questions asked of a sample of all people and housing units in the United States. Population items include basic population totals as well as counts for the following characteristics: urban and rural, households and families, marital status, grandparents as caregivers, language and ability to speak English, ancestry, place of birth, citizenship status, year of entry, migration, place of work, journey to work (commuting), school enrollment and educational attainment, veteran status, disability, employment status, industry, occupation, class of worker, income, and poverty status. Housing items include basic housing totals and counts for urban and rural, number of rooms, number of bedrooms, year moved into unit, household size and occupants per room, units in structure, year structure built, heating fuel, telephone service, plumbing and kitchen facilities, vehicles available, value of home, and monthly rent and shelter costs. The Summary File 3 population tables are identified with a "P" prefix and the housing tables are identified with an "H," followed by a sequential number. The "P" and "H" tables are shown for the block group and higher level geography, while the "PCT" and "HCT" tables are shown for the census tract and higher level geography. There are 16 "P" tables, 15 "PCT" tables, and 20 "HCT" tables that bear an alphabetic suffix on the table number, indicating that they are repeated for nine major race and Hispanic or Latino groups. There are 484 population tables and 329 housing tables for a total of 813 unique tables. (Source: downloaded from ICPSR 7/13/10)

Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR13342.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.

The Devastator (2022). Top Languages Spoken in the United States [Dataset]. https://www.kaggle.com/datasets/thedevastator/top-languages-spoken-in-the-united-states/code

Top Languages Spoken in the United States

The Impact of linguistics on Community and Business in America

Explore at:

Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Oct 22, 2022

Dataset provided by

Kaggle: http://kaggle.com/

Authors

The Devastator

Area covered

United States

Description

Top Languages Spoken in the United States

The Impact of linguistics on Community and Business in America

About this dataset

Languages are an important part of daily life in the USA. Here is a table that shows the most common languages spoken in the USA, as well as a big spreadsheet which shows each CBSA (Core-Based Statistical Area, or urban area).

Language usage varies widely throughout the United States. According to the latest census data, over 350 different languages are represented in homes across the country. The following table and spreadsheet provide more detailed information on language usage throughout the various states and cities in the US:

Columns:
- index: Index column for dataframe
- Table with column headers in row 5 and row headers in column A: Contains language data for each CBSA (Core Based Statistical Area)
- Unnamed: 1: Rank of CBSA by total number of speakers of all languages
- Unnamed: 2: Name of CBSA
- Unnamed: 3: Population of CBSA
- Unnamed: 4: Percent of population that speaks English very well
- Unnamed: 5 through Unnamed: 58: Languages spoken by at least 0.1% of the population, with corresponding percentages
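The "Unnamed" column names suggest the file was loaded with its first line treated as the header, although the real column headers sit in row 5. A minimal sketch of reading such a file with the header taken from row 5 follows; it uses only Python's standard `csv` module, and the sample text, its column names, and its values are placeholders invented for illustration, not rows from the actual dataset.

```python
import csv
import io


def read_with_header_row(text, header_row=5):
    """Parse CSV text whose column headers are on `header_row` (1-based)."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[header_row - 1]
    # Rows after the header become dicts; fully empty rows are skipped.
    return [dict(zip(header, row)) for row in rows[header_row:] if any(row)]


# Placeholder content: four preamble lines, then the real header on line 5.
SAMPLE = (
    "Title line of the spreadsheet\n"
    "\n"
    "\n"
    "\n"
    "Rank,CBSA,Population\n"
    "1,Area A,1000\n"
    "2,Area B,500\n"
)

records = read_with_header_row(SAMPLE)
print(records[0]["CBSA"], records[1]["Population"])
```

With the real file, the same idea applies once the preamble length is confirmed by opening the CSV and checking which line holds the headers.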

How to use the dataset

This dataset can be used to understand the linguistic diversity of the United States, and to compare languages spoken across different states and cities.
This data can also be used to explore trends in language usage over time.
Businesses can use this dataset to identify which languages are most commonly spoken in the areas in which they operate and tailor their marketing or customer service accordingly.
Schools could use this dataset to plan language-learning programs based on the needs of their community.
Policymakers could use this data to better understand linguistic diversity in the United States and design programs to support bilingualism or multilingualism.

Research Ideas

Businesses can use this dataset to identify which languages are most commonly spoken in the areas in which they operate and cater their marketing or customer service accordingly.
Schools could use this data to plan language-learning programs based on the needs of their community.
Policymakers could use this dataset to better understand linguistic diversity in the United States and design programs to support bilingualism or multilingualism.

Acknowledgements

This dataset was created by Gary Hoover. The data was sourced from https://www.kaggle.com/garyhoov/us-languages

License

Unknown License - Please check the dataset description for more information.

Columns

File: Languages Spoken at Home by Urban Area = CBSA.csv

File: US Languages Spoken at Home 2014.csv

| Column name | Description |
|:-------------------------------------------------------------------|:--------------|
| Table with column headers in row 5 and row headers in column A | |
