14 datasets found

English Population Distribution Data - United States Cities (2019-2023)
neilsberg.com
csv, json
Updated Oct 1, 2025
Cite
Neilsberg Research (2025). English Population Distribution Data - United States Cities (2019-2023) [Dataset]. https://www.neilsberg.com/insights/lists/english-population-in-united-states-by-city/
Available download formats: json, csv
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Variables measured
English Population Count, English Population Percentage, English Population Share of United States
Measurement technique
To measure the rank and respective trends, we initially gathered data from the five most recent American Community Survey (ACS) 5-Year Estimates. We then analyzed and categorized the data for each of the origins / ancestries identified by the U.S. Census Bureau. It is possible that a small population exists but was not reported or captured due to limitations or variations in Census data collection and reporting. We ensured that the population estimates used in this dataset pertain exclusively to the identified origins / ancestries and do not rely on any ethnicity classification, unless explicitly required. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

This list ranks the 19,348 cities in the United States by English population, as estimated by the United States Census Bureau. It also highlights population changes in each city over the past five years.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:

2019-2023 American Community Survey 5-Year Estimates

2014-2018 American Community Survey 5-Year Estimates

2009-2013 American Community Survey 5-Year Estimates

Variables / Data Columns

Rank by English Population: This column displays the rank of each city in the United States by its English population, using the most recent ACS data available.

City: The City for which the rank is shown in the previous column.

English Population: The English population of the city is shown in this column.

% of Total City Population: This shows what percentage of the total city population identifies as English. Please note that the sum of all percentages may not equal one due to rounding of values.

% of Total United States English Population: This tells us how much of the entire United States English population lives in that city. Please note that the sum of all percentages may not equal one due to rounding of values.

5 Year Rank Trend: This column displays the rank trend across the last 5 years.
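The rank and percentage columns described above can be sketched in a few lines. This is a minimal illustration, not Neilsberg's actual method: the field names (`english_population`, `total_population`), the function name, and the sample figures are all assumptions made up for the example. It also shows why rounded shares may not add up exactly.

```python
def add_share_columns(rows, us_total_english):
    """Rank places by English population and derive the two percentage columns.

    rows: list of dicts with hypothetical keys 'city', 'english_population',
    and 'total_population'. us_total_english is the national English total.
    """
    ranked = sorted(rows, key=lambda r: r["english_population"], reverse=True)
    result = []
    for rank, row in enumerate(ranked, start=1):
        total = row["total_population"]
        result.append({
            "rank": rank,
            "city": row["city"],
            "english_population": row["english_population"],
            # Share of the city's own population (None if the city total is 0).
            "pct_of_city": round(100 * row["english_population"] / total, 2) if total else None,
            # Share of the national English population living in this city.
            "pct_of_us_english": round(100 * row["english_population"] / us_total_english, 2),
        })
    return result


# Made-up figures, for illustration only.
sample = [
    {"city": "B", "english_population": 500, "total_population": 4000},
    {"city": "A", "english_population": 2000, "total_population": 10000},
    {"city": "C", "english_population": 500, "total_population": 5000},
]
table = add_share_columns(sample, us_total_english=3000)
share_sum = round(sum(r["pct_of_us_english"] for r in table), 2)
```

With these sample values each share is rounded to two decimals before summing, so the total drifts slightly away from 100, which is the rounding effect the column notes warn about.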

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
English Population Distribution Data - United States Counties (2019-2023)
neilsberg.com
csv, json
Updated Oct 1, 2025
Cite
Neilsberg Research (2025). English Population Distribution Data - United States Counties (2019-2023) [Dataset]. https://www.neilsberg.com/insights/lists/english-population-in-united-states-by-county/
Available download formats: json, csv
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Variables measured
English Population Count, English Population Percentage, English Population Share of United States
Measurement technique
To measure the rank and respective trends, we initially gathered data from the five most recent American Community Survey (ACS) 5-Year Estimates. We then analyzed and categorized the data for each of the origins / ancestries identified by the U.S. Census Bureau. It is possible that a small population exists but was not reported or captured due to limitations or variations in Census data collection and reporting. We ensured that the population estimates used in this dataset pertain exclusively to the identified origins / ancestries and do not rely on any ethnicity classification, unless explicitly required. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

This list ranks the 3,057 counties in the United States by English population, as estimated by the United States Census Bureau. It also highlights population changes in each county over the past five years.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:

2019-2023 American Community Survey 5-Year Estimates

2014-2018 American Community Survey 5-Year Estimates

2009-2013 American Community Survey 5-Year Estimates

Variables / Data Columns

Rank by English Population: This column displays the rank of each county in the United States by its English population, using the most recent ACS data available.

County: The County for which the rank is shown in the previous column.

English Population: The English population of the county is shown in this column.

% of Total County Population: This shows what percentage of the total county population identifies as English. Please note that the sum of all percentages may not equal one due to rounding of values.

% of Total United States English Population: This tells us how much of the entire United States English population lives in that county. Please note that the sum of all percentages may not equal one due to rounding of values.

5 Year Rank Trend: This column displays the rank trend across the last 5 years.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
English Population Distribution Data - United States States (2019-2023)
neilsberg.com
csv, json
Updated Oct 1, 2025
Cite
Neilsberg Research (2025). English Population Distribution Data - United States States (2019-2023) [Dataset]. https://www.neilsberg.com/insights/lists/english-population-in-united-states-by-state/
Available download formats: json, csv
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Variables measured
English Population Count, English Population Percentage, English Population Share of United States
Measurement technique
To measure the rank and respective trends, we initially gathered data from the five most recent American Community Survey (ACS) 5-Year Estimates. We then analyzed and categorized the data for each of the origins / ancestries identified by the U.S. Census Bureau. It is possible that a small population exists but was not reported or captured due to limitations or variations in Census data collection and reporting. We ensured that the population estimates used in this dataset pertain exclusively to the identified origins / ancestries and do not rely on any ethnicity classification, unless explicitly required. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

This list ranks the 50 states in the United States by English population, as estimated by the United States Census Bureau. It also highlights population changes in each state over the past five years.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:

2019-2023 American Community Survey 5-Year Estimates

2014-2018 American Community Survey 5-Year Estimates

2009-2013 American Community Survey 5-Year Estimates

Variables / Data Columns

Rank by English Population: This column displays the rank of each state in the United States by its English population, using the most recent ACS data available.

State: The State for which the rank is shown in the previous column.

English Population: The English population of the state is shown in this column.

% of Total State Population: This shows what percentage of the total state population identifies as English. Please note that the sum of all percentages may not equal one due to rounding of values.

% of Total United States English Population: This tells us how much of the entire United States English population lives in that state. Please note that the sum of all percentages may not equal one due to rounding of values.

5 Year Rank Trend: This column displays the rank trend across the last 5 years.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
American English General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
•Regions: Speakers represent various states of the United States of America to ensure dialectal diversity and demographic balance.
•Demographics: A gender split of 60% male and 40% female, with participant ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
•Duration: Each conversation ranges from 15 to 60 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
•Environment: Quiet, echo-free settings with no background noise.
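The recording details above (stereo, 16-bit, 16 kHz WAV) can be checked on a downloaded file before it enters a training pipeline. The following is a minimal sketch using Python's standard `wave` module; the function name and the idea of returning a mismatch report are assumptions for illustration, not part of the dataset's tooling.

```python
import wave


def check_wav_format(path, channels=2, sample_width_bytes=2, sample_rate=16000):
    """Compare a WAV file's header against the format stated in the listing.

    Defaults encode the listing's claims: stereo (2 channels),
    16-bit depth (2 bytes per sample), 16 kHz sample rate.
    Returns (actual, mismatches); mismatches maps field -> (expected, actual).
    """
    with wave.open(path, "rb") as wf:
        actual = {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "duration_seconds": wf.getnframes() / wf.getframerate(),
        }
    expected = {
        "channels": channels,
        "sample_width_bytes": sample_width_bytes,
        "sample_rate": sample_rate,
    }
    mismatches = {
        key: (expected[key], actual[key])
        for key in expected
        if actual[key] != expected[key]
    }
    return actual, mismatches
```

An empty `mismatches` dictionary means the header agrees with the stated format; `duration_seconds` can additionally be compared with the stated 15 to 60 minute range per conversation.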

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
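The listing quotes an average WER below 5% but does not say how it was measured. As a point of reference, the sketch below implements the standard word error rate definition (word-level edit distance divided by the number of reference words). It assumes simple whitespace tokenization with no text normalization, which may differ from the provider's procedure.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,                      # deletion of a reference word
                cur[j - 1] + 1,                   # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref_words)
```

For example, a hypothesis that drops one word from a six-word reference scores 1/6, or roughly 16.7%.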
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple English speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for US English.
•Voice Assistants: Build smart assistants capable of understanding natural American conversations.
Languages and English Ability - Seattle Neighborhoods
hub.arcgis.com
data.seattle.gov
Updated Feb 22, 2024
Cite
City of Seattle ArcGIS Online (2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://hub.arcgis.com/datasets/5ebf54a443194f1080ffde06d1d381b5
Dataset updated
Feb 22, 2024
Dataset authored and provided by
City of Seattle ArcGIS Online
Area covered
Seattle
Description
Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.
Table created for and used in the Neighborhood Profiles application.
Vintages: 2023
ACS Table(s): B16004, C16002
Data downloaded from: Census Bureau's Explore Census Data
The United States Census Bureau's American Community Survey (ACS): About the Survey, Geography & ACS, Technical Documentation, News & Updates
This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.
Data Note from the Census:
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.
Data Processing Notes:
•Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
•The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico.
•Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
•Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
•Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
•Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
•The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
•Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
•The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
•The estimate is controlled. A statistical test for sampling variability is not appropriate.
•The data for this geographic area cannot be displayed because the number of sample cases is too small.
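The Census data note above defines the confidence bounds as the estimate minus and plus the 90 percent margin of error. A minimal sketch of that arithmetic follows. The 1.645 factor is the value the Census Bureau's general ACS guidance pairs with a 90 percent margin of error; it does not appear in this listing, and the function names and sample numbers are illustrative assumptions.

```python
Z_90 = 1.645  # factor associated with a 90 percent margin of error in ACS guidance


def confidence_bounds(estimate, moe_90):
    """Lower and upper 90 percent confidence bounds, as defined in the data note."""
    return estimate - moe_90, estimate + moe_90


def standard_error(moe_90):
    """Recover the standard error implied by a 90 percent margin of error."""
    return moe_90 / Z_90


def rescale_moe(moe_90, z=1.96):
    """Convert a 90 percent margin of error to another confidence level.

    The default z of 1.96 corresponds to a 95 percent level.
    """
    return standard_error(moe_90) * z


# Made-up example: an estimate of 1,200 speakers with a margin of error of 150.
lower, upper = confidence_bounds(1200, 150)
```

These bounds reflect sampling variability only; as the note says, nonsampling error is not captured by the margin of error.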
Data_Sheet_1_Recent Trends and Potential Drivers of Non-invasive...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
pdf
Updated Jun 6, 2023
Cite
Steffen E. Petersen; Rocco Friebel; Victor Ferrari; Yuchi Han; Nay Aung; Asmaa Kenawy; Timothy S. E. Albert; Huseyin Naci (2023). Data_Sheet_1_Recent Trends and Potential Drivers of Non-invasive Cardiovascular Imaging Use in the United States of America and England.PDF [Dataset]. http://doi.org/10.3389/fcvm.2020.617771.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fcvm.2020.617771.s001
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers Media, http://www.frontiersin.org/
Authors
Steffen E. Petersen; Rocco Friebel; Victor Ferrari; Yuchi Han; Nay Aung; Asmaa Kenawy; Timothy S. E. Albert; Huseyin Naci
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
England, United States
Description
Background: Non-invasive Cardiovascular imaging (NICI), including cardiovascular magnetic resonance (CMR) imaging provides important information to guide the management of patients with cardiovascular conditions. Current rates of NICI use and potential policy determinants in the United States of America (US) and England remain unexplored.
Methods: We compared NICI activity in the US (Medicare fee-for-service, 2011–2015) and England (National Health Service, 2012–2016). We reviewed recommendations related to CMR from Clinical Practice Guidelines, Appropriate Use Criteria (AUC), and Choosing Wisely. We then categorized recommendations according to whether CMR was the only recommended NICI technique (substitutable indications). Reimbursement policies in both settings were systematically collated and reviewed using publicly available information.
Results: The 2015 rate of NICI activity in the US was 3.1 times higher than in England (31,055 vs. 9,916 per 100,000 beneficiaries). The proportion of CMR of all NICI was small in both jurisdictions, but nuclear cardiac imaging was more frequent in the US in absolute and relative terms. American and European CPGs were similar, both in terms of number of recommendations and proportions of indications where CMR was not the only recommended NICI technique (substitutable indications). Reimbursement schemes for NICI activity differed for physicians and hospitals between the two settings.
Conclusions: Fee-for-service physician compensation in the US for NICI may contribute to higher NICI activity compared to England where physicians are salaried. Reimbursement arrangements for the performance of the test may contribute to the higher proportion of nuclear cardiac imaging out of the total NICI activity. Differences in CPG recommendations appear not to explain the variation in NICI activity between the US and England.
Primary language spoken by the Medicaid and CHIP population
data.virginia.gov
catalog.data.gov
csv
Updated Jan 18, 2025
Cite
Centers for Medicare & Medicaid Services (2025). Primary language spoken by the Medicaid and CHIP population [Dataset]. https://data.virginia.gov/dataset/primary-language-spoken-by-the-medicaid-and-chip-population
Available download formats: csv
Dataset updated
Jan 18, 2025
Dataset provided by
Centers for Medicare & Medicaid Services
Description
This data set includes annual counts and percentages of Medicaid and Children’s Health Insurance Program (CHIP) enrollees by primary language spoken (English, Spanish, and all other languages). Results are shown overall; by state; and by five subpopulation topics: race and ethnicity, age group, scope of Medicaid and CHIP benefits, urban or rural residence, and eligibility category. These results were generated using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files (TAF) Release 1 data and the Race/Ethnicity Imputation Companion File. This data set includes Medicaid and CHIP enrollees in all 50 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands who were enrolled for at least one day in the calendar year, except where otherwise noted. Enrollees in Guam, American Samoa, the Northern Mariana Islands, and select states with data quality issues with the primary language variable in TAF are not included. Results shown for the race and ethnicity subpopulation topic exclude enrollees in the U.S. Virgin Islands. Results shown overall (where subpopulation topic is "Total enrollees") exclude enrollees younger than age 5 and enrollees in the U.S. Virgin Islands. Results for states with TAF data quality issues in the year have a value of "Unusable data." Some rows in the data set have a value of "DS," which indicates that data were suppressed according to the Centers for Medicare & Medicaid Services’ Cell Suppression Policy for values between 1 and 10. This data set is based on the brief: "Primary language spoken by the Medicaid and CHIP population in 2020." Enrollees are assigned to a primary language category based on their reported ISO language code in TAF (English/missing, Spanish, and all other language codes) (Primary Language). 
Enrollees are assigned to a race and ethnicity subpopulation using the state-reported race and ethnicity information in TAF when it is available and of good quality; if it is missing or unreliable, race and ethnicity is indirectly estimated using an enhanced version of Bayesian Improved Surname Geocoding (BISG) (Race and ethnicity of the national Medicaid and CHIP population in 2020). Enrollees are assigned to an age group subpopulation using age as of December 31st of the calendar year. Enrollees are assigned to the comprehensive benefits or limited benefits subpopulation according to the criteria in the "Identifying Beneficiaries with Full-Scope, Comprehensive, and Limited Benefits in the TAF" DQ Atlas brief. Enrollees are assigned to an urban or rural subpopulation based on the 2010 Rural-Urban Commuting Area (RUCA) code associated with their home or mailing address ZIP code in TAF (Rural Medicaid and CHIP enrollees in 2020). Enrollees are assigned to an eligibility category subpopulation using their latest reported eligibility group code, CHIP code, and age in the calendar year. Please refer to the full brief for additional context about the methodology and detailed findings. Future updates to this data set will include more recent data years as the TAF data become available.
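Because some cells in this data set hold the strings "DS" (suppressed) or "Unusable data" instead of a number, a naive numeric parse of the CSV would fail. The sketch below shows one way to handle both markers explicitly. The column names (`state`, `primary_language`, `enrollee_count`), the sample rows, and the assumption that counts may contain thousands separators are all hypothetical; the published file's actual layout should be checked first.

```python
import csv
import io

SUPPRESSED = "DS"           # suppressed under the CMS Cell Suppression Policy
UNUSABLE = "Unusable data"  # state-year with TAF data quality issues


def parse_count(raw):
    """Return (value, status) for one cell; value is None when no number is available."""
    text = raw.strip()
    if text == SUPPRESSED:
        return None, "suppressed"
    if text == UNUSABLE:
        return None, "unusable"
    return int(text.replace(",", "")), "ok"


# Made-up rows with hypothetical column names, for illustration only.
SAMPLE_CSV = """state,primary_language,enrollee_count
Alpha,Spanish,"12,345"
Beta,Spanish,DS
Gamma,Spanish,Unusable data
"""


def load_counts(csv_text):
    """Parse the CSV text and keep the reason a count is missing alongside each row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value, status = parse_count(row["enrollee_count"])
        rows.append({"state": row["state"], "count": value, "status": status})
    return rows
```

Keeping the status separate from the value avoids treating a suppressed cell as zero; per the description, "DS" stands for a true value between 1 and 10.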
State
hub.arcgis.com
mapdirect-fdep.opendata.arcgis.com
Updated Apr 3, 2023
Cite
Esri (2023). State [Dataset]. https://hub.arcgis.com/maps/esri::state-137/about
Dataset updated
Apr 3, 2023
Dataset authored and provided by
Esri, http://esri.com/
Description
This layer shows language or language groups spoken at home by English ability. This is shown by tract, county, and state boundaries. This service is updated annually to contain the most recently released American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. This layer is symbolized to show the percent of individuals age 5+ who are bilingual in English and another language (speak English very well and speak another language at home). To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right.
Current Vintage: 2019-2023
ACS Table(s): C16001
Data downloaded from: Census Bureau's API for American Community Survey
Date of API call: December 12, 2024
National Figures: data.census.gov
The United States Census Bureau's American Community Survey (ACS): About the Survey, Geography & ACS, Technical Documentation, News & Updates
This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.
Data Note from the Census:
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.
Data Processing Notes:
•This layer is updated automatically when the most current vintage of ACS data is released each year, usually in December. The layer always contains the latest available ACS 5-year estimates. It is updated annually within days of the Census Bureau's release schedule.
•Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2023 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
•The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico.
•Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
•Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
•Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
•Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
•The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
•Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
•The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
•The estimate is controlled. A statistical test for sampling variability is not appropriate.
•The data for this geographic area cannot be displayed because the number of sample cases is too small.
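The processing note about negative values describes a simple cleaning rule: negative sentinel codes become null, except for one code that becomes zero. A minimal sketch of that rule follows. The listing elides the actual codes ("-4444...", "-5555..."), so the constants below are clearly labeled stand-ins, and the function name is hypothetical; substitute the real codes from the Census API documentation before using this on raw data.

```python
# Stand-in values only: the listing truncates the real sentinel codes,
# so these are NOT the actual Census API codes.
HYPOTHETICAL_NULL_CODE = -4
HYPOTHETICAL_ZERO_CODE = -5


def clean_acs_value(value, zero_codes=frozenset({HYPOTHETICAL_ZERO_CODE})):
    """Apply the rule from the processing notes to one raw API value.

    - codes listed in zero_codes are set to zero
    - every other negative value is set to null (None)
    - non-negative values and existing nulls pass through unchanged

    Assumes the field cannot hold a legitimate negative number, which is
    true for counts and margins of error but not for every derived field.
    """
    if value is None:
        return None
    if value in zero_codes:
        return 0
    if value < 0:
        return None
    return value


raw_column = [120, HYPOTHETICAL_NULL_CODE, HYPOTHETICAL_ZERO_CODE, 0, None]
cleaned = [clean_acs_value(v) for v in raw_column]
```

The hosted layer already has this rule applied, so a step like this would only matter when pulling the same tables directly from the raw API.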
American English Call Center Data for Telecom AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Multiple states across the United States of America, to ensure coverage of various accents and dialects.
•Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.
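The audio format listed above (stereo, 16-bit, 8 kHz or 16 kHz) can be checked per file before training. The following is a minimal sketch using Python's standard `wave` module. The helper names are hypothetical, and the silent test file stands in for a real recording.

```python
import os
import tempfile
import wave

ALLOWED_RATES = {8000, 16000}  # sample rates stated in the listing


def check_wav(path):
    """Return (problems, duration_seconds) for one WAV file."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 2:
            problems.append(f"expected stereo, got {wf.getnchannels()} channel(s)")
        if wf.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"expected 16-bit, got {8 * wf.getsampwidth()}-bit")
        if wf.getframerate() not in ALLOWED_RATES:
            problems.append(f"unexpected sample rate {wf.getframerate()} Hz")
        duration = wf.getnframes() / wf.getframerate()
    return problems, duration


def write_silent_wav(path, channels, rate, seconds):
    """Write a silent 16-bit WAV file; used here only to have something to check."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * (2 * channels * rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample_call.wav")  # hypothetical file name
    write_silent_wav(sample, channels=2, rate=8000, seconds=1)
    print(check_wav(sample))
```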

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
•Inbound Calls:
•Phone Number Porting
•Network Connectivity Issues
•Billing and Payments
•Technical Support
•Service Activation
•International Roaming Enquiry
•Refund Requests and Billing Adjustments
•Emergency Service Access, and others
•Outbound Calls:
•Welcome Calls & Onboarding
•Payment Reminders
•Customer Satisfaction Surveys
•Technical Updates
•Service Usage Reviews
•Network Complaint Status Calls, and more
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, coughs)
•High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
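The word error rate quoted above is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. A minimal sketch of that standard definition follows. It is not the vendor's scoring tool, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("please reset my sim card", "please reset my card"))
```

Real evaluations usually normalize case and punctuation before scoring; that step is left out here.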
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.
U.S. Languages Census Data 2009-2013
kaggle.com
zip
Updated Feb 19, 2021
Cite
Andreas Martinson (2021). U.S. Languages Census Data 2009-2013 [Dataset]. https://www.kaggle.com/amartinson193/us-languages-census-data-20092013
Explore at:
Available download formats: zip (91261 bytes)
Dataset updated
Feb 19, 2021
Authors
Andreas Martinson
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

This dataset has been cleaned and prepared from the U.S. Census Bureau and can be found here: https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html. Further details can be found here: https://www.census.gov/data/developers/data-sets/language-stats.html. We only cleaned data for the state level. There is data for the nation, county, and 'core-based statistical area' levels if you are interested in looking at and cleaning that data.

The original tables split the data into a separate tab for each state, which was not conducive to data analysis. We consolidated all of the information into one table and put it into a tidy format.

The dataset has each of the 50 states, plus Washington, D.C. and Puerto Rico.

Content

The dataset has the following columns:
- Group: Character
- Subgroup: Character
- Language: Character
- State: Character
- Speakers: Number (Integer)
- Margin of Error - English Speakers: Number (Integer)
- nonEnglishSpeakers: Number (Integer)
- Margin of Error - NonEnglishSpeakers: Number (Integer)
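With the data in tidy form, a per-state summary takes only a few lines. The sketch below uses the standard `csv` module on a tiny in-memory sample. The rows and their values are invented for illustration, and it assumes the CSV headers match the column names listed above (only Language, State, and Speakers are used).

```python
import csv
import io
from collections import defaultdict

# Made-up rows that mimic the tidy layout; not real Census figures.
SAMPLE_CSV = """Group,Subgroup,Language,State,Speakers
ExampleGroup,ExampleSubgroup,Spanish,Utah,100
ExampleGroup,ExampleSubgroup,German,Utah,20
ExampleGroup,ExampleSubgroup,Spanish,Ohio,50
"""


def summarize_by_state(csv_text):
    """Return {state: (distinct language count, total speakers)}."""
    languages = defaultdict(set)
    speakers = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        languages[row["State"]].add(row["Language"])
        speakers[row["State"]] += int(row["Speakers"])
    return {state: (len(languages[state]), speakers[state]) for state in languages}


for state, (n_languages, total) in summarize_by_state(SAMPLE_CSV).items():
    print(state, n_languages, total)
```

Counting distinct languages per state is one simple proxy for language diversity; other measures would weight by speaker counts.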

Acknowledgements

This dataset was cleaned for a Data Visualization class I took in Fall 2020. Here is the link to the final project: https://datavis-fall-2020-team.github.io/uslanguages.github.io/

Here is a link to our repository: https://github.com/DataVis-Fall-2020-Team/uslanguages.github.io

Inspiration

The questions we originally sought to answer were:
- Which languages are spoken in the U.S.?
- Where are these languages spoken within the U.S.?
- Which states have the most language diversity?
- Which foreign language speakers are most fluent in English?
- How have the languages spoken changed over time?
American English Call Center Data for BFSI AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English Call Center Data for BFSI AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/bfsi-call-center-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Multiple states across the United States of America, to ensure coverage of various accents and dialects.
•Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.
•Inbound Calls:
•Debit Card Block Request
•Transaction Disputes
•Loan Enquiries
•Credit Card Billing Issues
•Account Closure & Claims
•Policy Renewals & Cancellations
•Retirement & Tax Planning
•Investment Risk Queries, and more
•Outbound Calls:
•Loan & Credit Card Offers
•Customer Surveys
•EMI Reminders
•Policy Upgrades
•Insurance Follow-ups
•Investment Opportunity Calls
•Retirement Planning Reviews, and more
This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, background noise)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making financial domain model training faster and more accurate.
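Speaker-segmented, time-coded transcripts make it easy to derive simple call statistics such as talk time per speaker. The sketch below assumes a hypothetical JSON layout (a list of segments with `speaker`, `start`, `end`, and `text` keys); the actual schema of the delivered files is not specified here and may differ.

```python
import json
from collections import defaultdict

# Hypothetical transcript structure, invented for illustration only.
SAMPLE_TRANSCRIPT = """
[
  {"speaker": "agent", "start": 0.0, "end": 4.2, "text": "Thank you for calling, how can I help?"},
  {"speaker": "customer", "start": 4.5, "end": 9.0, "text": "I want to block my debit card."},
  {"speaker": "agent", "start": 9.3, "end": 12.3, "text": "[pause] Sure, I can help with that."}
]
"""


def talk_time_by_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


segments = json.loads(SAMPLE_TRANSCRIPT)
for speaker, seconds in talk_time_by_speaker(segments).items():
    print(f"{speaker}: {seconds:.1f} s")
```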
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender,
US English TTS Speech Dataset for Speech Synthesis
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). US English TTS Speech Dataset for Speech Synthesis [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/tts-monolgue-english-us
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
The English TTS Monologue Speech Dataset is a professionally curated resource built to train realistic, expressive, and production-grade text-to-speech (TTS) systems. It contains studio-recorded long-form speech by trained native English voice artists, each contributing 1 to 2 hours of clean, uninterrupted monologue audio.
Unlike typical prompt-based datasets with short, isolated phrases, this collection features long-form, topic-driven monologues that mirror natural human narration. It includes content types that are directly useful for real-world applications, like audiobook-style storytelling, educational lectures, health advisories, product explainers, digital how-tos, formal announcements, and more.
All recordings are captured in professional studios using high-end equipment and under the guidance of experienced voice directors.
Recording & Audio Quality
•Audio Format: WAV, 48 kHz, available in 16-bit, 24-bit, and 32-bit depth
•SNR: Minimum 30 dB
•Channel: Mono
•Recording Duration: 20-30 minutes
•Recording Environment: Studio-controlled, acoustically treated rooms
•Per Speaker Volume: 1–2 hours of speech per artist
•Quality Control: Each file is reviewed and cleaned for common acoustic issues, including: reverberation, lip smacks, mouth clicks, thumping, hissing, plosives, sibilance, background noise, static interference, clipping, and other artifacts.

Only clean, production-grade audio makes it into the final dataset.
Voice Artist Selection
All voice artists are native English speakers with professional training or prior experience in narration. We ensure a diverse pool in terms of age, gender, and region to bring a balanced and rich vocal dataset.
•Artist Profile:
•Gender: Male and Female
•Age Range: 20–60 years
•Regions: Native English-speaking states from United States of America
•Selection Process: All artists are screened, onboarded, and sample-approved using FutureBeeAI’s proprietary Yugo platform.

Script Quality & Coverage
Scripts are professionally authored by domain experts to reflect real-world use cases rather than generic or repetitive text. They include modern vocabulary, emotional range, and phonetically rich sentence structures.
•Word Count per Script: 3,000–5,000 words per 30-minute session

•Content Types:
•Storytelling
•Script and book reading
•Informational explainers
•Government service instructions
•E-commerce tutorials
•Motivational content
•Health & wellness guides
•Education & career advice
•Linguistic Design: Balanced punctuation, emotional range, modern syntax, and vocabulary diversity

Transcripts & Alignment
While the script is used during the recording, we also provide post-recording updates to ensure the transcript reflects the final spoken audio. Minor edits are made to adjust for skipped or rephrased words.
•Segmentation: Time-stamped at the sentence level, aligned to actual spoken delivery
•Format: Available in plain text and JSON

•Post-processing:
•Corrected for
CNN-DailyMail News Text Summarization
kaggle.com
zip
Updated Oct 23, 2021
Cite
Gowri Shankar Penugonda (2021). CNN-DailyMail News Text Summarization [Dataset]. https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail/code
Explore at:
Available download formats: zip (527738644 bytes)
Dataset updated
Oct 23, 2021
Authors
Gowri Shankar Penugonda
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
dataset-card-for-cnn-dailymail-dataset

dataset-summary

The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.

supported-tasks-and-leaderboards

'summarization': Versions 2.0.0 and 3.0.0 of the CNN / DailyMail Dataset can be used to train a model for abstractive and extractive summarization (Version 1.0.0 was developed for machine reading and comprehension and abstractive question answering). The model performance is measured by how high the output summary's ROUGE score for a given article is when compared to the highlight as written by the original article author. Zhong et al (2020) report a ROUGE-1 score of 44.41 when testing a model trained for extractive summarization. See the Papers With Code leaderboard for more models.
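To make the ROUGE-1 metric mentioned above concrete, here is a simplified sketch that scores unigram overlap between a candidate summary and a reference highlight. It uses plain whitespace tokenization and lowercasing, which is an assumption; published scores come from the official ROUGE tooling with its own tokenization and stemming, so numbers from this sketch are not comparable to the leaderboard. The example strings are invented.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = rouge1(
    "the ship docked in rio",
    "the ship docked at rio de janeiro",
)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```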

languages

The BCP-47 code for English as generally spoken in the United States is en-US and the BCP-47 code for English as generally spoken in the United Kingdom is en-GB. It is unknown if other varieties of English are represented in the data.

dataset-structure

data-instances

For each instance, there is a string for the article, a string for the highlights, and a string for the id. See the CNN / Daily Mail dataset viewer to explore more examples.

{'id': '0054d6d30dbcad772e20b22771153a2a9cbeaf62', 'article': "(CNN) -- An American woman died aboard a cruise ship that docked at Rio de Janeiro on Tuesday, the same ship on which 86 passengers previously fell ill, according to the state-run Brazilian news agency, Agencia Brasil. The American tourist died aboard the MS Veendam, owned by cruise operator Holland America. Federal Police told Agencia Brasil that forensic doctors were investigating her death. The ship's doctors told police that the woman was elderly and suffered from diabetes and hypertension, according the agency. The other passengers came down with diarrhea prior to her death during an earlier part of the trip, the ship's doctors said. The Veendam left New York 36 days ago for a South America tour.", 'highlights': "The elderly woman suffered from diabetes and hypertension, ship's doctors say . Previously, 86 passengers had fallen ill on the ship, Agencia Brasil says ."}

The average token count for the articles and the highlights are provided below:

Feature	Mean Token Count
Article	781
Highlights	56

data-fields

id: a string containing the hexadecimal-formatted SHA-1 hash of the URL from which the story was retrieved

article: a string containing the body of the news article

highlights: a string containing the highlight of the article as written by the article author
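Since the id field is described as a hexadecimal SHA-1 hash of the story URL, an identifier of the same shape can be produced with Python's `hashlib`. This is only a sketch: the UTF-8 encoding and the exact form of the URL string that was hashed are assumptions, so the output is not guaranteed to reproduce the ids in the dataset. The example URL is a placeholder.

```python
import hashlib


def article_id(url):
    """Hex-encoded SHA-1 digest of a URL string (40 hexadecimal characters)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


# Placeholder URL, not taken from the dataset.
example = article_id("https://example.com/news/story.html")
print(example, len(example))
```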

data-splits

The CNN/DailyMail dataset has 3 splits: train, validation, and test. Below are the statistics for Version 3.0.0 of the dataset.

Dataset Split	Number of Instances in Split
Train	287,113
Validation	13,368
Test	11,490
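As a quick check on how the three splits relate, the counts above can be turned into shares of the whole. The short sketch below only restates the published numbers.

```python
# Instance counts for Version 3.0.0, as listed in the table above.
SPLITS = {"train": 287_113, "validation": 13_368, "test": 11_490}

total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

print(f"total instances: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```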

dataset-creation

curation-rationale

Version 1.0.0 aimed to support supervised neural methodologies for machine reading and question answering with a large amount of real natural language training data and released about 313k unique articles and nearly 1M Cloze style questions to go with the articles. Versions 2.0.0 and 3.0.0 changed the structure of the dataset to support summarization rather than question answering. Version 3.0.0 provided a non-anonymized version of the data, whereas both the previous versions were preprocessed to replace named entities with unique identifier labels.

source-data

initial-data-collection-and-normalization

The data consists of news articles and...
American English Call Center Data for Retail & E-Commerce AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Multiple states across the United States of America, to ensure coverage of various accents and dialects.
•Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
•Inbound Calls:
•Product Inquiries
•Order Cancellations
•Refund & Exchange Requests
•Subscription Queries, and more
•Outbound Calls:
•Order Confirmations
•Upselling & Promotions
•Account Updates
•Loyalty Program Offers
•Customer Verifications, and others
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, cough)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making model training faster and more accurate.
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.
•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
Usage and Applications
This dataset is ideal for a range of voice AI and NLP applications:
•Automatic Speech Recognition (ASR): Fine-tune English speech-to-text systems.


Cite

Neilsberg Research (2025). English Population Distribution Data - United States Cities (2019-2023) [Dataset]. https://www.neilsberg.com/insights/lists/english-population-in-united-states-by-city/

English Population Distribution Data - United States Cities (2019-2023)

Explore at:

json, csvAvailable download formats

Dataset updated

Oct 1, 2025

Dataset authored and provided by

Neilsberg Research

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

United States

Variables measured

English Population Count, English Population Percentage, English Population Share of United States

Measurement technique

To measure the rank and respective trends, we initially gathered data from the five most recent American Community Survey (ACS) 5-Year Estimates. We then analyzed and categorized the data for each of the origins / ancestries identified by the U.S. Census Bureau. It is possible that a small population exists but was not reported or captured due to limitations or variations in Census data collection and reporting. We ensured that the population estimates used in this dataset pertain exclusively to the identified origins / ancestries and do not rely on any ethnicity classification, unless explicitly required. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.

Dataset funded by

Neilsberg Research

Description

About this dataset

Context

This list ranks the 19,348 cities in the United States by English population, as estimated by the United States Census Bureau. It also highlights population changes in each city over the past five years.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:

2019-2023 American Community Survey 5-Year Estimates
2014-2018 American Community Survey 5-Year Estimates
2009-2013 American Community Survey 5-Year Estimates

Variables / Data Columns

Rank by English Population: This column displays each city's rank in the United States by English population, using the most recent ACS data available.
City: The city to which the rank in the previous column applies.
English Population: The English population of the city.
% of Total City Population: The percentage of the total city population that identifies as English. Please note that the sum of all percentages may not equal 100% due to rounding of values.
% of Total United States English Population: The share of the entire United States English population that lives in that city. Please note that the sum of all percentages may not equal 100% due to rounding of values.
5 Year Rank Trend: This column displays the rank trend across the last 5 years.
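The rank and percentage columns described above can be derived from raw counts as sketched below. The city names and figures are invented, and the national total is taken as the sum of the listed cities purely for illustration; the published dataset presumably uses the nationwide figure.

```python
def rank_cities(english_population, total_population):
    """Build rows with rank, % of city population, and % of the national total."""
    national_total = sum(english_population.values())
    ordered = sorted(english_population.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for rank, (city, count) in enumerate(ordered, start=1):
        rows.append({
            "rank": rank,
            "city": city,
            "english_population": count,
            "pct_of_city": 100 * count / total_population[city],
            "pct_of_national": 100 * count / national_total,
        })
    return rows


# Made-up counts for two hypothetical cities.
english = {"City A": 500, "City B": 1500}
totals = {"City A": 10_000, "City B": 20_000}

for row in rank_cities(english, totals):
    print(row)
```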

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
