17 datasets found

English Speech Dataset (Latin American Speakers) – 117 Hours Scripted...
nexdata.ai
Updated Oct 13, 2023
Cite
Nexdata (2023). English Speech Dataset (Latin American Speakers) – 117 Hours Scripted Monologue by Smartphone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1021
Explore at:
Dataset updated
Oct 13, 2023
Dataset authored and provided by
Nexdata
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
This dataset contains 117 hours of English speech from Latin American speakers, recorded as monologues read from given scripts and covering the generic domain, human-machine interaction, smart home command and control, in-car command and control, numbers, and other domains. The recordings are transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (281 people in total), which is intended to improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
Population of the Limited English Proficient (LEP) Speakers by Community...
catalog.data.gov
data.cityofnewyork.us
+1more
Updated Jan 19, 2024
Cite
data.cityofnewyork.us (2024). Population of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://catalog.data.gov/dataset/population-of-the-limited-english-proficient-lep-speakers-by-community-district
Explore at:
Dataset updated
Jan 19, 2024
Dataset provided by
data.cityofnewyork.us
Description
Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.
American English Call Center Data for Telecom AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Multiple states across the United States of America, to ensure coverage of various accents and dialects.
•Participant Profile: Mixed gender distribution (60% male, 40% female) with ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.
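The stated audio format (stereo, 16-bit, 8kHz or 16kHz) can be checked on a downloaded file with Python's standard `wave` module. The following is a minimal sketch, not the provider's tooling: the helper names `describe_wav` and `matches_listing` and the file name `demo_call.wav` are assumptions made for illustration, and the demo file is synthetic silence so that the example runs without the dataset.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return channel count, bit depth, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_listing(info):
    """Compare the properties with the format stated in the listing."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 16
        and info["sample_rate_hz"] in (8000, 16000)
    )


# Build one second of silent stereo audio (hypothetical stand-in for a real call).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_call.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)       # 2 bytes per sample = 16 bits
    out.setframerate(8000)
    out.writeframes(b"\x00" * (8000 * 2 * 2))  # frames * channels * bytes per sample

info = describe_wav(demo_path)
print(info, matches_listing(info))
```

Running the same check over every file in a delivery is a cheap way to catch mixed sample rates before training.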

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
•Inbound Calls:
•Phone Number Porting
•Network Connectivity Issues
•Billing and Payments
•Technical Support
•Service Activation
•International Roaming Enquiry
•Refund Requests and Billing Adjustments
•Emergency Service Access, and others
•Outbound Calls:
•Welcome Calls & Onboarding
•Payment Reminders
•Customer Satisfaction Surveys
•Technical Updates
•Service Usage Reviews
•Network Complaint Status Calls, and more
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, coughs)
•High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
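Word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The listing does not say how its figure was measured (for example, which text normalization was applied), so the following is only a sketch of the standard definition using a word-level edit distance; the function name and the sample sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: one substituted word out of seven reference words.
wer = word_error_rate(
    "my bill is higher than last month",
    "my bill is high than last month",
)
print(f"WER = {wer:.3f}")
```

Under this definition, a rate below 5% corresponds to fewer than one word-level error per 20 reference words.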
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.

2023 Limited English Proficiency (LEP) for the DVRPC Region Public Use...
catalog.dvrpc.org
njogis-newjersey.opendata.arcgis.com
+1more
api, geojson, html +1
Updated Nov 4, 2025
Cite
DVRPC (2025). 2023 Limited English Proficiency (LEP) for the DVRPC Region Public Use Microdata Areas [Dataset]. https://catalog.dvrpc.org/dataset/2023-limited-english-proficiency-lep-for-the-dvrpc-region-public-use-microdata-areas
Explore at:
Available download formats: api, xml, html, geojson
Dataset updated
Nov 4, 2025
Dataset authored and provided by
DVRPC
Description
The Delaware Valley Regional Planning Commission (DVRPC) is committed to upholding the principles and intentions of the 1964 Civil Rights Act and related nondiscrimination statutes in all of the Commission’s work, including publications, products, communications, public input, and decision-making processes. Language barriers may prevent people who are Limited in English Proficiency (also known as LEP persons) from obtaining services or information, or from participating in public planning processes. To better identify LEP populations and thoroughly evaluate the Commission’s efforts to provide meaningful access, DVRPC has produced this Limited-English Proficiency Plan. This is the data that was used to make the maps for the upcoming plan. Public Use Microdata Areas (PUMAs) are geographies of at least 100,000 people that are nested within states or equivalent entities. States are able to delineate PUMAs within their borders, or use PUMA criteria provided by the Census Bureau. Data were gathered from the 2019-2023 American Community Survey (ACS) 5-Year Estimates, Table B16001: Language Spoken at Home by Ability to Speak English for the Population 5 Years and Over. ACS data are derived from a survey and are subject to sampling variability.

*Limited English Proficiency (LEP) refers to those persons that speak English less than "very well". DVRPC has mapped the below Language Groups for our Plan.

Spanish

Russian

Chinese

Korean

Vietnamese

Source of PUMA boundaries: US Census Bureau, TIGER/Line Files. Please refer to U:_OngoingProjects\LEP\ACS_5YR_B16001_PUMAs_metadata.xlsx for the full attribute lookup and the fields used in making the DVRPC LEP Map Series. Please contact Chris Pollard (cpollard@dvrpc.org) should you have any questions about this dataset.
english_dialects
huggingface.co
Cite
Yoach Lacombe, english_dialects [Dataset]. https://huggingface.co/datasets/ylacombe/english_dialects
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Authors
Yoach Lacombe
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for "english_dialects"

Dataset Summary

This dataset consists of 31 hours of transcribed high-quality audio of English sentences recorded by 120 volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as use for speech technologies. The speakers self-identified as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English. The recording scripts… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/english_dialects.
English-Speaking Politicians
kaggle.com
zip
Updated Nov 10, 2020
Cite
maurice rupp (2020). English-Speaking Politicians [Dataset]. https://www.kaggle.com/datasets/mauricerupp/englishspeaking-politicians/code
Explore at:
Available download formats: zip (41917721 bytes)
Dataset updated
Nov 10, 2020
Authors
maurice rupp
Description
Content

This dataset contains speeches, interviews and press briefings from over 1,000 English-speaking politicians, covering the period from 1789 to 2020. The data was scraped from multiple internet sources, each of which is indicated in the column 'URL'.

Dataset Structure

Each speech is treated as one entry; sentences spoken by other people (e.g. in an interview) are removed. Paragraphs within a speech are separated by a newline ('\n'). There are no newlines elsewhere in the data.

Cleaning

Noise tags, time stamps and inaudible words have been removed from the data.
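Because newlines occur only between paragraphs, a speech can be split back into its paragraphs with a single string operation. This is a minimal sketch, assuming the separator is the newline character; the 'URL' column is named in the description, while the `Text` key, the sample row, and the `split_paragraphs` helper are hypothetical.

```python
def split_paragraphs(speech_text):
    """Split one speech entry into its paragraphs, dropping empty pieces."""
    return [part for part in speech_text.split("\n") if part.strip()]


# Hypothetical row: only the 'URL' column name comes from the description.
row = {
    "URL": "https://example.org/speech",
    "Text": "First paragraph of the speech.\nSecond paragraph of the speech.",
}

paragraphs = split_paragraphs(row["Text"])
print(len(paragraphs), paragraphs)
```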
American English Call Center Data for Delivery & Logistics AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English Call Center Data for Delivery & Logistics AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.
Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions, offering a rich, real-world training base for AI models.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Multiple states of the United States of America, for accent and dialect diversity.
•Participant Profile: Mixed gender distribution (60% male, 40% female) with ages ranging from 18 to 70.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.
•Call Duration: 5 to 15 minutes on average.
•Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.
•Recording Environment: Captured in clean, noise-free, echo-free conditions.

Topic Diversity
This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.
•Inbound Calls:
•Order Tracking
•Delivery Complaints
•Undeliverable Addresses
•Return Process Enquiries
•Delivery Method Selection
•Order Modifications, and more
•Outbound Calls:
•Delivery Confirmations
•Subscription Offer Calls
•Incorrect Address Follow-ups
•Missed Delivery Notifications
•Delivery Feedback Surveys
•Out-of-Stock Alerts, and others
This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.
Transcription
All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, noise)
•High transcription accuracy with word error rate under 5% via dual-layer quality checks.
These transcriptions support fast, reliable model development for English voice AI applications in the delivery sector.
Metadata
Detailed metadata is included for each participant and conversation:
•Participant Metadata: ID, age, gender, region, accent, dialect.
•Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

This metadata aids in training specialized models, filtering demographics, and running advanced analytics.
Usage and Applications
US English TTS Speech Dataset for Speech Synthesis
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). US English TTS Speech Dataset for Speech Synthesis [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/tts-monolgue-english-us
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
The English TTS Monologue Speech Dataset is a professionally curated resource built to train realistic, expressive, and production-grade text-to-speech (TTS) systems. It contains studio-recorded long-form speech by trained native English voice artists, each contributing 1 to 2 hours of clean, uninterrupted monologue audio.
Unlike typical prompt-based datasets with short, isolated phrases, this collection features long-form, topic-driven monologues that mirror natural human narration. It includes content types that are directly useful for real-world applications, like audiobook-style storytelling, educational lectures, health advisories, product explainers, digital how-tos, formal announcements, and more.
All recordings are captured in professional studios using high-end equipment and under the guidance of experienced voice directors.
Recording & Audio Quality
•Audio Format: WAV, 48 kHz, available in 16-bit, 24-bit, and 32-bit depth
•SNR: Minimum 30 dB
•Channel: Mono
•Recording Duration: 20-30 minutes
•Recording Environment: Studio-controlled, acoustically treated rooms
•Per Speaker Volume: 1–2 hours of speech per artist
•Quality Control: Each file is reviewed and cleaned for common acoustic issues, including: reverberation, lip smacks, mouth clicks, thumping, hissing, plosives, sibilance, background noise, static interference, clipping, and other artifacts.

Only clean, production-grade audio makes it into the final dataset.
Voice Artist Selection
All voice artists are native English speakers with professional training or prior experience in narration. We ensure a diverse pool in terms of age, gender, and region to bring a balanced and rich vocal dataset.
•Artist Profile:
•Gender: Male and Female
•Age Range: 20–60 years
•Regions: Native English-speaking states from United States of America
•Selection Process: All artists are screened, onboarded, and sample-approved using FutureBeeAI’s proprietary Yugo platform.

Script Quality & Coverage
Scripts are not generic or repetitive. Scripts are professionally authored by domain experts to reflect real-world use cases. They avoid redundancy and include modern vocabulary, emotional range, and phonetically rich sentence structures.
•Word Count per Script: 3,000–5,000 words per 30-minute session
•Content Types:
•Storytelling
•Script and book reading
•Informational explainers
•Government service instructions
•E-commerce tutorials
•Motivational content
•Health & wellness guides
•Education & career advice
•Linguistic Design: Balanced punctuation, emotional range, modern syntax, and vocabulary diversity
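The stated script length implies an average speaking rate, which can serve as a quick sanity check when planning or reviewing recordings. A minimal sketch, assuming the word counts refer to a full 30-minute session (the listing also mentions 20-30 minute recordings, so actual rates may differ); the helper name is illustrative:

```python
def words_per_minute(word_count, minutes):
    """Average speaking rate implied by a script length and session duration."""
    return word_count / minutes


low = words_per_minute(3000, 30)
high = words_per_minute(5000, 30)
print(f"{low:.0f} to {high:.0f} words per minute")
```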

Transcripts & Alignment
While the script is used during the recording, we also provide post-recording updates to ensure the transcript reflects the final spoken audio. Minor edits are made to adjust for skipped or rephrased words.
•Segmentation: Time-stamped at the sentence level, aligned to actual spoken delivery
•Format: Available in plain text and JSON

•Post-processing:
•Corrected for
Census Data - Languages spoken in Chicago, 2008 – 2012
data.cityofchicago.org
healthdata.gov
+3more
csv, xlsx, xml
Updated Sep 12, 2014
Cite
U.S. Census Bureau (2014). Census Data - Languages spoken in Chicago, 2008 – 2012 [Dataset]. https://data.cityofchicago.org/Health-Human-Services/Census-Data-Languages-spoken-in-Chicago-2008-2012/a2fk-ec6q
Explore at:
Available download formats: xlsx, xml, csv
Dataset updated
Sep 12, 2014
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
U.S. Census Bureau
Area covered
Chicago
Description
This dataset contains estimates of the number of residents aged 5 years or older in Chicago who “speak English less than very well,” by the non-English language spoken at home and community area of residence, for the years 2008 – 2012. See the full dataset description for more information at: https://data.cityofchicago.org/api/views/fpup-mc9v/files/dK6ZKRQZJ7XEugvUavf5MNrGNW11AjdWw0vkpj9EGjg?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\ECONOMIC_INDICATORS\Dataset_Description_Languages_2012_FOR_PORTAL_ONLY.pdf
PHIDU - Birthplace - Non-English Speaking Residents (PHN) 2016 | gimi9.com
gimi9.com
Updated Jul 31, 2025
Cite
(2025). PHIDU - Birthplace - Non-English Speaking Residents (PHN) 2016 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_tua-phidu-phidu-birthplace-nes-residents-phn-2016-phn2017/
Explore at:
Dataset updated
Jul 31, 2025
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
This dataset, released August 2017, contains the Australian resident population by birthplace, divided into English speaking (ES) and non-English speaking (NES) countries, 2016. The following countries are designated as ES: Canada, Ireland, New Zealand, South Africa, United Kingdom and the United States of America; the remaining countries are designated as NES. The dataset also includes the population of people born overseas who report poor proficiency in English. The data is by Primary Health Network (PHN) 2017 geographic boundaries based on the 2016 Australian Statistical Geography Standard (ASGS). There are 31 PHNs set up by the Australian Government. Each network is controlled by a board of medical professionals and advised by a clinical council and community advisory committee. The boundaries of the PHNs closely align with the Local Hospital Networks where possible. For more information please see the data source notes on the data. Source: Compiled by PHIDU based on the ABS Census of Population and Housing, August 2016. AURIN has spatially enabled the original data. Data that was not shown/not applicable/not published/not available for the specific area ('#', '..', '^', 'np', 'n.a.', 'n.y.a.' in original PHIDU data) was removed and replaced by blank cells. For other keys and abbreviations refer to PHIDU Keys.
Department of Rehabilitation Office Contact Information and Addresses with...
data.chhs.ca.gov
data.ca.gov
+4more
csv, docx, zip
Updated Nov 7, 2025
Cite
Department of Rehabilitation (2025). Department of Rehabilitation Office Contact Information and Addresses with Languages Spoken [Dataset]. https://data.chhs.ca.gov/dataset/department-of-rehabilitation-office-contact-information-and-addresses-with-languages-spoken
Explore at:
Available download formats: docx, zip, csv (26997)
Dataset updated
Nov 7, 2025
Dataset provided by
California Department of Rehabilitation (http://www.dor.ca.gov/)
Authors
Department of Rehabilitation
Description
This dataset is a list of Department of Rehabilitation (DOR) offices and includes contact information, addresses, and languages spoken in each office. Note: In addition to the languages listed, the DOR has various Bilingual language resources available in each office that allow us to serve members of the public who may speak a language other than English.
HISPANIC OR LATINO AND RACE - DP05_HIL_P - Dataset - CKAN
portal.tad3.org
Updated Jul 23, 2023
Cite
(2023). HISPANIC OR LATINO AND RACE - DP05_HIL_P - Dataset - CKAN [Dataset]. https://portal.tad3.org/dataset/hispanic-or-latino-and-race-dp05_hil_p
Explore at:
Dataset updated
Jul 23, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
ACS DEMOGRAPHIC AND HOUSING ESTIMATES: HISPANIC OR LATINO AND RACE - DP05
Universe - Total population
Survey-Program - American Community Survey 5-year estimates
Years - 2020, 2021, 2022
The terms “Hispanic,” “Latino,” and “Spanish” are used interchangeably. Some respondents identify with all three terms while others may identify with only one of these three specific terms. People who identify with the terms “Hispanic,” “Latino,” or “Spanish” are those who classify themselves in one of the specific Hispanic, Latino, or Spanish categories listed on the questionnaire (“Mexican, Mexican Am., or Chicano,” “Puerto Rican,” or “Cuban”) as well as those who indicate that they are “another Hispanic, Latino, or Spanish origin.” People who do not identify with one of the specific origins listed on the questionnaire but indicate that they are “another Hispanic, Latino, or Spanish origin” are those whose origins are from Spain, the Spanish-speaking countries of Central or South America, or another Spanish culture or origin. Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before their arrival in the United States. People who identify their origin as Hispanic, Latino, or Spanish may be of any race.
ACS 5YR Demographic Estimate Data by State
datasets.ai
catalog.data.gov
21, 57
Updated Feb 29, 2024
Cite
Department of Housing and Urban Development (2024). ACS 5YR Demographic Estimate Data by State [Dataset]. https://datasets.ai/datasets/acs-5yr-demographic-estimate-data-by-state
Explore at:
Available download formats: 57, 21
Dataset updated
Feb 29, 2024
Dataset authored and provided by
Department of Housing and Urban Development
Description
2016-2020 ACS 5-Year estimates of demographic variables (see below) compiled at the State level. These variables include Sex By Age, Hispanic Or Latino Origin By Race, Household Type (Including Living Alone), Households By Presence Of People Under 18 Years By Household Type, Households By Presence Of People 60 Years And Over By Household Type, Nativity By Language Spoken At Home By Ability To Speak English For The Population 5 Years And Over, Average Household Size Of Occupied Housing Units By Tenure, and Sex by Educational Attainment for the Population 18 Years and Over.
ACS 5YR Demographic Estimate Data by State
data.lojic.org
hudgis-hud.opendata.arcgis.com
+1more
Updated Aug 21, 2023
Cite
Department of Housing and Urban Development (2023). ACS 5YR Demographic Estimate Data by State [Dataset]. https://data.lojic.org/datasets/HUD::acs-5yr-demographic-estimate-data-by-state
Explore at:
Dataset updated
Aug 21, 2023
Dataset authored and provided by
Department of Housing and Urban Development
Description
2016-2020 ACS 5-Year estimates of demographic variables (see below) compiled at the State level. The American Community Survey (ACS) 5 Year 2016-2020 demographic information is a subset of information available for download from the U.S. Census. Tables used in the development of this dataset include: B01001 - Sex By Age; B03002 - Hispanic Or Latino Origin By Race; B11001 - Household Type (Including Living Alone); B11005 - Households By Presence Of People Under 18 Years By Household Type; B11006 - Households By Presence Of People 60 Years And Over By Household Type; B16005 - Nativity By Language Spoken At Home By Ability To Speak English For The Population 5 Years And Over; B25010 - Average Household Size Of Occupied Housing Units By Tenure; and B15001 - Sex by Educational Attainment for the Population 18 Years and Over. To learn more about the American Community Survey (ACS) and associated datasets, visit https://www.census.gov/programs-surveys/acs. For questions about the spatial attribution of this dataset, please reach out to us at GISHelpdesk@hud.gov.
Data Dictionary: DD_ACS 5-Year Demographic Estimate Data by State
Date of Coverage: 2016-2020
CATALISE: A Multinational and Multidisciplinary Delphi Consensus Study....
plos.figshare.com
datasetcatalog.nlm.nih.gov
pdf
Updated May 31, 2023
Cite
D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh (2023). CATALISE: A Multinational and Multidisciplinary Delphi Consensus Study. Identifying Language Impairments in Children [Dataset]. http://doi.org/10.1371/journal.pone.0158753
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0158753
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Delayed or impaired language development is a common developmental concern, yet there is little agreement about the criteria used to identify and classify language impairments in children. Children's language difficulties are at the interface between education, medicine and the allied professions, who may all adopt different approaches to conceptualising them. Our goal in this study was to use an online Delphi technique to see whether it was possible to achieve consensus among professionals on appropriate criteria for identifying children who might benefit from specialist services. We recruited a panel of 59 experts representing ten disciplines (including education, psychology, speech-language therapy/pathology, paediatrics and child psychiatry) from English-speaking countries (Australia, Canada, Ireland, New Zealand, United Kingdom and USA). The starting point for round 1 was a set of 46 statements based on articles and commentaries in a special issue of a journal focusing on this topic. Panel members rated each statement for both relevance and validity on a seven-point scale, and added free text comments. These responses were synthesised by the first two authors, who then removed, combined or modified items with a view to improving consensus. The resulting set of statements was returned to the panel for a second evaluation (round 2). Consensus (percentage reporting 'agree' or 'strongly agree') was at least 80 percent for 24 of 27 round 2 statements, though many respondents qualified their response with written comments. These were again synthesised by the first two authors. The resulting consensus statement is reported here, with additional summary of relevant evidence, and a concluding commentary on residual disagreements and gaps in the evidence base.
Census of Population and Housing, 2000: Summary File 3, Alabama
archive.ciser.cornell.edu
Updated Jun 1, 2024
Cite
Bureau of the Census (2024). Census of Population and Housing, 2000: Summary File 3, Alabama [Dataset]. http://doi.org/10.6077/rfnf-p929
Explore at:
Unique identifier
https://doi.org/10.6077/rfnf-p929
Dataset updated
Jun 1, 2024
Dataset authored and provided by
Bureau of the Census
Variables measured
HousingUnit, Individual
Description
Summary File 3 contains sample data, which is the information compiled from the questions asked of a sample of all people and housing units in the United States. Population items include basic population totals as well as counts for the following characteristics: urban and rural, households and families, marital status, grandparents as caregivers, language and ability to speak English, ancestry, place of birth, citizenship status, year of entry, migration, place of work, journey to work (commuting), school enrollment and educational attainment, veteran status, disability, employment status, industry, occupation, class of worker, income, and poverty status. Housing items include basic housing totals and counts for urban and rural, number of rooms, number of bedrooms, year moved into unit, household size and occupants per room, units in structure, year structure built, heating fuel, telephone service, plumbing and kitchen facilities, vehicles available, value of home, and monthly rent and shelter costs. The Summary File 3 population tables are identified with a "P" prefix and the housing tables are identified with an "H," followed by a sequential number. The "P" and "H" tables are shown for the block group and higher level geography, while the "PCT" and "HCT" tables are shown for the census tract and higher level geography. There are 16 "P" tables, 15 "PCT" tables, and 20 "HCT" tables that bear an alphabetic suffix on the table number, indicating that they are repeated for nine major race and Hispanic or Latino groups. There are 484 population tables and 329 housing tables for a total of 813 unique tables. (Source: downloaded from ICPSR 7/13/10)

Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR13342.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
Professional group and nationality of panel members.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 3, 2023
Cite
D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh (2023). Professional group and nationality of panel members. [Dataset]. http://doi.org/10.1371/journal.pone.0158753.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0158753.t001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Professional group and nationality of panel members.