17 datasets found
  1. English Speech Dataset (Latin American Speakers) – 117 Hours Scripted...

    • nexdata.ai
    Updated Oct 13, 2023
    Cite
    Nexdata (2023). English Speech Dataset (Latin American Speakers) – 117 Hours Scripted Monologue by Smartphone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1021
    Dataset updated
    Oct 13, 2023
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
    Description

    This dataset contains 117 hours of English speech from Latin American speakers, collected as monologues read from given scripts. It covers the generic domain, human-machine interaction, smart home command and control, in-car command and control, numbers, and other domains. Recordings are transcribed with text content and other attributes. The data was collected from a large, geographically diverse pool of speakers (281 people in total) to improve model performance in real and complex tasks. The dataset has been quality-tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant.

  2. Population of the Limited English Proficient (LEP) Speakers by Community...

    • catalog.data.gov
    • data.cityofnewyork.us
    • +1more
    Updated Jan 19, 2024
    Cite
    data.cityofnewyork.us (2024). Population of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://catalog.data.gov/dataset/population-of-the-limited-english-proficient-lep-speakers-by-community-district
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.

  3. American English Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). American English Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-english-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US English Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    Speakers: 60 native US English speakers from our verified contributor pool.
    Regions: Representing multiple states across the United States to ensure coverage of various accents and dialects.
    Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.
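    You can confirm that a downloaded file matches the audio format listed above by reading its WAV header. The sketch below uses only Python's standard-library `wave` module. The expected values (2 channels, 2-byte samples, 8000 or 16000 Hz) come from the description. The function name `describe_wav` is our own and is not part of any FutureBeeAI tooling.

```python
import wave

# Expected properties, taken from the dataset description:
# stereo (2 channels), 16-bit samples (2 bytes), 8 kHz or 16 kHz sample rate.
EXPECTED_CHANNELS = 2
EXPECTED_SAMPLE_WIDTH_BYTES = 2
EXPECTED_RATES_HZ = {8000, 16000}


def describe_wav(path: str) -> dict:
    """Read a WAV header and report whether it matches the stated format."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate_hz": rate,
        "duration_s": frames / rate if rate else 0.0,
        "matches_spec": (
            channels == EXPECTED_CHANNELS
            and sample_width == EXPECTED_SAMPLE_WIDTH_BYTES
            and rate in EXPECTED_RATES_HZ
        ),
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py file1.wav file2.wav ...
    for wav_path in sys.argv[1:]:
        print(wav_path, describe_wav(wav_path))
```

    The function only inspects the header. It does not check the audio content, and it assumes nothing about which speaker is on which channel.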

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refund Requests and Billing Adjustments
    Emergency Service Access, and others
    Outbound Calls:
    Welcome Calls & Onboarding
    Payment Reminders
    Customer Satisfaction Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, coughs)
    High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.
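    The description does not say how the < 5% word error rate is measured. The usual definition is WER = (S + D + I) / N:

    - S, D and I are the word substitutions, deletions and insertions needed to turn the reference transcript into the checked one.
    - N is the number of reference words.

    A transcript under 5% therefore averages fewer than one error per 20 words. Below is a minimal sketch of that standard computation (word-level Levenshtein distance), not FutureBeeAI's QA procedure. Splitting on whitespace with no case or punctuation normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Words are split on whitespace; no case or punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```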

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.

  4. 2023 Limited English Proficiency (LEP) for the DVRPC Region Public Use...

    • catalog.dvrpc.org
    • njogis-newjersey.opendata.arcgis.com
    • +1more
    api, geojson, html +1
    Updated Nov 4, 2025
    Cite
    DVRPC (2025). 2023 Limited English Proficiency (LEP) for the DVRPC Region Public Use Microdata Areas [Dataset]. https://catalog.dvrpc.org/dataset/2023-limited-english-proficiency-lep-for-the-dvrpc-region-public-use-microdata-areas
    Available download formats: api, xml, html, geojson
    Dataset updated
    Nov 4, 2025
    Dataset authored and provided by
    DVRPC
    Description

    The Delaware Valley Regional Planning Commission (DVRPC) is committed to upholding the principles and intentions of the 1964 Civil Rights Act and related nondiscrimination statutes in all of the Commission’s work, including publications, products, communications, public input, and decision-making processes. Language barriers may prevent people with Limited English Proficiency (also known as LEP persons) from obtaining services or information, or from participating in public planning processes. To better identify LEP populations and thoroughly evaluate the Commission’s efforts to provide meaningful access, DVRPC has produced this Limited-English Proficiency Plan. This is the data that was used to make the maps for the upcoming plan.

    Public Use Microdata Areas (PUMAs) are geographies of at least 100,000 people that are nested within states or equivalent entities. States are able to delineate PUMAs within their borders or use PUMA criteria provided by the Census Bureau. The data come from the 2019-2023 American Community Survey (ACS) 5-Year Estimates, Table B16001: Language Spoken at Home by Ability to Speak English for the Population 5 Years and Over. ACS data are derived from a survey and are subject to sampling variability.

    *Limited English Proficiency (LEP) refers to persons who speak English less than "very well." DVRPC has mapped the following language groups for our Plan:

    Spanish

    Russian

    Chinese

    Korean

    Vietnamese

    Source of PUMA boundaries: US Census Bureau, TIGER/Line Files. Please refer to U:_OngoingProjects\LEP\ACS_5YR_B16001_PUMAs_metadata.xlsx for the full attribute lookup and the fields used in making the DVRPC LEP Map Series. Please contact Chris Pollard (cpollard@dvrpc.org) should you have any questions about this dataset.
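    In the ACS, people who speak a language other than English at home are asked how well they speak English: very well, well, not well, or not at all. The LEP definition above therefore counts everyone who answered anything other than "very well." The sketch below applies that rule to per-language counts. The dictionary layout, the function names, and the example numbers are hypothetical. They do not reflect the column structure of Table B16001 or of the DVRPC files.

```python
# ACS response categories for "How well does this person speak English?"
ABILITY_LEVELS = ("very well", "well", "not well", "not at all")


def lep_count(ability_counts: dict[str, int]) -> int:
    """Number of people who speak English less than "very well"."""
    unknown = set(ability_counts) - set(ABILITY_LEVELS)
    if unknown:
        raise ValueError(f"unexpected ability levels: {sorted(unknown)}")
    return sum(n for level, n in ability_counts.items() if level != "very well")


def lep_shares(
    counts_by_language: dict[str, dict[str, int]], population_5_plus: int
) -> dict[str, float]:
    """LEP persons per language group as a fraction of the population aged 5+."""
    if population_5_plus <= 0:
        raise ValueError("population must be positive")
    return {
        language: lep_count(counts) / population_5_plus
        for language, counts in counts_by_language.items()
    }


# Illustrative numbers only, not taken from any DVRPC PUMA.
example = {
    "Spanish": {"very well": 500, "well": 120, "not well": 60, "not at all": 20},
    "Korean": {"very well": 80, "well": 30, "not well": 15, "not at all": 5},
}
print(lep_shares(example, 100_000))
```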

  5. english_dialects

    • huggingface.co
    Cite
    Yoach Lacombe, english_dialects [Dataset]. https://huggingface.co/datasets/ylacombe/english_dialects
    Available format: Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Authors
    Yoach Lacombe
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for "english_dialects"

    Dataset Summary

    This dataset consists of 31 hours of transcribed high-quality audio of English sentences recorded by 120 volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as use for speech technologies. The speakers self-identified as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English. The recording scripts… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/english_dialects.

  6. English-Speaking Politicians

    • kaggle.com
    zip
    Updated Nov 10, 2020
    Cite
    maurice rupp (2020). English-Speaking Politicians [Dataset]. https://www.kaggle.com/datasets/mauricerupp/englishspeaking-politicians/code
    Available download formats: zip (41,917,721 bytes)
    Dataset updated
    Nov 10, 2020
    Authors
    maurice rupp
    Description

    Content

    This dataset contains speeches, interviews, and press briefings from over 1,000 English-speaking politicians, spanning the period from 1789 to 2020. The data was scraped from multiple internet sources, each of which is indicated in the column 'URL'.

    Dataset Structure

    Each speech is treated as one entry, from which sentences spoken by other people (e.g. in an interview) are removed. Paragraphs within a speech are separated by a newline character ('\n'); there are no newlines elsewhere in the data.
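    Newlines appear only between paragraphs, so a speech entry can be split back into paragraphs with a plain newline split. The file layout inside the zip archive is not described here, so no loading code is shown. This minimal sketch assumes the text of one entry is already a Python string. `split_paragraphs` is a hypothetical helper name.

```python
def split_paragraphs(speech_text: str) -> list[str]:
    """Split one speech entry into its paragraphs.

    Relies on the dataset's convention that newlines appear only between
    paragraphs; empty pieces (e.g. from a trailing newline) are dropped.
    """
    return [part.strip() for part in speech_text.split("\n") if part.strip()]


entry = "First paragraph.\nSecond paragraph.\n"
for number, paragraph in enumerate(split_paragraphs(entry), start=1):
    print(number, paragraph)
```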

    Cleaning

    Noise tags, time stamps, and inaudible words have been removed from the data.

  7. American English Call Center Data for Delivery & Logistics AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). American English Call Center Data for Delivery & Logistics AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-english-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US English Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

    Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions, offering a rich, real-world training base for AI models.

    Participant Diversity:
    Speakers: 60 native US English speakers from our verified contributor pool.
    Regions: Multiple states across the United States for accent and dialect diversity.
    Participant Profile: Gender distribution of 60% male and 40% female, with ages ranging from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.
    Call Duration: 5 to 15 minutes on average.
    Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in clean, noise-free, echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

    Inbound Calls:
    Order Tracking
    Delivery Complaints
    Undeliverable Addresses
    Return Process Enquiries
    Delivery Method Selection
    Order Modifications, and more
    Outbound Calls:
    Delivery Confirmations
    Subscription Offer Calls
    Incorrect Address Follow-ups
    Missed Delivery Notifications
    Delivery Feedback Surveys
    Out-of-Stock Alerts, and others

    This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

    Transcription

    All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, noise)
    High transcription accuracy with word error rate under 5% via dual-layer quality checks.

    These transcriptions support fast, reliable model development for English voice AI applications in the delivery sector.

    Metadata

    Detailed metadata is included for each participant and conversation:

    Participant Metadata: ID, age, gender, region, accent, dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

    This metadata aids in training specialized models, filtering demographics, and running advanced analytics.

    Usage and Applications


  8. US English TTS Speech Dataset for Speech Synthesis

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). US English TTS Speech Dataset for Speech Synthesis [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/tts-monolgue-english-us
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The English TTS Monologue Speech Dataset is a professionally curated resource built to train realistic, expressive, and production-grade text-to-speech (TTS) systems. It contains studio-recorded long-form speech by trained native English voice artists, each contributing 1 to 2 hours of clean, uninterrupted monologue audio.

    Unlike typical prompt-based datasets with short, isolated phrases, this collection features long-form, topic-driven monologues that mirror natural human narration. It includes content types that are directly useful for real-world applications, like audiobook-style storytelling, educational lectures, health advisories, product explainers, digital how-tos, formal announcements, and more.

    All recordings are captured in professional studios using high-end equipment and under the guidance of experienced voice directors.

    Recording & Audio Quality

    Audio Format: WAV, 48 kHz, available in 16-bit, 24-bit, and 32-bit depth
    SNR: Minimum 30 dB
    Channel: Mono
    Recording Duration: 20-30 minutes
    Recording Environment: Studio-controlled, acoustically treated rooms
    Per Speaker Volume: 1–2 hours of speech per artist
    Quality Control: Each file is reviewed and cleaned for common acoustic issues, including: reverberation, lip smacks, mouth clicks, thumping, hissing, plosives, sibilance, background noise, static interference, clipping, and other artifacts.

    Only clean, production-grade audio makes it into the final dataset.
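    The 30 dB floor is a power ratio: SNR (dB) = 10 * log10(P_signal / P_noise). So 30 dB means the speech carries about 1,000 times the power of the background noise. The description does not say how FutureBeeAI measures SNR.

    The sketch below is a rough, generic estimate, not the vendor's method. It compares the mean-square power of a speech excerpt with that of a noise-only excerpt, such as room tone before the speaker starts, both given as float samples. The speech excerpt also contains the noise floor, so the estimate runs slightly high: by about 0.004 dB at 30 dB. All function names are hypothetical.

```python
import math
from typing import Sequence

MIN_SNR_DB = 30.0  # minimum stated in the dataset description


def mean_square(samples: Sequence[float]) -> float:
    """Average power of a block of samples."""
    if len(samples) == 0:
        raise ValueError("need at least one sample")
    return sum(x * x for x in samples) / len(samples)


def estimate_snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Rough SNR in dB: speech-excerpt power over noise-only-excerpt power."""
    speech_power = mean_square(speech)
    noise_power = mean_square(noise)
    if noise_power == 0.0:
        return math.inf
    if speech_power == 0.0:
        return -math.inf
    return 10.0 * math.log10(speech_power / noise_power)


speech = [0.5, -0.5] * 1000   # synthetic "speech", power 0.25
noise = [0.01, -0.01] * 1000  # synthetic noise floor, power about 1e-4
snr = estimate_snr_db(speech, noise)
print(f"{snr:.1f} dB, passes: {snr >= MIN_SNR_DB}")
```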

    Voice Artist Selection

    All voice artists are native English speakers with professional training or prior experience in narration. We ensure a diverse pool in terms of age, gender, and region to bring a balanced and rich vocal dataset.

    Artist Profile:
    Gender: Male and Female
    Age Range: 20–60 years
    Regions: Native English-speaking states from United States of America
    Selection Process: All artists are screened, onboarded, and sample-approved using FutureBeeAI’s proprietary Yugo platform.

    Script Quality & Coverage

    Scripts are not generic or repetitive. Scripts are professionally authored by domain experts to reflect real-world use cases. They avoid redundancy and include modern vocabulary, emotional range, and phonetically rich sentence structures.

    Word Count per Script: 3,000–5,000 words per 30-minute session
    Content Types:
    Storytelling
    Script and book reading
    Informational explainers
    Government service instructions
    E-commerce tutorials
    Motivational content
    Health & wellness guides
    Education & career advice
    Linguistic Design: Balanced punctuation, emotional range, modern syntax, and vocabulary diversity
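    Together, the script length and session length imply an average delivery pace. This assumes each 3,000 to 5,000 word script is read over the full 30-minute session. If a script were delivered in a shorter recording (recordings are listed as 20 to 30 minutes), the pace would be correspondingly higher.

```latex
\text{pace} = \frac{\text{words per script}}{\text{session length}}, \qquad
\frac{3000\ \text{words}}{30\ \text{min}} = 100\ \text{words/min}, \qquad
\frac{5000\ \text{words}}{30\ \text{min}} \approx 167\ \text{words/min}
```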

    Transcripts & Alignment

    While the script is used during the recording, we also provide post-recording updates to ensure the transcript reflects the final spoken audio. Minor edits are made to adjust for skipped or rephrased words.

    Segmentation: Time-stamped at the sentence level, aligned to actual spoken delivery
    Format: Available in plain text and JSON
    Post-processing:
    Corrected for

  9. Census Data - Languages spoken in Chicago, 2008 – 2012

    • data.cityofchicago.org
    • healthdata.gov
    • +3more
    csv, xlsx, xml
    Updated Sep 12, 2014
    Cite
    U.S. Census Bureau (2014). Census Data - Languages spoken in Chicago, 2008 – 2012 [Dataset]. https://data.cityofchicago.org/Health-Human-Services/Census-Data-Languages-spoken-in-Chicago-2008-2012/a2fk-ec6q
    Available download formats: xlsx, xml, csv
    Dataset updated
    Sep 12, 2014
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    U.S. Census Bureau
    Area covered
    Chicago
    Description

    This dataset contains estimates of the number of residents aged 5 years or older in Chicago who “speak English less than very well,” by the non-English language spoken at home and community area of residence, for the years 2008 – 2012. See the full dataset description for more information at: https://data.cityofchicago.org/api/views/fpup-mc9v/files/dK6ZKRQZJ7XEugvUavf5MNrGNW11AjdWw0vkpj9EGjg?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\ECONOMIC_INDICATORS\Dataset_Description_Languages_2012_FOR_PORTAL_ONLY.pdf

  10. PHIDU - Birthplace - Non-English Speaking Residents (PHN) 2016 | gimi9.com

    • gimi9.com
    Updated Jul 31, 2025
    Cite
    (2025). PHIDU - Birthplace - Non-English Speaking Residents (PHN) 2016 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_tua-phidu-phidu-birthplace-nes-residents-phn-2016-phn2017/
    Dataset updated
    Jul 31, 2025
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    This dataset, released August 2017, contains the 2016 population of Australian residents by birthplace, divided into English-speaking (ES) and non-English-speaking (NES) countries. The following countries are designated as ES: Canada, Ireland, New Zealand, South Africa, the United Kingdom, and the United States of America; the remaining countries are designated as NES. The dataset also includes the population of people who were born overseas and report poor proficiency in English.

    The data is organized by Primary Health Network (PHN) 2017 geographic boundaries based on the 2016 Australian Statistical Geography Standard (ASGS). There are 31 PHNs set up by the Australian Government. Each network is controlled by a board of medical professionals and advised by a clinical council and a community advisory committee. The boundaries of the PHNs closely align with the Local Hospital Networks where possible. For more information, please see the data source notes on the data.

    Source: compiled by PHIDU based on the ABS Census of Population and Housing, August 2016. AURIN has spatially enabled the original data. Data that was not shown, not applicable, not published, or not available for the specific area ('#', '..', '^', 'np', 'n.a.', 'n.y.a.' in the original PHIDU data) was removed and replaced by blank cells. For other keys and abbreviations, refer to PHIDU Keys.
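    If you work from the original PHIDU tables rather than the AURIN release, you can reproduce the same cleaning step by mapping each listed marker to an empty cell. This minimal sketch assumes rows are read as lists of strings, here with Python's `csv` module. The helper names and the sample column names are made up for illustration. The marker set is exactly the one listed above.

```python
import csv
import io

# Markers the description lists for "not shown / not applicable /
# not published / not available" values in the original PHIDU data.
PHIDU_MISSING_MARKERS = {"#", "..", "^", "np", "n.a.", "n.y.a."}


def clean_cell(value: str) -> str:
    """Return an empty string for a missing-value marker, else the value unchanged."""
    return "" if value.strip() in PHIDU_MISSING_MARKERS else value


def clean_row(row: list[str]) -> list[str]:
    return [clean_cell(cell) for cell in row]


# Illustrative CSV text; the column names here are made up.
sample = "phn_name,nes_born,poor_english\nExample PHN,1234,np\nOther PHN,..,56\n"
for row in csv.reader(io.StringIO(sample)):
    print(clean_row(row))
```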

  11. Department of Rehabilitation Office Contact Information and Addresses with...

    • data.chhs.ca.gov
    • data.ca.gov
    • +4more
    csv, docx, zip
    Updated Nov 7, 2025
    Cite
    Department of Rehabilitation (2025). Department of Rehabilitation Office Contact Information and Addresses with Languages Spoken [Dataset]. https://data.chhs.ca.gov/dataset/department-of-rehabilitation-office-contact-information-and-addresses-with-languages-spoken
    Available download formats: docx, zip, csv (26997)
    Dataset updated
    Nov 7, 2025
    Dataset provided by
    California Department of Rehabilitation (http://www.dor.ca.gov/)
    Authors
    Department of Rehabilitation
    Description

    This dataset is a list of Department of Rehabilitation (DOR) offices and includes contact information, addresses, and languages spoken in each office. Note: in addition to the languages listed, the DOR has various bilingual language resources available in each office that allow us to serve members of the public who may speak a language other than English.

  12. HISPANIC OR LATINO AND RACE - DP05_HIL_P - Dataset - CKAN

    • portal.tad3.org
    Updated Jul 23, 2023
    Cite
    (2023). HISPANIC OR LATINO AND RACE - DP05_HIL_P - Dataset - CKAN [Dataset]. https://portal.tad3.org/dataset/hispanic-or-latino-and-race-dp05_hil_p
    Dataset updated
    Jul 23, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    ACS Demographic and Housing Estimates: Hispanic or Latino and Race (DP05). Universe: total population. Survey/program: American Community Survey 5-year estimates. Years: 2020, 2021, 2022.

    The terms “Hispanic,” “Latino,” and “Spanish” are used interchangeably. Some respondents identify with all three terms while others may identify with only one of these three specific terms. People who identify with the terms “Hispanic,” “Latino,” or “Spanish” are those who classify themselves in one of the specific Hispanic, Latino, or Spanish categories listed on the questionnaire (“Mexican, Mexican Am., or Chicano,” “Puerto Rican,” or “Cuban”) as well as those who indicate that they are “another Hispanic, Latino, or Spanish origin.” People who do not identify with one of the specific origins listed on the questionnaire but indicate that they are “another Hispanic, Latino, or Spanish origin” are those whose origins are from Spain, the Spanish-speaking countries of Central or South America, or another Spanish culture or origin. Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before their arrival in the United States. People who identify their origin as Hispanic, Latino, or Spanish may be of any race.

  13. ACS 5YR Demographic Estimate Data by State

    • datasets.ai
    • catalog.data.gov
    21, 57
    Updated Feb 29, 2024
    Cite
    Department of Housing and Urban Development (2024). ACS 5YR Demographic Estimate Data by State [Dataset]. https://datasets.ai/datasets/acs-5yr-demographic-estimate-data-by-state
    Available download formats: 57, 21
    Dataset updated
    Feb 29, 2024
    Dataset authored and provided by
    Department of Housing and Urban Development
    Description

    2016-2020 ACS 5-Year estimates of demographic variables (see below) compiled at the State level. These variables include Sex By Age, Hispanic Or Latino Origin By Race, Household Type (Including Living Alone), Households By Presence Of People Under 18 Years By Household Type, Households By Presence Of People 60 Years And Over By Household Type, Nativity By Language Spoken At Home By Ability To Speak English For The Population 5 Years And Over, Average Household Size Of Occupied Housing Units By Tenure, and Sex by Educational Attainment for the Population 18 Years and Over.

  14. ACS 5YR Demographic Estimate Data by State

    • data.lojic.org
    • hudgis-hud.opendata.arcgis.com
    • +1more
    Updated Aug 21, 2023
    Cite
    Department of Housing and Urban Development (2023). ACS 5YR Demographic Estimate Data by State [Dataset]. https://data.lojic.org/datasets/HUD::acs-5yr-demographic-estimate-data-by-state
    Dataset updated
    Aug 21, 2023
    Dataset authored and provided by
    Department of Housing and Urban Development
    Area covered
    Description

    2016-2020 ACS 5-Year estimates of demographic variables (see below) compiled at the State level. The American Community Survey (ACS) 5-Year 2016-2020 demographic information is a subset of information available for download from the U.S. Census. Tables used in the development of this dataset include:
    B01001 - Sex By Age
    B03002 - Hispanic Or Latino Origin By Race
    B11001 - Household Type (Including Living Alone)
    B11005 - Households By Presence Of People Under 18 Years By Household Type
    B11006 - Households By Presence Of People 60 Years And Over By Household Type
    B16005 - Nativity By Language Spoken At Home By Ability To Speak English For The Population 5 Years And Over
    B25010 - Average Household Size Of Occupied Housing Units By Tenure
    B15001 - Sex by Educational Attainment for the Population 18 Years and Over

    To learn more about the American Community Survey (ACS) and associated datasets, visit https://www.census.gov/programs-surveys/acs. For questions about the spatial attribution of this dataset, please reach out to us at GISHelpdesk@hud.gov. Data Dictionary: DD_ACS 5-Year Demographic Estimate Data by State. Date of Coverage: 2016-2020.

  15. CATALISE: A Multinational and Multidisciplinary Delphi Consensus Study....

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated May 31, 2023
    Cite
    D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh (2023). CATALISE: A Multinational and Multidisciplinary Delphi Consensus Study. Identifying Language Impairments in Children [Dataset]. http://doi.org/10.1371/journal.pone.0158753
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Delayed or impaired language development is a common developmental concern, yet there is little agreement about the criteria used to identify and classify language impairments in children. Children's language difficulties are at the interface between education, medicine and the allied professions, who may all adopt different approaches to conceptualising them. Our goal in this study was to use an online Delphi technique to see whether it was possible to achieve consensus among professionals on appropriate criteria for identifying children who might benefit from specialist services. We recruited a panel of 59 experts representing ten disciplines (including education, psychology, speech-language therapy/pathology, paediatrics and child psychiatry) from English-speaking countries (Australia, Canada, Ireland, New Zealand, United Kingdom and USA). The starting point for round 1 was a set of 46 statements based on articles and commentaries in a special issue of a journal focusing on this topic. Panel members rated each statement for both relevance and validity on a seven-point scale, and added free text comments. These responses were synthesised by the first two authors, who then removed, combined or modified items with a view to improving consensus. The resulting set of statements was returned to the panel for a second evaluation (round 2). Consensus (percentage reporting 'agree' or 'strongly agree') was at least 80 percent for 24 of 27 round 2 statements, though many respondents qualified their response with written comments. These were again synthesised by the first two authors. The resulting consensus statement is reported here, with additional summary of relevant evidence, and a concluding commentary on residual disagreements and gaps in the evidence base.
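    The consensus criterion above is a simple proportion. It is the share of panel members who chose "agree" or "strongly agree," compared against the 80 percent threshold. The sketch below assumes a 1 to 7 coding in which 6 means "agree" and 7 means "strongly agree." That coding, the function names, and the example ratings are illustrative assumptions, not the study's actual data or analysis code.

```python
# Assumed coding of the seven-point scale (not specified in the description):
# 1 = strongly disagree ... 6 = agree, 7 = strongly agree.
AGREE_POINTS = {6, 7}
CONSENSUS_THRESHOLD_PCT = 80.0  # "at least 80 percent"


def consensus_pct(ratings: list[int]) -> float:
    """Percentage of ratings that are 'agree' or 'strongly agree'."""
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must lie on the 1-7 scale")
    agree = sum(1 for r in ratings if r in AGREE_POINTS)
    return 100.0 * agree / len(ratings)


def reaches_consensus(ratings: list[int]) -> bool:
    return consensus_pct(ratings) >= CONSENSUS_THRESHOLD_PCT


# Hypothetical ratings for a single statement.
statement_ratings = [7] * 30 + [6] * 18 + [5] * 8 + [3] * 3
print(f"{consensus_pct(statement_ratings):.1f}%", reaches_consensus(statement_ratings))
```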

  16. Census of Population and Housing, 2000: Summary File 3, Alabama

    • archive.ciser.cornell.edu
    Updated Jun 1, 2024
    Bureau of the Census (2024). Census of Population and Housing, 2000: Summary File 3, Alabama [Dataset]. http://doi.org/10.6077/rfnf-p929
    Dataset updated
    Jun 1, 2024
    Dataset authored and provided by
    Bureau of the Census
    Variables measured
    HousingUnit, Individual
    Description

    Summary File 3 contains sample data, which is the information compiled from the questions asked of a sample of all people and housing units in the United States. Population items include basic population totals as well as counts for the following characteristics: urban and rural, households and families, marital status, grandparents as caregivers, language and ability to speak English, ancestry, place of birth, citizenship status, year of entry, migration, place of work, journey to work (commuting), school enrollment and educational attainment, veteran status, disability, employment status, industry, occupation, class of worker, income, and poverty status. Housing items include basic housing totals and counts for urban and rural, number of rooms, number of bedrooms, year moved into unit, household size and occupants per room, units in structure, year structure built, heating fuel, telephone service, plumbing and kitchen facilities, vehicles available, value of home, and monthly rent and shelter costs. The Summary File 3 population tables are identified with a "P" prefix and the housing tables are identified with an "H," followed by a sequential number. The "P" and "H" tables are shown for the block group and higher level geography, while the "PCT" and "HCT" tables are shown for the census tract and higher level geography. There are 16 "P" tables, 15 "PCT" tables, and 20 "HCT" tables that bear an alphabetic suffix on the table number, indicating that they are repeated for nine major race and Hispanic or Latino groups. There are 484 population tables and 329 housing tables for a total of 813 unique tables. (Source: downloaded from ICPSR 7/13/10)

    Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR13342.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
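The Summary File 3 description sets out a table-naming convention. A "P" or "H" prefix marks a population or housing table shown for the block group and higher, a "PCT" or "HCT" prefix marks one shown for the census tract and higher, and an optional alphabetic suffix marks a table repeated for the race and Hispanic or Latino groups. That convention lends itself to a small lookup. The parser below is a hypothetical sketch. The exact identifier formatting it accepts (for example "P001" or "PCT012A") is an assumption and is not taken from the Census Bureau's file specification.

```python
import re
from typing import NamedTuple


class TableInfo(NamedTuple):
    universe: str          # "population" or "housing"
    lowest_geography: str  # smallest geography for which the table is shown
    number: int            # sequential table number
    iterated: bool         # True if an alphabetic suffix marks a race/Hispanic-or-Latino repeat


# Hypothetical ID pattern (assumed formatting): prefix, digits, optional one-letter suffix.
# "PCT" and "HCT" are listed before "P" and "H" so the longer prefixes match first.
_TABLE_ID = re.compile(r"^(PCT|HCT|P|H)(\d+)([A-Z]?)$")


def parse_sf3_table_id(table_id: str) -> TableInfo:
    """Classify a Summary File 3 table ID by universe, geography and iteration."""
    m = _TABLE_ID.match(table_id.strip().upper())
    if m is None:
        raise ValueError(f"not a Summary File 3 table id: {table_id!r}")
    prefix, digits, suffix = m.groups()
    universe = "population" if prefix.startswith("P") else "housing"
    geography = "census tract" if prefix.endswith("CT") else "block group"
    return TableInfo(universe, geography, int(digits), bool(suffix))


print(parse_sf3_table_id("PCT012A"))
```

A regular expression keeps the prefix rules in one place. Trying the longer "PCT"/"HCT" alternatives first prevents "PCT012" from being misread as a "P" table.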

  17. Professional group and nationality of panel members.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 3, 2023
    D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh (2023). Professional group and nationality of panel members. [Dataset]. http://doi.org/10.1371/journal.pone.0158753.t001
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    D. V. M. Bishop; Margaret J. Snowling; Paul A. Thompson; Trisha Greenhalgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Professional group and nationality of panel members.
