100+ datasets found
  1. The most spoken languages worldwide 2025

    • statista.com
    Cite
    Statista, The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, around 1.53 billion people worldwide spoke English either natively or as a second language, compared with 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.

  2. Most spoken languages worldwide in Millions

    • kaggle.com
    zip
    Updated Oct 14, 2023
    Cite
    Batros Jamali (2023). Most spoken languages worldwide in Millions [Dataset]. https://www.kaggle.com/datasets/batrosjamali/most-spoken-languages-worldwide-in-millions
    Available download formats
    zip (585 bytes)
    Dataset updated
    Oct 14, 2023
    Authors
    Batros Jamali
    Area covered
    World
    Description

    Dataset

    This dataset was created by Batros Jamali

    Contents

  3. Common languages used for web content 2025, by share of websites

    • statista.com
    Cite
    Statista, Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2025
    Area covered
    Worldwide
    Description

    As of October 2025, English was the dominant language for online content, used by nearly half of all websites worldwide. Spanish ranked second, accounting for around 6 percent of web content, followed by German with 5.9 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the two countries had a combined internet user base of over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  4. 🌍📚 World Languages Dataset 🌍📚

    • kaggle.com
    zip
    Updated Jul 30, 2024
    Cite
    Waqar Ali (2024). 🌍📚 World Languages Dataset 🌍📚 [Dataset]. https://www.kaggle.com/datasets/waqi786/world-languages-dataset
    Available download formats
    zip (5706 bytes)
    Dataset updated
    Jul 30, 2024
    Authors
    Waqar Ali
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    World
    Description

    This dataset provides a comprehensive overview of 500 languages spoken around the world. It captures essential linguistic features, including language families, geographical regions, writing systems, and the estimated number of native speakers. This dataset aims to highlight the rich diversity of languages and their cultural significance, offering valuable insights for linguists, researchers, and enthusiasts interested in global language distribution.

    The dataset contains real and accurate records for 500 languages across different regions and linguistic families. It covers a diverse range of languages, from widely spoken ones like English and Mandarin to less commonly known languages. The data was meticulously compiled to reflect the authentic linguistic landscape and provide a valuable resource for language studies and cultural analysis.

  5. ENGLISH PROFICIENCY LEVEL

    • global-relocate.com
    Updated Oct 29, 2024
    Cite
    Global Relocate (2024). ENGLISH PROFICIENCY LEVEL [Dataset]. https://global-relocate.com/rankings/english-proficiency-level
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Global Relocate
    Description

    Data from reports such as the "English Proficiency Index" (EF EPI) from Education First show the significant impact of culture, education, and globalization on the ability of citizens of different countries to speak English.

  6. Level of English proficiency Asia 2024, by country

    • statista.com
    Cite
    Statista, Level of English proficiency Asia 2024, by country [Dataset]. https://www.statista.com/statistics/1456015/asia-english-proficiency-ranking-by-country/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Asia, APAC
    Description

    Singapore scored 609 out of a maximum of 800 points in the English Proficiency Index 2024, the highest score across the selected Asian countries and territories. In contrast, Cambodia reached an English Proficiency Index score of 408 that year.

  7. Number of native Spanish speakers worldwide 2024, by country

    • hazel.com.ua
    • monwebsite.ch
    • +5 more
    Updated Jan 15, 2025
    Cite
    Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://hazel.com.ua/?p=2385236
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.

    Spanish, a world language

    As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, most of them in the Americas; it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which are among the non-Hispanic countries with the highest number of Spanish speakers.

    The second most spoken language in the U.S.

    In the most recent data, Spanish ranked as the language other than English with the highest number of speakers in the U.S., with 12 times as many speakers as the language in second place. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of naturalized people in the U.S. were Spanish-speaking countries.

  8. Ranking of languages spoken at home in the U.S. 2024, by number of speakers

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Ranking of languages spoken at home in the U.S. 2024, by number of speakers [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    United States
    Description

    In 2024, some 45 million people in the United States spoke Spanish at home. In comparison, the second most common non-English language spoken at home was Chinese, at just 3.7 million speakers.

  9. American English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
    Available download formats
    wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various states of the United States of America to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
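
    The recording format stated above can be checked against a file's WAV header. The following is a minimal sketch using Python's standard `wave` module; the function names `wav_matches_spec` and `uncompressed_bytes` are hypothetical helpers, not part of any FutureBeeAI tooling, and the size estimate assumes uncompressed PCM audio.

```python
import wave

# Expected values are taken from the "Audio Format" line above.
EXPECTED_CHANNELS = 2          # stereo
EXPECTED_SAMPLE_WIDTH = 2      # 16-bit depth = 2 bytes per sample
EXPECTED_SAMPLE_RATE = 16_000  # 16 kHz


def wav_matches_spec(path: str) -> bool:
    """Return True if the WAV header matches the stated recording format."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getframerate() == EXPECTED_SAMPLE_RATE
        )


def uncompressed_bytes(hours: float) -> int:
    """Approximate PCM payload size (headers excluded) for `hours` of audio."""
    bytes_per_second = (
        EXPECTED_SAMPLE_RATE * EXPECTED_CHANNELS * EXPECTED_SAMPLE_WIDTH
    )
    return int(hours * 3600 * bytes_per_second)
```

    Under these assumptions, `uncompressed_bytes(30)` gives 6,912,000,000 bytes, so the 30 hours of audio would occupy roughly 6.9 GB before any compression.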

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
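
    The WER figure quoted above refers to word error rate. The description does not say how it was measured, so the sketch below assumes the usual definition: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis transcript, divided by the number of reference words. The function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    For example, comparing "the cat sat on the mat" with "the cat sat on a mat" yields one substitution over six reference words, a WER of about 16.7%. In practice, text is usually normalized (case, punctuation) before scoring; that step is omitted here.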

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for US English.
    Voice Assistants: Build smart assistants capable of understanding natural American conversations.

  10. Share of U.S. population speaking a language besides English at home 2023,...

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Share of U.S. population speaking a language besides English at home 2023, by state [Dataset]. https://www.statista.com/statistics/312940/share-of-us-population-speaking-a-language-other-than-english-at-home-by-state/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.

  11. English Language Training (ELT) Market Size By Product Type (English for...

    • verifiedmarketresearch.com
    pdf,excel,csv,ppt
    Updated Oct 23, 2025
    Cite
    Verified Market Research (2025). English Language Training (ELT) Market Size By Product Type (English for Academic Purposes, English as a Foreign Language, English for Speakers of Other Languages, English as an Additional Language, English as a Second Language, English for Specific Purposes), By Application (White-collar Workers, Students, Migrants, Travelers, and Job Seekers), By End-User (Educational Institutions, Corporate Sector, Government Organizations, and Individuals), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/english-language-training-elt-market/
    Available download formats
    pdf, excel, csv, ppt
    Dataset updated
    Oct 23, 2025
    Dataset authored and provided by
    Verified Market Research
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    The English Language Training (ELT) market was valued at USD 80 billion in 2024 and is projected to reach USD 148.07 billion by 2032, growing at a CAGR of 8% during the forecast period, i.e., 2026-2032. English language proficiency requirements are driving unprecedented demand for ELT services as students prepare for overseas education. According to the UNESCO Institute for Statistics, international student enrollment reached 6.4 million in 2022, with projections indicating continued growth through 2025. Moreover, standardized tests like IELTS and TOEFL remain mandatory for university admissions across English-speaking countries. This creates sustained demand for structured language training programs worldwide.
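
    The market figures in this description can be reproduced with the standard compound annual growth rate (CAGR) formula, end = start × (1 + rate)^years. The sketch below is an illustration only; `project_value` is a hypothetical helper, and compounding from the 2024 base year is an assumption that happens to match the quoted 2032 figure.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound `start_value` at annual rate `cagr` for `years` years."""
    return start_value * (1 + cagr) ** years


# USD 80 billion in 2024, 8% CAGR, projected to 2032 (8 years of compounding).
projected_2032 = project_value(80.0, 0.08, 2032 - 2024)
print(round(projected_2032, 2))  # 148.07
```

    The result matches the USD 148.07 billion stated for 2032, which suggests the 8% rate was applied over the eight years from 2024 rather than only over the 2026-2032 forecast period.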

  12. International Corpus of English (ICE)

    • researchdata.edu.au
    • figshare.mq.edu.au
    Updated Dec 28, 2023
    + more versions
    Cite
    Pam Peters; Adam Smith (2023). International Corpus of English (ICE) [Dataset]. http://doi.org/10.25949/24769173.V1
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Macquarie University
    Authors
    Pam Peters; Adam Smith
    Description

    The Australian component of the International Corpus of English (ICE-AUS) is an approximately one-million-word corpus of transcribed spoken and written Australian English from 1992-1995. It consists of 500 samples of Australian English (60% speech, 40% writing), matching the structure of other ICE corpora (associated with the International Corpus of English). The spoken data includes transcriptions of face-to-face conversations, telephone conversations, monologues, broadcast dialogues, and scripted speech. The written texts include samples of unpublished letters (personal and professional), student essays, newspaper writing, popular nonfiction, academic writing, and fiction. This collection was previously accessible online via the Australian National Corpus (AusNC), an initiative managed by Griffith University between 2012 and 2023.

  13. English Speaking Practice Platform Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 1, 2025
    Cite
    Data Insights Market (2025). English Speaking Practice Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/english-speaking-practice-platform-1433116
    Available download formats
    pdf, ppt, doc
    Dataset updated
    Jan 1, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The English Speaking Practice Platform market is estimated to be valued at $2392 million in 2025, with a projected CAGR of 10.3% from 2025 to 2033. Key market drivers include the increasing demand for English language proficiency in various sectors, the growing adoption of online learning platforms, and the rise of globalization and international business. The market is segmented by application (students with or without a foundation in English), type (oral teaching videos, real tutor teaching, oral practitioners' groups, and others), and region. North America is expected to hold the largest market share, followed by Europe and Asia Pacific. Key market participants include Italki, Preply, Language Exchanges, Easy Language Exchange, Coffeestrap, LingoGlobe, Hallo, Hello Shraa, HelloTalk, Magoosh, Udemy, TestDEN, Jaime Miller Advising, The Princeton Review, Kaplan, TestGlider, Superlearn, Skyengtest, EPrepz, New Oriental Education & Technology Group Inc, and Small Station Education.

  14. Canadian English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Canadian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-canada
    Available download formats
    wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Canadian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Canadian English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Canadian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Canadian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Canadian English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Canada to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Canadian English.
    Voice Assistants: Build smart assistants capable of understanding natural Canadian conversations.

  15. English Language Learning Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 23, 2024
    Cite
    Dataintelo (2024). English Language Learning Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-english-language-learning-market
    Available download formats
    pdf, pptx, csv
    Dataset updated
    Sep 23, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    English Language Learning Market Outlook



    The English Language Learning market size was valued at USD 50 billion in 2023 and is projected to reach USD 100 billion by 2032, growing at a CAGR of 8%. This significant growth is driven by the increasing globalization and the rising necessity of English proficiency in both academic and professional spheres. The adoption of digital learning tools, the rise of e-learning platforms, and the growing emphasis on cross-border educational exchanges are some of the pivotal factors propelling the market's expansion.
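
    As a sanity check on the figures above, the growth rate implied by a start and end value can be recovered by inverting the compound growth formula: rate = (end / start)^(1 / years) − 1. This is a minimal sketch; `implied_cagr` is a hypothetical helper, and treating 2023 to 2032 as nine compounding years is an assumption.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns `start_value` into `end_value`."""
    return (end_value / start_value) ** (1 / years) - 1


# USD 50 billion in 2023 doubling to USD 100 billion by 2032 (9 years).
rate = implied_cagr(50.0, 100.0, 2032 - 2023)
print(f"{rate:.1%}")  # 8.0%
```

    Doubling over nine years corresponds to roughly 8% per year, which agrees with the CAGR stated in the report summary.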



    One of the primary growth factors in the English Language Learning market is the widespread use of the English language as the global lingua franca. In an increasingly interconnected world, proficiency in English is often seen as a critical skill for career advancement, international business, and academic success. English is the dominant language of the internet, international law, and global commerce, making it an indispensable tool for communication. Consequently, there is a growing demand for English language learning programs among students, professionals, and migrants seeking better opportunities.



    The proliferation of digital technologies and online learning platforms has also significantly contributed to the market's growth. The convenience and flexibility offered by digital learning tools enable learners to access high-quality English language courses from anywhere in the world, at any time. Interactive apps, virtual classrooms, and AI-powered language learning software have made English learning more engaging and personalized. This technological advancement is particularly beneficial for non-native speakers in remote or underserved regions who may not have access to traditional classroom settings.



    Another driving factor is the increasing investment in education by governments and private sectors worldwide. Many countries recognize the importance of English proficiency in fostering economic growth and competitiveness on the global stage. Consequently, there are numerous initiatives aimed at integrating English language learning into national education systems. The expansion of international student exchange programs and scholarships also promotes the learning of English as a second language. These efforts are bolstered by collaborations between educational institutions, technology providers, and language training centers.



    Regionally, Asia Pacific is anticipated to witness the highest growth in the English Language Learning market. Countries such as China, India, and Japan are investing heavily in English education to enhance their global competitiveness. The region's vast population and growing middle class, coupled with the emphasis on English as a key skill for academic and professional success, drive this growth. North America and Europe also hold significant market shares, driven by the continuous influx of immigrants and international students, along with the strong presence of leading educational technology companies. Latin America and the Middle East & Africa, although smaller in market size, are experiencing steady growth due to increasing awareness and investment in English education.



    Product Type Analysis



    The English Language Learning market is segmented by product type into Digital Learning, Classroom Learning, and Blended Learning. Digital Learning, which encompasses online courses, mobile applications, and software-based learning tools, is the fastest-growing segment. The rise of e-learning platforms like Duolingo, Babbel, and Rosetta Stone has revolutionized how people learn English. These platforms offer interactive and engaging content, often powered by artificial intelligence to provide personalized learning experiences. The convenience of accessing learning materials from any location and the cost-effectiveness of digital platforms make this segment highly attractive, especially for self-directed learners and working professionals.



    Classroom Learning remains a vital segment, particularly in regions where traditional education systems are deeply ingrained. This segment includes structured courses offered by schools, colleges, language institutes, and private tutors. Despite the surge in digital learning, many learners still prefer the structured environment and direct interaction with instructors that classroom settings provide. This segment is particularly robust in regions with well-established educational institutions and where cultural attitudes towards traditional education are prevalent. Furthermore, classroom learning often includes immersive experiences and peer inte

  16. English Gaming speech dataset

    • kaggle.com
    zip
    Updated Jun 7, 2024
    + more versions
    Cite
    Frank Wong (2024). English Gaming speech dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/english-gaming-speech-dataset
    Available download formats
    zip (698585 bytes)
    Dataset updated
    Jun 7, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    English Gaming Real-world Casual Conversation and Monologue speech dataset

    Description

    The English Gaming Real-world Casual Conversation and Monologue speech dataset covers spontaneous dialogue about popular and evergreen games, including player discussions on battle strategies, social interactions, and esports news, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender, accent, offensive expression labeling, and other attributes. The dataset was collected from a large, geographically diverse set of speakers, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are all GDPR, CCPA, and PIPL compliant. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1430?source=Kaggle

    Format

    16 kHz, 16-bit, WAV, mono channel

    Content category

    Spontaneous dialogue or monologue about popular and evergreen games (such as FPS, MOBA, MMORPG, VR, and other gaming genres), including player discussions on battle strategies, social interactions, esports news, etc.

    Recording environment

    Mixed (indoor, outdoor, entertainment)

    Country

    the United Kingdom (GBR), the United States (USA), etc.

    Language (Region) Code

    en-GB, en-US, etc.

    Language

    English;

    Features of annotation

    Transcription text, timestamp, offensive expression labeling, speaker ID, gender, noise;

    Accuracy Rate

    Sentence Accuracy Rate (SAR) 95%.

    Licensing Information

    Commercial License

  17. Global English Proficiency Test market size is USD 2965.5 million in 2024.

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Sep 15, 2025
    Cognitive Market Research (2025). Global English Proficiency Test market size is USD 2965.5 million in 2024. [Dataset]. https://www.cognitivemarketresearch.com/english-proficiency-test-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Sep 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global English Proficiency Test market size is USD 2965.5 million in 2024. It will expand at a compound annual growth rate (CAGR) of 9.70% from 2024 to 2031.

    North America held the largest market share, more than 40% of the global revenue, with a market size of USD 1186.20 million in 2024, and will grow at a compound annual growth rate (CAGR) of 7.9% from 2024 to 2031.
    Europe accounted for a market share of over 30% of the global revenue, with a market size of USD 889.65 million.
    Asia Pacific held a market share of around 23% of the global revenue, with a market size of USD 682.07 million in 2024, and will grow at a compound annual growth rate (CAGR) of 11.7% from 2024 to 2031.
    Latin America held a market share of more than 5% of the global revenue, with a market size of USD 148.28 million in 2024, and will grow at a compound annual growth rate (CAGR) of 9.1% from 2024 to 2031.
    Middle East and Africa had a market share of around 2% of the global revenue, with an estimated market size of USD 59.31 million in 2024, and will grow at a compound annual growth rate (CAGR) of 9.4% from 2024 to 2031.
    The Employers category is experiencing the fastest growth in the English Proficiency Test Market. The rising trend of multinational companies and the global nature of business operations necessitate a workforce proficient in English.
    

    Market Dynamics of English Proficiency Test Market

    Key Drivers for English Proficiency Test Market

    Increasing Global Mobility of Students and Professionals to Increase the Demand Globally
    

    The increasing global mobility of students and professionals is a significant driver in the English Proficiency Test Market. As educational institutions worldwide, particularly in English-speaking countries, attract a growing number of international students, the need for standardized English proficiency assessments becomes critical. Similarly, professionals seeking employment opportunities in multinational corporations or pursuing career advancements in global markets must demonstrate their English language capabilities. This trend is fueled by globalization and the widespread recognition of English as the lingua franca of business, academia, and technology, thereby boosting the demand for reliable and comprehensive English proficiency tests.

    Rising Demand for English in Non-English Speaking Regions to Propel Market Growth
    

    The rising demand for English language skills in non-English speaking regions is another crucial driver of the English Proficiency Test Market. As countries in Asia, Latin America, and Europe increasingly integrate into the global economy, proficiency in English becomes a valuable asset for individuals and businesses. Governments and educational systems in these regions are incorporating English language education into their curricula, and companies are investing in language training for their employees to enhance competitiveness. This growing emphasis on English proficiency is creating substantial opportunities for test providers to expand their offerings and cater to a broader audience, further propelling market growth.

    Restraint Factor for the English Proficiency Test Market

    High Cost of Test Preparation and Registration Fees to Limit the Sales
    

    A significant restraint in the English Proficiency Test Market is the high cost of test preparation and registration fees. Many potential test-takers, especially students and professionals from developing countries, find these costs prohibitive. The expense of preparatory courses, study materials, and the tests themselves can deter individuals from taking the exams, limiting their opportunities for education and employment in English-speaking regions. This financial barrier not only affects individuals but also impacts the overall market growth, as it reduces the number of people who can afford to demonstrate their English proficiency through standardized tests.

    Limited Accessibility in Rural and Remote Regions
    

    One key restraint in the English proficiency test market is the limited availability of authorized test centers and digital infrastructure in rural and remote areas. Many prospective candidates, especially in developing countries, face challenges in reaching testing locations or accessing reliable internet for online examinations. This restricts participation and limits the market’s growth potential in under...

  18. British English Call Center Data for BFSI AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    FutureBee AI (2022). British English Call Center Data for BFSI AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/bfsi-call-center-conversation-english-uk
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United Kingdom
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This UK English Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native UK English speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.

    Participant Diversity:
    Speakers: 60 native UK English speakers from our verified contributor pool.
    Regions: Representing multiple regions across the United Kingdom to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.

    Inbound Calls:
    Debit Card Block Request
    Transaction Disputes
    Loan Enquiries
    Credit Card Billing Issues
    Account Closure & Claims
    Policy Renewals & Cancellations
    Retirement & Tax Planning
    Investment Risk Queries, and more
    Outbound Calls:
    Loan & Credit Card Offers
    Customer Surveys
    EMI Reminders
    Policy Upgrades
    Insurance Follow-ups
    Investment Opportunity Calls
    Retirement Planning Reviews, and more

    This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, background noise)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making financial domain model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent,

  19. ICE-GB (British English component of the International Corpus of English)

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Dec 20, 2012
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2012). ICE-GB (British English component of the International Corpus of English) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-W0021/
    Dataset updated
    Dec 20, 2012
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    ICE-GB is the British component of the International Corpus of English (ICE). ICE began in 1990 with the primary aim of providing material for comparative studies of varieties of English throughout the world. Twenty centres around the world are preparing corpora of their own national or regional variety of English. ICE-GB is fully grammatically analysed. Like all the ICE corpora, ICE-GB consists of a million words of spoken and written English and adheres to the common corpus design. 200 written and 300 spoken texts make up the million words. Every text is grammatically annotated, allowing complex and detailed searches across the whole corpus. ICE-GB contains 83,394 parse trees, including 59,640 in the spoken part of the corpus. ICE-GB has been fully checked. It was checked by linguists at several stages in its completion, using both a traditional 'post-checking' strategy and cross-sectional error-based searches. ICE-GB is distributed with the retrieval software ICECUP (the International Corpus of English Corpus Utility Program). ICECUP supports a variety of query types, including the use of the parse analyses to construct Fuzzy Tree Fragments to search the corpus.

  20. MCB_languages_county

    • kaggle.com
    zip
    Updated Oct 1, 2019
    Marisol Brewster (2019). MCB_languages_county [Dataset]. https://www.kaggle.com/mcbrewster/mcb-languages-county
    Available download formats: zip (414833 bytes)
    Dataset updated
    Oct 1, 2019
    Authors
    Marisol Brewster
    Description

    Context

    This is a dataset I found online through the Google Dataset Search portal.

    Content

    The American Community Survey (ACS) 2009-2013 multi-year data are used to list all languages spoken in the United States that were reported during the sample period. These tables provide detailed counts of many more languages than the 39 languages and language groups that are published annually as a part of the routine ACS data release. This is the second tabulation beyond 39 languages since ACS began.

    The tables include all languages that were reported in each geography during the 2009 to 2013 sampling period. For the purpose of tabulation, reported languages are classified in one of 380 possible languages or language groups. Because the data are a sample of the total population, there may be languages spoken that are not reported, either because the ACS did not sample the households where those languages are spoken, or because the person filling out the survey did not report the language or reported another language instead.

    The tables also provide information about self-reported English-speaking ability. Respondents who reported speaking a language other than English were asked to indicate their ability to speak English in one of the following categories: "Very well," "Well," "Not well," or "Not at all." The data on ability to speak English represent the person’s own perception about his or her own ability or, because ACS questionnaires are usually completed by one household member, the responses may represent the perception of another household member.

    These tables are also available through the Census Bureau's application programming interface (API). Please see the developers page for additional details on how to use the API to access these data.

    Acknowledgements

    Sources:

    Google Dataset Search: https://toolbox.google.com/datasetsearch

    2009-2013 American Community Survey

    Original dataset: https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html

    Downloaded From: https://data.world/kvaughn/languages-county

    Banner and thumbnail photo by Farzad Mohsenvand on Unsplash

Statista, The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/

The most spoken languages worldwide 2025

464 scholarly articles cite this dataset (View in Google Scholar)
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description

In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year.

Languages in the United States

The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

Different languages at home

The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.
