3 datasets found
  1. The most linguistically diverse countries worldwide 2025, by number of...

    • statista.com
    Cite
    Statista, The most linguistically diverse countries worldwide 2025, by number of languages [Dataset]. https://www.statista.com/statistics/1224629/the-most-linguistically-diverse-countries-worldwide-by-number-of-languages/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    Papua New Guinea is the most linguistically diverse country in the world. As of 2025, it was home to 840 different languages. Indonesia ranked second with 709 languages spoken. In the United States, 335 languages were spoken in that same year.

  2. Global English Speech with Accent Conversational Dataset — Multi-Region...

    • datarade.ai
    .wav
    Updated Jul 21, 2025
    Cite
    FileMarket (2025). Global English Speech with Accent Conversational Dataset — Multi-Region Validated Speech with Gender, Age & Metadata for AI & NLP Training [Dataset]. https://datarade.ai/data-products/global-english-speech-with-accent-conversational-dataset-mu-filemarket
    Available download formats: .wav
    Dataset updated
    Jul 21, 2025
    Dataset authored and provided by
    FileMarket
    Area covered
    Montenegro, Tonga, United States Minor Outlying Islands, Nicaragua, Comoros, Haiti, Cook Islands, Bangladesh, Yemen, Iceland
    Description

    The Global English Accent Conversational NLP Dataset is a comprehensive collection of validated English speech recordings sourced from native and non-native English speakers across key global regions. This dataset is designed for training Natural Language Processing models, conversational AI, Automatic Speech Recognition (ASR), and linguistic research, with a focus on regional accent variation.

    Regions and Covered Countries with Primary Spoken Languages:

    Africa: South Africa (English, Zulu, Afrikaans, Xhosa); Nigeria (English, Yoruba, Igbo, Hausa); Kenya (English, Swahili); Ghana (English, Twi, Ewe, Ga); Uganda (English, Luganda); Ethiopia (English, Amharic, Oromo)

    Central & South America: Mexico (Spanish, English as a second language); Guatemala (Spanish, K'iche', English); El Salvador (Spanish, English); Costa Rica (Spanish, English in Caribbean regions); Colombia (Spanish, English in urban centers); Dominican Republic (Spanish, English in tourist zones); Brazil (Portuguese, English in urban areas); Argentina (Spanish, English among educated speakers)

    Southeast Asia & South Asia: Philippines (Filipino, English); Vietnam (Vietnamese, English); Malaysia (Malay, English, Mandarin); Indonesia (Indonesian, Javanese, English); Singapore (English, Mandarin, Malay, Tamil); India (Hindi, English, Bengali, Tamil); Pakistan (Urdu, English, Punjabi)

    Europe: United Kingdom (English); Ireland (English, Irish); Germany (German, English); France (French, English); Spain (Spanish, Catalan, English); Italy (Italian, English); Portugal (Portuguese, English)

    Oceania: Australia (English); New Zealand (English, Māori); Fiji (English, Fijian)

    North America: United States (English, Spanish); Canada (English, French)

    Dataset Attributes:
    - Conversational English with natural accent variation
    - Global coverage with balanced male/female speakers
    - Rich speaker metadata: age, gender, country, city
    - Average audio length of ~30 minutes per participant
    - All samples manually validated for accuracy
    - Structured format suitable for machine learning and AI applications

    Best suited for:
    - NLP model training and evaluation
    - Multilingual ASR system development
    - Voice assistant and chatbot design
    - Accent recognition research
    - Voice synthesis and TTS modeling

    This dataset ensures global linguistic diversity and delivers high-quality audio for AI developers, researchers, and enterprises working on voice-based applications.

  3. Altitudinal data used for Kruskal-Wallis test.

    • plos.figshare.com
    txt
    Updated Jun 6, 2023
    Cite
    Leonidas-Romanos Davranoglou; Leonidas Embirikos (2023). Altitudinal data used for Kruskal-Wallis test. [Dataset]. http://doi.org/10.1371/journal.pone.0283136.s002
    Available download formats: txt
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Leonidas-Romanos Davranoglou; Leonidas Embirikos
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The common toad (Bufo bufo) has been the subject of many folk tales and superstitions in Western Europe, and as a result, it is characterised by numerous common names (zoonyms). However, the zoonyms of the toad and its associated traditions have remained unexplored in the Balkans, one of Europe’s linguistic hotspots. The present study attempts to fill this knowledge gap by focusing on Greece, where more than 7,700 individuals were interviewed both in the field and through online platforms, in order to document toad zoonyms from all varieties and dialects of Greek, as well as local non-Greek languages such as Arvanitika, South Slavic dialects, and Vlach. It was found that the academically unattested zoonyms of the toad provide an unmatched and previously unexplored linguistic and ethnographic tool, as they reflect the linguistic, demographic, and historical processes that shaped modern Greece. This is particularly pertinent in the 21st century, when a majority of the country’s dialects and languages are in danger of imminent extinction, and some have already gone silent. Overall, the present study shows the significance of recording zoonyms of indigenous and threatened languages as excellent linguistic and ethnographic tools that safeguard our planet’s ethnolinguistic diversity and enhance our understanding of how pre-industrial communities interacted with their local fauna. Furthermore, in contrast to all other European countries, which possess only one or a few zoonyms for the toad, the Greek world boasts an unmatched 37 zoonyms, which attest to its role as a linguistic hotspot.

The most linguistically diverse countries worldwide 2025, by number of languages

3 scholarly articles cite this dataset (View in Google Scholar)