3 datasets found

The most linguistically diverse countries worldwide 2025, by number of...
statista.com
Cite
Statista, The most linguistically diverse countries worldwide 2025, by number of languages [Dataset]. https://www.statista.com/statistics/1224629/the-most-linguistically-diverse-countries-worldwide-by-number-of-languages/
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description
Papua New Guinea is the most linguistically diverse country in the world. As of 2025, it was home to 840 different languages. Indonesia ranked second with 709 languages spoken. In the United States, 335 languages were spoken in that same year.
Global English Speech with Accent Conversational Dataset — Multi-Region...
datarade.ai
.wav
Updated Jul 21, 2025
Cite
FileMarket (2025). Global English Speech with Accent Conversational Dataset — Multi-Region Validated Speech with Gender, Age & Metadata for AI & NLP Training [Dataset]. https://datarade.ai/data-products/global-english-speech-with-accent-conversational-dataset-mu-filemarket
Available download formats: .wav
Dataset updated
Jul 21, 2025
Dataset authored and provided by
FileMarket
Area covered
Montenegro, Tonga, United States Minor Outlying Islands, Nicaragua, Comoros, Haiti, Cook Islands, Bangladesh, Yemen, Iceland
Description
The Global English Accent Conversational NLP Dataset is a comprehensive collection of validated English speech recordings sourced from native and non-native English speakers across key global regions. This dataset is designed for training Natural Language Processing models, conversational AI, Automatic Speech Recognition (ASR), and linguistic research, with a focus on regional accent variation.

Regions and Covered Countries with Primary Spoken Languages:

Africa: South Africa (English, Zulu, Afrikaans, Xhosa); Nigeria (English, Yoruba, Igbo, Hausa); Kenya (English, Swahili); Ghana (English, Twi, Ewe, Ga); Uganda (English, Luganda); Ethiopia (English, Amharic, Oromo)

Central & South America: Mexico (Spanish, English as a second language); Guatemala (Spanish, K'iche', English); El Salvador (Spanish, English); Costa Rica (Spanish, English in Caribbean regions); Colombia (Spanish, English in urban centers); Dominican Republic (Spanish, English in tourist zones); Brazil (Portuguese, English in urban areas); Argentina (Spanish, English among educated speakers)

Southeast Asia & South Asia: Philippines (Filipino, English); Vietnam (Vietnamese, English); Malaysia (Malay, English, Mandarin); Indonesia (Indonesian, Javanese, English); Singapore (English, Mandarin, Malay, Tamil); India (Hindi, English, Bengali, Tamil); Pakistan (Urdu, English, Punjabi)

Europe: United Kingdom (English); Ireland (English, Irish); Germany (German, English); France (French, English); Spain (Spanish, Catalan, English); Italy (Italian, English); Portugal (Portuguese, English)

Oceania: Australia (English); New Zealand (English, Māori); Fiji (English, Fijian)

North America: United States (English, Spanish); Canada (English, French)

Dataset Attributes:
- Conversational English with natural accent variation
- Global coverage with balanced male/female speakers
- Rich speaker metadata: age, gender, country, city
- Average audio length of ~30 minutes per participant
- All samples manually validated for accuracy
- Structured format suitable for machine learning and AI applications

Best suited for:
- NLP model training and evaluation
- Multilingual ASR system development
- Voice assistant and chatbot design
- Accent recognition research
- Voice synthesis and TTS modeling

This dataset ensures global linguistic diversity and delivers high-quality audio for AI developers, researchers, and enterprises working on voice-based applications.
Altitudinal data used for Kruskal-Wallis test.
plos.figshare.com
txt
Updated Jun 6, 2023
Cite
Leonidas-Romanos Davranoglou; Leonidas Embirikos (2023). Altitudinal data used for Kruskal-Wallis test. [Dataset]. http://doi.org/10.1371/journal.pone.0283136.s002
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0283136.s002
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Leonidas-Romanos Davranoglou; Leonidas Embirikos
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The common toad (Bufo bufo) has been the subject of many folk tales and superstitions in Western Europe, and as a result, it is characterised by numerous common names (zoonyms). However, the zoonyms of the toad and its associated traditions have remained unexplored in the Balkans, one of Europe’s linguistic hotspots. The present study attempts to fill this knowledge gap by focusing on Greece, where more than 7,700 individuals were interviewed both in the field and through online platforms, in order to document toad zoonyms from all varieties and dialects of Greek, as well as local non-Greek languages such as Arvanitika, South Slavic dialects, and Vlach. It was found that the academically unattested zoonyms of the toad provide an unmatched and previously unexplored linguistic and ethnographic tool, as they reflect the linguistic, demographic, and historical processes that shaped modern Greece. This is particularly pertinent in the 21st century, when a majority of the country’s dialects and languages are in danger of imminent extinction, and some have already gone silent. Overall, the present study shows the significance of recording zoonyms of indigenous and threatened languages as excellent linguistic and ethnographic tools that safeguard our planet’s ethnolinguistic diversity and enhance our understanding of how pre-industrial communities interacted with their local fauna. Furthermore, in contrast to all other European countries, which possess only one or a few zoonyms for the toad, the Greek world boasts an unmatched 37 zoonyms, which attest to its role as a linguistic hotspot.
The most linguistically diverse countries worldwide 2025, by number of languages

3 scholarly articles cite this dataset (View in Google Scholar)
