Facebook
TwitterPapua New Guinea is the most linguistically diverse country in the world. As of 2025, it was home to 840 different languages. Indonesia ranked second with 709 languages spoken. In the United States, 335 languages were spoken in that same year.
Facebook
TwitterThe Global English Accent Conversational NLP Dataset is a comprehensive collection of validated English speech recordings sourced from native and non-native English speakers across key global regions. This dataset is designed for training Natural Language Processing models, conversational AI, Automatic Speech Recognition (ASR), and linguistic research, with a focus on regional accent variation.
Regions and Covered Countries with Primary Spoken Languages:
Africa: South Africa (English, Zulu, Afrikaans, Xhosa) Nigeria (English, Yoruba, Igbo, Hausa) Kenya (English, Swahili) Ghana (English, Twi, Ewe, Ga) Uganda (English, Luganda) Ethiopia (English, Amharic, Oromo)
Central & South America: Mexico (Spanish, English as a second language) Guatemala (Spanish, K'iche', English) El Salvador (Spanish, English) Costa Rica (Spanish, English in Caribbean regions) Colombia (Spanish, English in urban centers) Dominican Republic (Spanish, English in tourist zones) Brazil (Portuguese, English in urban areas) Argentina (Spanish, English among educated speakers)
Southeast Asia & South Asia: Philippines (Filipino, English) Vietnam (Vietnamese, English) Malaysia (Malay, English, Mandarin) Indonesia (Indonesian, Javanese, English) Singapore (English, Mandarin, Malay, Tamil) India (Hindi, English, Bengali, Tamil) Pakistan (Urdu, English, Punjabi)
Europe: United Kingdom (English) Ireland (English, Irish) Germany (German, English) France (French, English) Spain (Spanish, Catalan, English) Italy (Italian, English) Portugal (Portuguese, English)
Oceania: Australia (English) New Zealand (English, Māori) Fiji (English, Fijian) North America: United States (English, Spanish) Canada (English, French)
Dataset Attributes: - Conversational English with natural accent variation - Global coverage with balanced male/female speakers - Rich speaker metadata: age, gender, country, city - Average audio length of ~30 minutes per participant - All samples manually validated for accuracy - Structured format suitable for machine learning and AI applications
Best suited for: - NLP model training and evaluation - Multilingual ASR system development - Voice assistant and chatbot design - Accent recognition research - Voice synthesis and TTS modeling
This dataset ensures global linguistic diversity and delivers high-quality audio for AI developers, researchers, and enterprises working on voice-based applications.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The common toad (Bufo bufo) has been the subject of many folk tales and superstitions in Western Europe, and as a result, it is characterised by numerous common names (zoonyms). However, the zoonyms of the toad and its associated traditions have remained unexplored in the Balkans, one of Europe’s linguistic hotspots. In the present study, it was attempted to fill this knowledge gap by focusing on Greece, where more than 7.700 individuals were interviewed both in the field and through online platforms, in order to document toad zoonyms from all varieties and dialects of Greek, as well as local non-Greek languages such as Arvanitika, South Slavic dialects, and Vlach. It was found that the academically unattested zoonyms of the toad provide an unmatched and previously unexplored linguistic and ethnographic tool, as they reflect the linguistic, demographic, and historical processes that shaped modern Greece. This is particularly pertinent in the 21st century, when a majority of the country’s dialects and languages are in danger of imminent extinction–and some have already gone silent. Overall, the present study shows the significance of recording zoonyms of indigenous and threatened languages as excellent linguistic and ethnographic tools that safeguard our planet’s ethnolinguistic diversity and enhance our understanding on how pre-industrial communities interacted with their local fauna. Furthermore, in contrast to all other European countries, which only possess one or only a few zoonyms for the toad, the Greek world boasts an unmatched 37 zoonyms, which attest to its role as a linguistic hotspot.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterPapua New Guinea is the most linguistically diverse country in the world. As of 2025, it was home to 840 different languages. Indonesia ranked second with 709 languages spoken. In the United States, 335 languages were spoken in that same year.