7 datasets found
  1. CulturalGround

    • huggingface.co
    Updated Aug 22, 2025
    Cite
    NeuLab @ LTI/CMU (2025). CulturalGround [Dataset]. https://huggingface.co/datasets/neulab/CulturalGround
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    NeuLab @ LTI/CMU
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CulturalGround: Grounding Multilingual Multimodal LLMs With Cultural Knowledge

    🌍 🇩🇪 🇫🇷 🇬🇧 🇪🇸 🇮🇹 🇵🇱 🇷🇺 🇨🇿 🇯🇵 🇺🇦 🇧🇷 🇮🇳 🇨🇳 🇳🇴 🇵🇹 🇮🇩 🇮🇱 🇹🇷 🇬🇷 🇷🇴 🇮🇷 🇹🇼 🇲🇽 🇮🇪 🇰🇷 🇧🇬 🇹🇭 🇳🇱 🇪🇬 🇵🇰 🇳🇬 🇮🇩 🇻🇳 🇲🇾 🇸🇦 🇮🇩 🇧🇩 🇸🇬 🇱🇰 🇰🇪 🇲🇳 🇪🇹 🇹🇿 🇷🇼 🏠 Homepage | 🤖 CulturalPangea-7B | 📊 CulturalGround | 💻 Github | 📄 Arxiv

    We introduce CulturalGround, a large-scale cultural VQA dataset and a pipeline for creating cultural… See the full description on the dataset page: https://huggingface.co/datasets/neulab/CulturalGround.

  2. TaiwanVQA

    • huggingface.co
    Updated May 29, 2025
    Cite
    Hen-Hsen Huang (2025). TaiwanVQA [Dataset]. https://huggingface.co/datasets/hhhuang/TaiwanVQA
    Dataset updated
    May 29, 2025
    Authors
    Hen-Hsen Huang
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    TaiwanVQA: Benchmarking and Enhancing Cultural Understanding in Vision-Language Models

      Dataset Summary
    

    TaiwanVQA is a visual question answering (VQA) benchmark designed to evaluate the capability of vision-language models (VLMs) in recognizing and reasoning about culturally specific content related to Taiwan. This dataset contains 2,736 images captured by our team, paired with 5,472 manually designed questions that cover diverse topics from daily life in Taiwan… See the full description on the dataset page: https://huggingface.co/datasets/hhhuang/TaiwanVQA.

  3. Data from: Amharic visual question answering on Ethiopian tourism

    • zenodo.org
    bin
    Updated Oct 16, 2024
    Cite
    alazar alazar (2024). Amharic visual question answering on Ethiopian tourism [Dataset]. http://doi.org/10.5281/zenodo.13941892
    Available download formats: bin
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    alazar alazar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2024
    Area covered
    Ethiopia
    Description

    Visual Question Answering (VQA) is a Vision-to-Text (V2T) task that integrates visual features of images with natural language questions to generate meaningful responses. Most existing research has focused on English, leaving a significant gap for other languages, including Amharic. Tourism, a major global industry, relies heavily on interactions in which visitors seek information about natural, historical, cultural, and religious sites. Ethiopia is a remarkable tourist destination, home to unique sites such as the rock-hewn churches of Lalibela and the castles of Gondar, as well as natural attractions like Simien National Park and Lake Tana. Most visitors are local, creating an urgent need for a VQA model that can deliver accurate, culturally relevant information in Amharic; no such model currently exists to assist tourists at these heritage sites. This research addresses that gap by developing an Amharic Visual Question Answering model tailored to Ethiopian tourism. A new Amharic VQA dataset was created from 2,200 diverse images of Ethiopian tourist sites paired with 6,600 questions in Amharic, covering natural landmarks, historical sites, and religious celebrations. The data was collected from various sources, including the UNESCO website, the Amhara Tourism office, and online platforms such as Facebook, Free pixel, and Instagram. Each image is complemented by three corresponding questions formulated by three individual experts and answered by ten candidates. The questions, answers, and images are linked through annotations and fed into the model. We used ResNet-50 for feature extraction and a Bidirectional Gated Recurrent Unit (BiGRU) with an attention mechanism, achieving a testing accuracy of 54.98% and demonstrating the model's effectiveness in answering questions about Ethiopian heritage. We will expand this research by using external knowledge to generate answers and descriptions beyond the image, and by adding custom object detection.
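The attention step described above (BiGRU hidden states pooled into a single context vector before answer prediction) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code; the dimensions and the dot-product scoring function are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden_states, w):
    """Collapse a sequence of BiGRU hidden states (T, 2H) into a single
    context vector (2H,) using learned attention weights.

    hidden_states: one row per time step (question token).
    w: a learned scoring vector (illustrative; real models often use a
       small MLP here instead of a plain dot product).
    """
    scores = hidden_states @ w             # (T,) one scalar score per step
    alphas = softmax(scores)               # attention weights, sum to 1
    context = alphas @ hidden_states       # weighted sum over time steps
    return context, alphas

rng = np.random.default_rng(0)
T, H2 = 12, 64                             # 12 question tokens, BiGRU output dim 64
states = rng.normal(size=(T, H2))
w = rng.normal(size=H2)
context, alphas = attention_pool(states, w)
print(context.shape, round(float(alphas.sum()), 6))  # (64,) 1.0
```

In the full model, such a question context vector would then be fused with the ResNet-50 image features before the answer classifier.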

  4. FoodieQA

    • huggingface.co
    Updated Oct 2, 2024
    Cite
    wenyan li (2024). FoodieQA [Dataset]. https://huggingface.co/datasets/lyan62/FoodieQA
    Dataset updated
    Oct 2, 2024
    Authors
    wenyan li
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

    FoodieQA is a benchmark comprising multi-image VQA, single-image VQA, and text-only QA questions about regional Chinese food, built on 389 unique food images covering 350 unique food entries. The images were collected from individual volunteers rather than from the web to ensure evaluation fairness, and the benchmark is specifically designed to evaluate VLMs' fine-grained understanding of Chinese food culture.… See the full description on the dataset page: https://huggingface.co/datasets/lyan62/FoodieQA.

  5. French Visual Question-Answer Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). French Visual Question-Answer Dataset [Dataset]. http://doi.org/10.5281/zenodo.5083737
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    FutureBeeAI AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the French Language Visual Question Answer Dataset. It includes 5,000 diverse images and a total of 35,000+ associated question-answer pairs. This meticulously curated dataset supports the development of French-language visual question-answering (VQA) models and, more broadly, of AI models for multimodal data understanding.

    Image Data

    This image question-answer training dataset comprises over 5,000 high-resolution images across diverse categories and scenes. Each image is carefully selected to represent a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust VQA models.

    Image Data Information: The images in this dataset were collected through a rigorous and ethical process.
    Clarity: Each image was checked for visual clarity and appropriateness.
    Relevance: Images were selected for their relevance to regions where French is dominant and to potential VQA scenarios, ensuring they depict a wide range of real-world contexts and objects.
    Copyright-Free: The images in the dataset are free from any copyright issues.
    Format: Images in the dataset are available in various formats like JPEG, PNG, and HEIC.
    Type: The dataset contains images with both graphical and textual content.
    Categories and Topics: The dataset spans a wide range of categories and topics to ensure thorough training, fine-tuning, and testing of VQA models. Topics include:
    Daily Life: Images about household objects, activities, and daily routines.
    Nature and Environment: Images related to natural scenes, plants, animals, and weather.
    Technology and Gadgets: Images about electronic devices, tools, and machinery.
    Human Activities: Images about people, their actions, professions, and interactions.
    Geography and Landmarks: Images related to specific locations, landmarks, and geographic features.
    Food and Dining: Images about different foods, meals, and dining settings.
    Education: Images related to educational settings, materials, and activities.
    Sports and Recreation: Images about various sports, games, and recreational activities.
    Transportation: Images about vehicles, travel methods, and transportation infrastructure.
    Cultural and Historical: Images about cultural artifacts, historical events, and traditions.

    Question and Answer Pairs

    The dataset includes more than 35,000 French-language question-answer pairs, around 7-10 per image. The pairs are thoughtfully crafted to cover various levels of complexity and types of questions, and are designed to test and improve a model's ability to understand and respond to visual inputs in natural language.

    Types of Questions: The dataset includes a diverse set of question types to ensure comprehensive model training:
    Descriptive Questions: These questions seek detailed descriptions of objects, people, or scenes within the image.
    Counting Questions: These questions involve counting the number of specific objects or elements present in the image.
    Yes/No Questions: These questions require a binary yes or no answer based on the visual content.
    Location-Based Questions: These questions focus on identifying the location of objects or elements within the image.
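    As a rough illustration of how such per-image question-answer pairs might be organized, here is a small Python sketch; the field names and example records are assumptions for illustration, not FutureBeeAI's actual schema:

```python
from collections import Counter

# Illustrative record: one image with typed QA pairs in French.
# Field names and values here are assumptions, not the vendor's schema.
record = {
    "image": "img_00042.jpeg",
    "qa_pairs": [
        {"type": "descriptive", "question": "Que voit-on au premier plan ?",
         "answer": "Une table avec des fruits."},
        {"type": "counting", "question": "Combien de pommes y a-t-il ?",
         "answer": "Trois."},
        {"type": "yes_no", "question": "Y a-t-il une personne sur l'image ?",
         "answer": "Non."},
        {"type": "location", "question": "Où se trouve le vase ?",
         "answer": "Sur la table, à gauche."},
    ],
}

# Tally question types, as one might when checking dataset balance.
counts = Counter(q["type"] for q in record["qa_pairs"])
print(dict(counts))
# {'descriptive': 1, 'counting': 1, 'yes_no': 1, 'location': 1}
```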

  6. chitroJera

    • huggingface.co
    Cite
    Moshi, chitroJera [Dataset]. https://huggingface.co/datasets/pltops/chitroJera
    Authors
    Moshi
    Description

    ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla

    ChitroJera is the first large-scale Visual Question Answering (VQA) dataset for Bangla, designed to capture regionally relevant cultural, linguistic, and visual contexts. It enables research on multimodal learning in low-resource languages and encourages the development of AI systems tailored to South Asian contexts.

      📑 Paper
    

    If you use ChitroJera in your research, please cite:… See the full description on the dataset page: https://huggingface.co/datasets/pltops/chitroJera.

  7. Viet-Localization-VQA

    • huggingface.co
    Updated Jan 27, 2025
    Cite
    Fifth Civil Defender - 5CD (2025). Viet-Localization-VQA [Dataset]. https://huggingface.co/datasets/5CD-AI/Viet-Localization-VQA
    Dataset updated
    Jan 27, 2025
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Area covered
    Việt Nam
    Description

    Dataset Overview

    This dataset was created from 56,989 Vietnamese 🇻🇳 localization images. It includes quintessentially Vietnamese images such as scenic landscapes, historical sites, culinary specialties, festivals, cultural aspects from various regions, familiar rural scenes, and everyday life in urban areas, among others. Each image has been analyzed and annotated using advanced Visual Question Answering (VQA) techniques to produce a comprehensive dataset. There is a… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-Localization-VQA.
