7 datasets found
  1. CulturalGround

    • huggingface.co
    Updated Aug 22, 2025
    Cite
    NeuLab @ LTI/CMU (2025). CulturalGround [Dataset]. https://huggingface.co/datasets/neulab/CulturalGround
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    NeuLab @ LTI/CMU
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CulturalGround: Grounding Multilingual Multimodal LLMs With Cultural Knowledge

    🌍 🇩🇪 🇫🇷 🇬🇧 🇪🇸 🇮🇹 🇵🇱 🇷🇺 🇨🇿 🇯🇵 🇺🇦 🇧🇷 🇮🇳 🇨🇳 🇳🇴 🇵🇹 🇮🇩 🇮🇱 🇹🇷 🇬🇷 🇷🇴 🇮🇷 🇹🇼 🇲🇽 🇮🇪 🇰🇷 🇧🇬 🇹🇭 🇳🇱 🇪🇬 🇵🇰 🇳🇬 🇮🇩 🇻🇳 🇲🇾 🇸🇦 🇮🇩 🇧🇩 🇸🇬 🇱🇰 🇰🇪 🇲🇳 🇪🇹 🇹🇿 🇷🇼 🏠 Homepage | 🤖 CulturalPangea-7B | 📊 CulturalGround | 💻 Github | 📄 Arxiv

    We introduce CulturalGround, a large-scale cultural VQA dataset and a pipeline for creating cultural… See the full description on the dataset page: https://huggingface.co/datasets/neulab/CulturalGround.

  2. TaiwanVQA

    • huggingface.co
    Updated May 29, 2025
    Cite
    Hen-Hsen Huang (2025). TaiwanVQA [Dataset]. https://huggingface.co/datasets/hhhuang/TaiwanVQA
    Dataset updated
    May 29, 2025
    Authors
    Hen-Hsen Huang
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    TaiwanVQA: Benchmarking and Enhancing Cultural Understanding in Vision-Language Models

      Dataset Summary
    

    TaiwanVQA is a visual question answering (VQA) benchmark designed to evaluate the capability of vision-language models (VLMs) in recognizing and reasoning about culturally specific content related to Taiwan. This dataset contains 2,736 images captured by our team, paired with 5,472 manually designed questions that cover diverse topics from daily life in Taiwan… See the full description on the dataset page: https://huggingface.co/datasets/hhhuang/TaiwanVQA.

  3. Data from: Amharic visual question answering on Ethiopian tourism

    • zenodo.org
    bin
    Updated Oct 16, 2024
    Cite
    alazar alazar (2024). Amharic visual question answering on Ethiopian tourism [Dataset]. http://doi.org/10.5281/zenodo.13941892
    Available download formats: bin
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    alazar alazar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2024
    Area covered
    Ethiopia
    Description

    Visual Question Answering (VQA) is a Vision-to-Text (V2T) task that integrates visual features of images with natural language questions to generate meaningful responses. Most existing research has focused on English, leaving a significant gap for other languages, including Amharic. Tourism, a major global industry, relies heavily on interactions in which visitors seek information about natural, historical, cultural, and religious sites. Ethiopia is a remarkable tourist destination, home to unique sites such as the rock-hewn churches of Lalibela and the castles of Gondar, as well as natural attractions like Simien National Park and Lake Tana. Most visitors are local, creating an urgent need for a VQA model that can deliver accurate, culturally relevant information in Amharic; no such model currently exists to assist tourists at these heritage sites. This research addresses that gap by developing an Amharic Visual Question Answering model tailored to Ethiopian tourism. A new Amharic VQA dataset was created from 2,200 diverse images of Ethiopian tourist sites paired with 6,600 questions in Amharic, covering natural landmarks, historical sites, and religious celebrations. The data was collected from various sources, including the UNESCO website, the Amhara Tourism office, and online platforms such as Facebook, Free pixel, and Instagram. Each image is complemented by three corresponding questions formulated by three individual experts and answered by ten candidates. The questions, answers, and images are linked through annotations and fed into the model. We used ResNet-50 for feature extraction and a Bidirectional Gated Recurrent Unit (BiGRU) with an attention mechanism, achieving a testing accuracy of 54.98% and demonstrating the model's effectiveness in answering questions about Ethiopian heritage. We will expand this research by using external knowledge to generate answers and descriptions beyond the image, and by adding custom object detection.
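The attention step described above (BiGRU hidden states pooled into a single context vector before answer prediction) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code; the dimensions and the dot-product scoring function are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden_states, w):
    """Collapse a sequence of BiGRU hidden states (T, 2H) into a single
    context vector (2H,) using learned attention weights.

    hidden_states: one row per time step (question token).
    w: a learned scoring vector (illustrative; real models often use a
       small MLP here instead of a plain dot product).
    """
    scores = hidden_states @ w             # (T,) one scalar score per step
    alphas = softmax(scores)               # attention weights, sum to 1
    context = alphas @ hidden_states       # weighted sum over time steps
    return context, alphas

rng = np.random.default_rng(0)
T, H2 = 12, 64                             # 12 question tokens, BiGRU output dim 64
states = rng.normal(size=(T, H2))
w = rng.normal(size=H2)
context, alphas = attention_pool(states, w)
print(context.shape, round(float(alphas.sum()), 6))  # (64,) 1.0
```

In the full model, such a question context vector would then be fused with the ResNet-50 image features before the answer classifier.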

  4. FoodieQA

    • huggingface.co
    Updated Oct 2, 2024
    Cite
    wenyan li (2024). FoodieQA [Dataset]. https://huggingface.co/datasets/lyan62/FoodieQA
    Dataset updated
    Oct 2, 2024
    Authors
    wenyan li
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

    FoodieQA is a benchmark comprising multi-image VQA, single-image VQA, and text-only QA questions about regional Chinese food, built on 389 unique food images covering 350 unique food entries. The images were collected from individual volunteers rather than from the web to ensure evaluation fairness, and the benchmark is specifically designed to evaluate VLMs' fine-grained understanding of Chinese food culture.… See the full description on the dataset page: https://huggingface.co/datasets/lyan62/FoodieQA.

  5. French Visual Question-Answer Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). French Visual Question-Answer Dataset [Dataset]. http://doi.org/10.5281/zenodo.5083737
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    FutureBeeAI AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the French Language Visual Question Answer Dataset. It includes 5,000 diverse images and a total of 35,000+ associated question-answer pairs. This meticulously curated dataset supports the development of French-language visual question-answering (VQA) models and, more broadly, of AI models for multimodal data understanding.

    Image Data

    This image question-answer training dataset comprises over 5,000 high-resolution images across diverse categories and scenes. Each image is carefully selected to represent a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust VQA models.

    Image Data Information: The images in this dataset were collected through a rigorous and ethical process.
    Clarity: Each image was checked for visual clarity and appropriateness.
    Relevance: Images were selected for their relevance to regions where French is dominant and to potential VQA scenarios, ensuring they depict a wide range of real-world contexts and objects.
    Copyright-Free: The images in the dataset are free from any copyright issues.
    Format: Images in the dataset are available in various formats like JPEG, PNG, and HEIC.
    Type: The dataset contains images with both graphical and textual content.
    Categories and Topics: The dataset spans a wide range of categories and topics to ensure thorough training, fine-tuning, and testing of VQA models. Topics include:
    Daily Life: Images about household objects, activities, and daily routines.
    Nature and Environment: Images related to natural scenes, plants, animals, and weather.
    Technology and Gadgets: Images about electronic devices, tools, and machinery.
    Human Activities: Images about people, their actions, professions, and interactions.
    Geography and Landmarks: Images related to specific locations, landmarks, and geographic features.
    Food and Dining: Images about different foods, meals, and dining settings.
    Education: Images related to educational settings, materials, and activities.
    Sports and Recreation: Images about various sports, games, and recreational activities.
    Transportation: Images about vehicles, travel methods, and transportation infrastructure.
    Cultural and Historical: Images about cultural artifacts, historical events, and traditions.

    Question and Answer Pairs

    The dataset includes more than 35,000 French-language question-answer pairs, around 7-10 per image. The pairs are thoughtfully crafted to cover various levels of complexity and types of questions, and are designed to test and improve a model's ability to understand and respond to visual inputs in natural language.

    Types of Questions: The dataset includes a diverse set of question types to ensure comprehensive model training:
    Descriptive Questions: These questions seek detailed descriptions of objects, people, or scenes within the image.
    Counting Questions: These questions involve counting the number of specific objects or elements present in the image.
    Yes/No Questions: These questions require a binary yes or no answer based on the visual content.
    Location-Based Questions: These questions focus on identifying the location of objects or elements within the image.
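    As a rough illustration of how such per-image question-answer pairs might be organized, here is a small Python sketch; the field names and example records are assumptions for illustration, not FutureBeeAI's actual schema:

```python
from collections import Counter

# Illustrative record: one image with typed QA pairs in French.
# Field names and values here are assumptions, not the vendor's schema.
record = {
    "image": "img_00042.jpeg",
    "qa_pairs": [
        {"type": "descriptive", "question": "Que voit-on au premier plan ?",
         "answer": "Une table avec des fruits."},
        {"type": "counting", "question": "Combien de pommes y a-t-il ?",
         "answer": "Trois."},
        {"type": "yes_no", "question": "Y a-t-il une personne sur l'image ?",
         "answer": "Non."},
        {"type": "location", "question": "Où se trouve le vase ?",
         "answer": "Sur la table, à gauche."},
    ],
}

# Tally question types, as one might when checking dataset balance.
counts = Counter(q["type"] for q in record["qa_pairs"])
print(dict(counts))
# {'descriptive': 1, 'counting': 1, 'yes_no': 1, 'location': 1}
```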

  6. chitroJera

    • huggingface.co
    Cite
    Moshi, chitroJera [Dataset]. https://huggingface.co/datasets/pltops/chitroJera
    Authors
    Moshi
    Description

    ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla

    ChitroJera is the first large-scale Visual Question Answering (VQA) dataset for Bangla, designed to capture regionally relevant cultural, linguistic, and visual contexts. It enables research on multimodal learning in low-resource languages and encourages the development of AI systems tailored to South Asian contexts.

      📑 Paper
    

    If you use ChitroJera in your research, please cite:… See the full description on the dataset page: https://huggingface.co/datasets/pltops/chitroJera.

  7. Viet-Localization-VQA

    • huggingface.co
    Updated Jan 27, 2025
    Cite
    Fifth Civil Defender - 5CD (2025). Viet-Localization-VQA [Dataset]. https://huggingface.co/datasets/5CD-AI/Viet-Localization-VQA
    Dataset updated
    Jan 27, 2025
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Area covered
    Việt Nam
    Description

    Dataset Overview

    This dataset was created from 56,989 Vietnamese 🇻🇳 localization images. It includes quintessentially Vietnamese images such as scenic landscapes, historical sites, culinary specialties, festivals, cultural aspects from various regions, familiar rural scenes, and everyday life in urban areas, among others. Each image has been analyzed and annotated using advanced Visual Question Answering (VQA) techniques to produce a comprehensive dataset. There is a… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-Localization-VQA.
