100+ datasets found
  1. COCO 2017

    • opendatalab.com
    • huggingface.co
    zip
    Updated Sep 30, 2017
    Cite
    Microsoft (2017). COCO 2017 [Dataset]. https://opendatalab.com/OpenDataLab/COCO_2017
    Available download formats: zip (49,105,147,630 bytes)
    Dataset updated
    Sep 30, 2017
    Dataset provided by
    Microsoft
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COCO is a large-scale object detection, segmentation, and captioning dataset. Its key features: object segmentation; recognition in context; superpixel stuff segmentation; 330K images (>200K labeled); 1.5 million object instances; 80 object categories; 91 stuff categories; 5 captions per image; and 250,000 people with keypoints.
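
    For quick orientation, here is a minimal sketch of browsing the COCO 2017 annotations with the pycocotools library; the annotation file path is an assumption and depends on where the downloaded archive is extracted.

```python
# Minimal sketch: browsing COCO 2017 instance annotations with pycocotools
# (pip install pycocotools). The path is an assumption -- adjust it to
# wherever the downloaded archive was extracted.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # assumed local path

# Look up the category id for "person", then every image containing one.
person_id = coco.getCatIds(catNms=["person"])[0]
img_ids = coco.getImgIds(catIds=[person_id])
print(f"{len(img_ids)} val2017 images contain at least one person")

# Inspect the annotations (bounding boxes, crowd flags) for one image.
ann_ids = coco.getAnnIds(imgIds=[img_ids[0]], catIds=[person_id])
for ann in coco.loadAnns(ann_ids):
    print(ann["bbox"], ann["iscrowd"])
```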

  2. OHR-Bench

    • huggingface.co
    Updated Mar 10, 2025
    Cite
    OpenDataLab (2025). OHR-Bench [Dataset]. https://huggingface.co/datasets/opendatalab/OHR-Bench
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 10, 2025
    Dataset authored and provided by
    OpenDataLab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

    [📜 arXiv] | [Dataset (🤗Hugging Face)] | [Dataset (OpenDataLab)]

    This repository contains the official code of OHR-Bench, a benchmark designed to evaluate the cascading impact of OCR on RAG.

    Overview

    PDF, ground-truth structured data, and Q&A datasets: [🤗 Hugging Face] pdfs.zip, data/retrieval_base/gt. The benchmark includes 8,500+ unstructured PDF pages from various domains, including Textbook, Law… See the full description on the dataset page: https://huggingface.co/datasets/opendatalab/OHR-Bench.
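
    As a hedged sketch, the pdfs.zip artifact named above can be fetched with the huggingface_hub client; the filename comes from the description, the rest is standard hub usage.

```python
# Sketch: downloading the pdfs.zip artifact named in the description
# (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="opendatalab/OHR-Bench",
    filename="pdfs.zip",   # filename taken from the dataset description
    repo_type="dataset",
)
print("downloaded to", path)
```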

  3. TAO AVA and HACS videos

    • opendatalab.com
    zip
    Updated Jan 17, 2023
    Cite
    Institut national de recherche en informatique et en automatique (2023). TAO AVA and HACS videos [Dataset]. https://opendatalab.com/OpenDataLab/TAO_AVA_and_HACS_videos
    Available download formats: zip (242,338,745,891 bytes)
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Institut national de recherche en informatique et en automatique
    Carnegie Mellon University
    Toyota Research Institute
    Argo AI
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TAO is a federated dataset for Tracking Any Object, containing 2,907 high-resolution videos captured in diverse environments and averaging half a minute in length. We adopt a bottom-up approach for discovering a large vocabulary of 833 categories, an order of magnitude more than prior tracking benchmarks.

  4. ProverQA

    • huggingface.co
    • opendatalab.com
    Updated Feb 11, 2025
    Cite
    OpenDataLab (2025). ProverQA [Dataset]. https://huggingface.co/datasets/opendatalab/ProverQA
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    OpenDataLab
    Description

    This dataset is for evaluating logical reasoning with large language models, as described in the paper Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation. Code: https://github.com/opendatalab/ProverGen
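
    A minimal sketch of pulling the benchmark with the 🤗 datasets library; the configuration and split names are assumptions, so check the dataset page for the actual ones.

```python
# Sketch: loading ProverQA via the `datasets` library (pip install datasets).
# Config/split names are assumptions -- consult the dataset page if they differ.
from datasets import load_dataset

ds = load_dataset("opendatalab/ProverQA")   # may require a config name
print(ds)                                   # inspect the available splits
first_split = next(iter(ds.values()))
print(first_split[0])                       # peek at one example
```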

  5. 蜜巢·花粉1.0

    • opendatalab.com
    zip
    Updated Sep 8, 2023
    Cite
    Midu (2023). 蜜巢·花粉1.0 [Dataset]. https://opendatalab.com/OpenDataLab/MiChao
    Available download formats: zip
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Midu
    Corpus Data Alliance for Foundation Model
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The 蜜巢·花粉1.0 (MiChao·HuaFen 1.0) open-source dataset is a text dataset compiled from the 2022 historical data of publicly accessible websites, totaling more than 70 million records. The dataset is reliably sourced, high in quality, and continuously and stably updated. It has already been used to train several large models, and it powers intelligent generative services for the media vertical, such as source-grounded knowledge Q&A and content generation, automatic generation of analytical reports, and manuscript proofreading, polishing, and rewriting.

  6. WanJuanSiLu-Multimodal-5Languages

    • huggingface.co
    Updated May 23, 2025
    Cite
    OpenDataLab (2025). WanJuanSiLu-Multimodal-5Languages [Dataset]. https://huggingface.co/datasets/opendatalab/WanJuanSiLu-Multimodal-5Languages
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    OpenDataLab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WanJuan·SiLu Multimodal Multilingual Corpus

    🌏 Dataset Introduction

    The newly upgraded "Wanjuan·Silk Road Multimodal Corpus" brings the following three core improvements:

    The number of languages has been significantly expanded: building on the five languages open-sourced in "Wanjuan·Silk Road" (Arabic, Russian, Korean, Vietnamese, and Thai), "Wanjuan·Silk Road Multimodal" adds three low-resource corpora in Serbian, Hungarian, and Czech, and uses the above eight key… See the full description on the dataset page: https://huggingface.co/datasets/opendatalab/WanJuanSiLu-Multimodal-5Languages.

  7. WanJuan1.0(书生·万卷)

    • opendatalab.com
    zip
    Updated Aug 14, 2023
    Cite
    Corpus Data Alliance for Foundation Model (2023). WanJuan1.0(书生·万卷) [Dataset]. https://opendatalab.com/OpenDataLab/WanJuan1_dot_0
    Available download formats: zip
    Dataset updated
    Aug 14, 2023
    Dataset provided by
    Shanghai Artificial Intelligence Laboratory
    Corpus Data Alliance for Foundation Model
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Intern · Wanjuan 1.0 is the first open-source version of the Intern · Wanjuan multimodal corpus. It includes three parts: an NLP dataset, a multi-modal dataset, and a video dataset, with a total data volume of over 2TB.

    At present, Intern · Wanjuan 1.0 has been applied to the training of InternMM and InternLM. By digesting this high-quality corpus, the Intern series models exhibit excellent performance in various generative tasks such as semantic understanding, knowledge Q&A, visual understanding, and visual Q&A.

    (Email contact: OpenDataLab@pjlab.org.cn)

  8. PandaLM-testset

    • opendatalab.com
    zip
    Updated Aug 1, 2023
    Cite
    Peking University (2023). PandaLM-testset [Dataset]. https://opendatalab.com/OpenDataLab/PandaLM-testset
    Available download formats: zip
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    Microsoft Research Asia: https://www.msra.cn/
    Westlake University
    Peking University
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    PandaLM aims to provide reproducible and automated comparisons between different large language models (LLMs). Given the same context, PandaLM compares the responses of different LLMs and provides a reason for the decision, along with a reference answer. The target audience is organizations with confidential data and research labs with limited funds that seek reproducibility: such organizations may not want to disclose their data to third parties, risk leaking it through third-party APIs, or afford the high cost of hiring human annotators. With PandaLM, they can perform evaluations without compromising data security or incurring high costs, and obtain reproducible results. To demonstrate the reliability and consistency of our tool, we have created a diverse human-annotated test dataset of approximately 1,000 samples, where the contexts and labels are all created by humans. Our results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. More papers and features are coming soon.

  9. massive

    • opendatalab.com
    • paperswithcode.com
    • +2 more
    zip
    Updated Apr 20, 2022
    Cite
    Amazon (2022). massive [Dataset]. https://opendatalab.com/OpenDataLab/massive
    Available download formats: zip
    Dataset updated
    Apr 20, 2022
    Dataset provided by
    Amazon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MASSIVE 1.1 is a parallel dataset of > 1M utterances across 52 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, composed of general Intelligent Voice Assistant single-shot interactions.
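
    For instance, a hedged sketch of loading one of the 52 locales with the 🤗 datasets library; the hub id AmazonScience/massive, the en-US config, and the field names are assumptions based on the usual hosting of this dataset.

```python
# Sketch: loading one MASSIVE locale via the `datasets` library.
# Hub id, config, and field names are assumptions -- verify on the hub.
from datasets import load_dataset

massive = load_dataset("AmazonScience/massive", "en-US", split="train")
example = massive[0]
print(example["utt"])     # a raw utterance
print(example["intent"])  # class id for one of the 60 intents
```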

  10. WanJuan2.0 (万卷-CC)

    • opendatalab.com
    zip
    Updated Mar 6, 2024
    Cite
    Shanghai Artificial Intelligence Laboratory (2024). WanJuan2.0 (万卷-CC) [Dataset]. https://opendatalab.com/OpenDataLab/WanJuanCC
    Available download formats: zip
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Shanghai Artificial Intelligence Laboratory
    Corpus Data Alliance for Foundation Model
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WanJuan2.0 (万卷-CC) is a 1T-token high-quality English web-text dataset extracted from CommonCrawl. Compared with various open-source English CC corpora, WanJuan-CC scores higher on safety across the dimensions evaluated by the Perspective API. Its practical value is further demonstrated by perplexity (PPL) on 4 validation sets and accuracy on 6 downstream tasks: WanJuan-CC shows competitive PPL across validation sets, especially on sets that demand high linguistic fluency such as tiny-storys. In a controlled comparison training 1B-parameter models on datasets of the same type, with validation-set perplexity and downstream-task accuracy as metrics, experiments show that WanJuan-CC significantly improves performance on English text completion and general English ability tasks.

  11. openai-humaneval

    • opendatalab.com
    zip
    Updated Dec 16, 2023
    Cite
    OpenAI (2023). openai-humaneval [Dataset]. https://opendatalab.com/OpenDataLab/openai-humaneval
    Available download formats: zip
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    OpenAI: https://openai.com/
    Zipline
    Anthropic AI
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The HumanEval dataset released by OpenAI includes 164 programming problems, each with a function signature, docstring, body, and several unit tests. The problems were handwritten to ensure they would not appear in the training sets of code generation models.
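
    A minimal sketch of inspecting one HumanEval problem through the commonly used 🤗 mirror; the hub id openai_humaneval and the field names are assumptions based on that mirror.

```python
# Sketch: peeking at one HumanEval problem via the common Hugging Face
# mirror. Hub id and field names are assumptions -- verify on the hub.
from datasets import load_dataset

humaneval = load_dataset("openai_humaneval", split="test")
problem = humaneval[0]
print(problem["prompt"])              # function signature + docstring
print(problem["canonical_solution"])  # reference body
print(problem["entry_point"])         # function name exercised by the tests
```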

  12. CLICK-ID

    • opendatalab.com
    zip
    Updated Aug 1, 2020
    Cite
    Universitas Gadjah Mada (2020). CLICK-ID [Dataset]. https://opendatalab.com/OpenDataLab/CLICK-ID
    Available download formats: zip
    Dataset updated
    Aug 1, 2020
    Dataset provided by
    Universitas Gadjah Mada
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CLICK-ID dataset is a collection of Indonesian news headlines collected from 12 local online news publishers: detikNews, Fimela, Kapanlagi, Kompas, Liputan6, Okezone, Posmetro-Medan, Republika, Sindonews, Tempo, Tribunnews, and Wowkeren. The dataset comprises two main parts: (i) 46,119 raw articles, and (ii) 15,000 headlines annotated for clickbait. Annotation was conducted with three annotators examining each headline, judging from the headline alone; the majority vote is taken as the ground truth. The annotated sample contains 6,290 clickbait and 8,710 non-clickbait headlines.
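
    As a worked illustration of the majority-vote labeling described above (the headlines and votes below are hypothetical):

```python
# Illustration of majority-vote ground truth from three annotators.
# The headlines and votes are made up for the example.
from collections import Counter

annotations = {
    "You won't believe what happened next": ["clickbait", "clickbait", "non-clickbait"],
    "Government announces 2021 budget": ["non-clickbait", "non-clickbait", "non-clickbait"],
}

for headline, votes in annotations.items():
    label, _ = Counter(votes).most_common(1)[0]  # majority of 3 annotators
    print(f"{label:13s} <- {headline}")
```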

  13. JGLUE

    • opendatalab.com
    • huggingface.co
    zip
    Updated Jan 1, 2024
    Cite
    Waseda University (2024). JGLUE [Dataset]. https://opendatalab.com/OpenDataLab/JGLUE
    Available download formats: zip
    Dataset updated
    Jan 1, 2024
    Dataset provided by
    Waseda University
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    JGLUE, Japanese General Language Understanding Evaluation, is built to measure the general NLU ability in Japanese. JGLUE has been constructed from scratch without translation. We hope that JGLUE will facilitate NLU research in Japanese. JGLUE has been constructed by a joint research project of Yahoo Japan Corporation and Kawahara Lab at Waseda University.

  14. databricks-dolly-15k-ja-reformat-v1

    • opendatalab.com
    • huggingface.co
    zip
    Updated Apr 13, 2023
    Cite
    (2023). databricks-dolly-15k-ja-reformat-v1 [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-ja-reformat-v1
    Available download formats: zip
    Dataset updated
    Apr 13, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    databricks-dolly-15k is a corpus of more than 15,000 records generated by thousands of Databricks employees to enable large language models to exhibit the magical interactivity of ChatGPT. Databricks employees were invited to create prompt / response pairs in each of eight different instruction categories, including the seven outlined in the InstructGPT paper, as well as an open-ended free-form category. The contributors were instructed to avoid using information from any source on the web with the exception of Wikipedia (for particular subsets of instruction categories), and explicitly instructed to avoid using generative AI in formulating instructions or responses. Examples of each behavior were provided to motivate the types of questions and instructions appropriate to each category.

    Halfway through the data generation process, contributors were given the option of answering questions posed by other contributors. They were asked to rephrase the original question and only select questions they could be reasonably expected to answer correctly. For certain categories, contributors were asked to provide reference texts copied from Wikipedia. Reference text (indicated by the context field in the actual dataset) may contain bracketed Wikipedia citation numbers (e.g. [42]), which we recommend users remove for downstream applications.
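
    Following the recommendation above, a small sketch that strips bracketed citation markers such as [42] from the context field; the record layout is assumed from the description.

```python
# Sketch: removing bracketed Wikipedia citation numbers (e.g. [42]) from
# the `context` field, as the description recommends. The record layout
# is assumed from the description.
import re

def strip_citations(text: str) -> str:
    return re.sub(r"\[\d+\]", "", text)

record = {"context": "Edison patented the phonograph in 1878.[42][43]"}
record["context"] = strip_citations(record["context"])
print(record["context"])  # -> "Edison patented the phonograph in 1878."
```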

  15. Data from: Diffusion Policy

    • opendatalab.com
    Updated Mar 7, 2023
    Cite
    Columbia University (2023). Diffusion Policy [Dataset]. https://opendatalab.com/OpenDataLab/Diffusion%20Policy
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Columbia University
    Massachusetts Institute of Technology
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Diffusion Policy is a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods, with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps.

    We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions, including the incorporation of receding-horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models.
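
    To make the "stochastic Langevin dynamics steps" concrete, here is a toy sketch of Langevin-style iterative refinement with a known score function; it is a generic illustration, not the paper's implementation, and a real diffusion policy would replace the hand-coded score with a learned, observation-conditioned denoising network.

```python
# Toy Langevin iteration: with the score of a standard normal, s(x) = -x,
# the iterates converge to samples from N(0, I). A diffusion policy would
# instead use a learned, observation-conditioned score/denoising network.
import numpy as np

def score(x):
    # Gradient of the log-density of a standard normal.
    return -x

rng = np.random.default_rng(0)
x = rng.uniform(-4.0, 4.0, size=(1000, 2))  # crude initial "actions"
eps = 0.01
for _ in range(2000):
    z = rng.standard_normal(x.shape)
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * z  # one Langevin step

print(x.mean(axis=0), x.std(axis=0))  # ~[0, 0] and ~[1, 1]
```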

  16. GLUE

    • opendatalab.com
    zip
    Updated Nov 1, 2018
    Cite
    New York University (2018). GLUE [Dataset]. https://opendatalab.com/OpenDataLab/glue
    Available download formats: zip
    Dataset updated
    Nov 1, 2018
    Dataset provided by
    Paul G. Allen School of Computer Science and Engineering
    New York University
    DeepMind
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems. GLUE consists of:

    • A benchmark of nine sentence- or sentence-pair language understanding tasks built on established existing datasets and selected to cover a diverse range of dataset sizes, text genres, and degrees of difficulty;
    • A diagnostic dataset designed to evaluate and analyze model performance with respect to a wide range of linguistic phenomena found in natural language; and
    • A public leaderboard for tracking performance on the benchmark and a dashboard for visualizing the performance of models on the diagnostic set.

    The format of the GLUE benchmark is model-agnostic, so any system capable of processing sentences and sentence pairs and producing corresponding predictions is eligible to participate. The benchmark tasks are selected so as to favor models that share information across tasks using parameter sharing or other transfer learning techniques. The ultimate goal of GLUE is to drive research in the development of general and robust natural language understanding systems.
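
    A minimal sketch of loading one GLUE task with the 🤗 datasets library; the "glue"/"sst2" identifiers follow the library's long-standing convention for this benchmark.

```python
# Sketch: loading the SST-2 task of GLUE via the `datasets` library.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])            # {'sentence': ..., 'label': 0/1, 'idx': ...}
print(sst2["validation"].num_rows) # size of the validation split
```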

  17. miam

    • opendatalab.com
    zip
    Updated Jan 15, 2024
    Cite
    Institut polytechnique de Paris (2024). miam [Dataset]. https://opendatalab.com/OpenDataLab/miam
    Available download formats: zip
    Dataset updated
    Jan 15, 2024
    Dataset provided by
    IBM GBS France
    Institut polytechnique de Paris
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Multilingual dIalogAct benchMark is a collection of resources for training, evaluating, and analyzing natural language understanding systems specifically designed for spoken language. Datasets are in English, French, German, Italian and Spanish. They cover a variety of domains including spontaneous speech, scripted scenarios, and joint task completion. All datasets contain dialogue act labels.
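
    A hedged sketch of loading one MIAM corpus with the 🤗 datasets library; the hub id miam and the dihana config are assumptions, so check the dataset card for the actual configurations.

```python
# Sketch: loading one MIAM corpus via the `datasets` library. Hub id and
# config name are assumptions -- verify on the dataset card.
from datasets import load_dataset

dihana = load_dataset("miam", "dihana")  # assumed Spanish dialogue config
print(dihana["train"][0])                # utterance plus dialogue act label
```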

  18. sentiment140

    • opendatalab.com
    • tensorflow.org
    • +2 more
    zip
    Updated Dec 19, 2023
    Cite
    Stanford University (2023). sentiment140 [Dataset]. https://opendatalab.com/OpenDataLab/sentiment140
    Available download formats: zip
    Dataset updated
    Dec 19, 2023
    Dataset provided by
    Stanford University
    Description

    Sentiment140 consists of Twitter messages with emoticons, which are used as noisy labels for sentiment classification. For more detailed information please refer to the paper.
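
    A hedged sketch of loading it with the 🤗 datasets library and decoding the emoticon-derived polarity codes; the hub id, field names, and the 0 = negative / 4 = positive coding are assumptions based on the dataset's common distribution.

```python
# Sketch: loading sentiment140 via the `datasets` library. Hub id, field
# names, and the polarity coding are assumptions -- verify on the hub.
from datasets import load_dataset

s140 = load_dataset("sentiment140", split="train")
polarity = {0: "negative", 2: "neutral", 4: "positive"}
ex = s140[0]
print(polarity.get(ex["sentiment"], "?"), "-", ex["text"])
```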

  19. xcsr

    • opendatalab.com
    zip
    Updated Jan 1, 2021
    Cite
    University of Southern California (2021). xcsr [Dataset]. https://opendatalab.com/OpenDataLab/xcsr
    Available download formats: zip
    Dataset updated
    Jan 1, 2021
    Dataset provided by
    University of Southern California
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    To evaluate multilingual language models (ML-LMs) for commonsense reasoning in a cross-lingual zero-shot transfer setting (X-CSR), i.e., training in English and testing in other languages, we create two benchmark datasets, namely X-CSQA and X-CODAH. Specifically, we automatically translate the original CSQA and CODAH datasets, which only have English versions, into 15 other languages, forming development and test sets for studying X-CSR. As our goal is to evaluate different ML-LMs in a unified evaluation protocol for X-CSR, we argue that such translated examples, although they might contain noise, can serve as a starting benchmark for obtaining meaningful analysis until more human-translated datasets become available.

  20. RICH

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Apr 2, 2022
    Cite
    Middlebury College (2022). RICH [Dataset]. https://opendatalab.com/OpenDataLab/RICH
    Available download formats: zip
    Dataset updated
    Apr 2, 2022
    Dataset provided by
    Max Planck Institute for Intelligent Systems
    Middlebury College
    Description

    Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives. We capture a new dataset called RICH for “Real scenes, Interaction, Contact and Humans.” RICH contains multiview outdoor/indoor video sequences at 4K resolution, ground-truth 3D human bodies captured using markerless motion capture, 3D body scans, and high resolution 3D scene scans. A key feature of RICH is that it also contains accurate vertex-level contact labels on the body. Using RICH, we train a network that predicts dense body-scene contacts from a single RGB image. Our key insight is that regions in contact are always occluded so the network needs the ability to explore the whole image for evidence. We use a transformer to learn such non-local relationships and propose a new Body-Scene contact TRansfOrmer (BSTRO). Very few methods explore 3D contact; those that do focus on the feet only, detect foot contact as a post-processing step, or infer contact from body pose without looking at the scene. To our knowledge, BSTRO is the first method to directly estimate 3D body-scene contact from a single image. We demonstrate that BSTRO significantly outperforms the prior art.
