7 datasets found
  1. mmlongbench-doc-results

    • huggingface.co
    Cite
    IXCLab@Shanghai AI Lab, mmlongbench-doc-results [Dataset]. https://huggingface.co/datasets/OpenIXCLab/mmlongbench-doc-results
    Dataset authored and provided by
    IXCLab@Shanghai AI Lab
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📊 MMLongBench-Doc Evaluation Results

    Official evaluation results: GPT-4.1 (2025-04-14) & GPT-4o (2024-11-20)
    📄 Paper: MMLongBench-Doc, NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)

  2. Data from: Re-assembling the past: The RePAIR dataset and benchmark for real...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    txt, zip
    Updated Nov 4, 2024
    Cite
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue (2024). Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving [Dataset]. http://doi.org/10.5281/zenodo.13993089
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Theodore Tsesmelis; Luca Palmieri; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue
    Description

    Accepted by NeurIPS 2024 Datasets and Benchmarks Track

    We introduce the RePAIR puzzle-solving dataset, a large-scale real-world dataset of fractured frescoes from the archaeological campus of Pompeii. Our dataset consists of over 1000 fractured frescoes. RePAIR stands as a realistic computational challenge for 2D and 3D puzzle-solving methods, serves as a benchmark that enables the study of fractured object reassembly, and presents new challenges for geometric shape understanding. Please visit our website for more dataset information, access to source code scripts, and an interactive gallery of the dataset samples.

    Access the entire dataset

    We provide a compressed version of our dataset in two separate files: one for the 2D version and one for the 3D version.

    Our full dataset contains over one thousand individual fractured fragments, divided into groups. Each group has its own folder, and the groups are compressed into separate 2D and 3D archives. In the 2D dataset, each fragment is saved as a .PNG image, and each group has the corresponding ground truth transformation to solve the puzzle as a .TXT file. In the 3D dataset, each fragment is saved as a mesh in the widely used .OBJ format, with the corresponding material (.MTL) and texture (.PNG) files. The meshes are already in the assembled position and orientation, so no additional information is needed. All additional metadata is given as .JSON files.

    Important Note

    Please be advised that downloading and reusing this dataset is permitted only upon acceptance of the following license terms.

    The Istituto Italiano di Tecnologia (IIT) declares, and the user (“User”) acknowledges, that the "RePAIR puzzle-solving dataset" contains 3D scans, texture maps, rendered images and meta-data of fresco fragments acquired at the Archaeological Site of Pompeii. IIT is authorised to publish the RePAIR puzzle-solving dataset herein only for scientific and cultural purposes and in connection with an academic publication referenced as Tsesmelis et al., "Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving", NeurIPS 2024. Use of the RePAIR puzzle-solving dataset by User is limited to downloading, viewing such images; comparing these with data or content in other datasets. User is not authorised to use, in particular explicitly excluding any commercial use nor in conjunction with the promotion of a commercial enterprise and/or its product(s) or service(s), reproduce, copy, distribute the RePAIR puzzle-solving dataset. User will not use the RePAIR puzzle-solving dataset in any way prohibited by applicable laws. RePAIR puzzle-solving dataset therein is being provided to User without warranty of any kind, either expressed or implied. User will be solely responsible for their use of such RePAIR puzzle-solving dataset. In no event shall IIT be liable for any damages arising from such use.

  3. Semi-Truths

    • huggingface.co
    Updated Dec 3, 2024
    Cite
    Semi Truths (2024). Semi-Truths [Dataset]. https://huggingface.co/datasets/semi-truths/Semi-Truths
    Authors
    Semi Truths
    License

    https://choosealicense.com/licenses/cc/

    Description

    Semi Truths Dataset: A Large-Scale Dataset for Testing Robustness of AI-Generated Image Detectors (NeurIPS 2024 Datasets & Benchmarks Track)

    Recent efforts have developed AI-generated image detectors claiming robustness against various augmentations, but their effectiveness remains unclear. Can these systems detect varying degrees of augmentation?

    To address these questions, we introduce Semi-Truths, featuring 27,600 real images, 223,400 masks, and 1,472,700… See the full description on the dataset page: https://huggingface.co/datasets/semi-truths/Semi-Truths.

  4. Chinese Harmful Meme Dataset

    • kaggle.com
    zip
    Updated Nov 15, 2024
    Cite
    DUT-lujunyu (2024). Chinese Harmful Meme Dataset [Dataset]. https://www.kaggle.com/datasets/ljy201788027/chinese-harmful-meme-dataset-toxicn-mm
    Available download formats: zip (1435530 bytes)
    Authors
    DUT-lujunyu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    The paper has been accepted at NeurIPS 2024 (Datasets & Benchmarks Track).

    ☠️ Warning: The samples presented by this paper may be considered offensive or vulgar.

    ❗️ Ethics Statement

    The opinions and findings contained in the samples of our presented dataset should not be interpreted as representing the views expressed or implied by the authors. We acknowledge the risk of malicious actors attempting to reverse-engineer memes. We sincerely hope that users will employ the dataset responsibly and appropriately, avoiding misuse or abuse. We believe the benefits of our proposed resources outweigh the associated risks. All resources are intended solely for scientific research and are prohibited from commercial use.

    📜 Chinese Harmful Meme

    To adapt to the Chinese online environment, we introduce the definition of Chinese harmful memes:

    Chinese harmful memes are multimodal units consisting of an image and Chinese inline text that have the potential to cause harm to an individual, an organization, a community, a social group, or society as a whole. These memes can range from offensive content or jokes that perpetuate harmful stereotypes towards specific social entities, to memes that are more subtle and general but still have the potential to cause harm. It is important to note that Chinese harmful memes can be created and spread intentionally or unintentionally. They often reflect and reinforce underlying negative values and cultural attitudes on the Chinese Internet, which are detrimental from legal or moral perspectives.

    📜 ToxiCN MM

    According to the definition, we identify the most common harmful types of memes on Chinese platforms, including targeted harmful, general offense, sexual innuendo, and dispirited culture. We focus on these harmful types when constructing the dataset.

    During the annotation, we label memes from two aspects: harmful types (i.e., the above four types) and modality combination (i.e., analyzing toxicity through fused or independent features, including Text-Image Fusion, Harmful Text, and Harmful Image). Finally, we present the ToxiCN MM dataset, which contains 12,000 samples.

    Considering the potential risk of abuse, please fill out the following form to request the datasets: https://forms.gle/UN61ZNfTgMZKfMrv7. After we get your request, we will send the dataset to your email as soon as possible. The dataset labels and captions generated by GPT-4V have been saved as train_data_discription.json and test_data_discription.json in the ./data/ directory. Each fine-grained label is described briefly below.

    Label   Description
    label   Identify if a meme is Harmful (1) or Non-harmful (0).
    type    Non-harmful: 0, Targeted Harmful: 1, Sexual Innuendo: 2, General Offense: 3, Dispirited Culture: 4
    modal   Non-harmful / Text-Image Fusion: [0, 0], Only Harmful Text: [1, 0], Only Harmful Image: [0, 1], Harmful Text & Image: [1, 1]
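The label codes above can be turned into readable names with a small lookup. The following is a minimal sketch, not part of the ToxiCN MM release: it assumes each annotation is available as a dictionary with the keys `label`, `type`, and `modal` (the field names from the table), and the helper name `describe_annotation` is hypothetical. The actual layout of the JSON files may differ.

```python
# Code-to-name lookups copied from the label table above.
TYPE_NAMES = {
    0: "Non-harmful",
    1: "Targeted Harmful",
    2: "Sexual Innuendo",
    3: "General Offense",
    4: "Dispirited Culture",
}

MODAL_NAMES = {
    (0, 0): "Non-harmful / Text-Image Fusion",
    (1, 0): "Only Harmful Text",
    (0, 1): "Only Harmful Image",
    (1, 1): "Harmful Text & Image",
}


def describe_annotation(record):
    """Translate one annotation's numeric codes into readable names.

    Assumption: `record` is a dict with the keys "label", "type" and
    "modal"; this structure is illustrative, not confirmed by the source.
    """
    return {
        "harmful": record["label"] == 1,
        "type": TYPE_NAMES[record["type"]],
        # JSON arrays load as lists, so convert to a tuple for the lookup.
        "modal": MODAL_NAMES[tuple(record["modal"])],
    }


example = describe_annotation({"label": 1, "type": 2, "modal": [1, 0]})
```

Note that `[0, 0]` is shared by non-harmful memes and Text-Image Fusion cases, so the `label` field is needed to tell the two apart.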

    📜 Detector

    We present a Multimodal Knowledge Enhancement Detector for effective detection. It incorporates contextual information about meme content, generated by the LLM, to enhance the detector's understanding of Chinese memes. The requirements.txt file lists the specific dependencies of the project.

    ❗️ Licenses

    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).


    Cite

    If you want to use the resources, please cite the following paper. The camera-ready version of the paper will be released after the conference:

    ~~~
    @article{lu2024towards,
      title={Towards Comprehensive Detection of Chinese Harmful Memes},
      author={Lu, Junyu and Xu, Bo and Zhang, Xiaokun and Wang, Hongbo and Zhu, Haohao and Zhang, Dongyu and Yang, Liang and Lin, Hongfei},
      journal={arXiv preprint arXiv:2410.02378},
      year={2024}
    }
    ~~~

  5. Data from: WikiDBs - A Large-Scale Corpus Of Relational Databases From...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Dec 12, 2024
    Cite
    Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten (2024). WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11559813
    Authors
    Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.

    WikiDBs was published as a spotlight paper in the Datasets & Benchmarks track at NeurIPS 2024.

    WikiDBs contains the database schemas as well as the table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space is needed for the full corpus. We also provide a script to convert the databases into SQLite.
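To illustrate what loading CSV tables into SQLite involves, here is a minimal sketch using only the Python standard library. It is not the conversion script the authors provide: the function name `load_csv_tables` and the assumed layout (one folder holding one CSV file per table, with a header row) are hypothetical, every column is stored as TEXT, and the JSON schema is ignored because its structure is not described here.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name):
    """Quote a table or column name so spaces and punctuation are safe in SQL."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_tables(csv_dir, conn):
    """Create one SQLite table per CSV file in `csv_dir` and return the table names.

    Assumptions (illustrative only): each file has a header row, every row
    has as many fields as the header, and all values are kept as TEXT.
    """
    loaded = []
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if not header:
                continue  # skip empty files
            table = quote_ident(csv_path.stem)
            columns = ", ".join(f"{quote_ident(c)} TEXT" for c in header)
            conn.execute(f"CREATE TABLE {table} ({columns})")
            placeholders = ", ".join("?" for _ in header)
            # The reader yields the remaining rows one list at a time.
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
        loaded.append(csv_path.stem)
    conn.commit()
    return loaded
```

A faithful conversion would also read the JSON schema to set column types and keys; this sketch skips that step on purpose.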

  6. BLEnD

    • huggingface.co
    Cite
    Nayeon Lee, BLEnD [Dataset]. https://huggingface.co/datasets/nayeon212/BLEnD
    Authors
    Nayeon Lee
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    BLEnD

    This is the official repository of BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages (Submitted to NeurIPS 2024 Datasets and Benchmarks Track).

    24/12/05: Updated translation errors
    25/05/02: Updated multiple choice questions file (v1.1)

    About

    Large language models (LLMs) often lack culture-specific everyday knowledge, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural… See the full description on the dataset page: https://huggingface.co/datasets/nayeon212/BLEnD.

  7. BuckTales : A multi-UAV dataset for multi-object tracking and...

    • edmond.mpg.de
    mp4, zip
    Updated Dec 19, 2024
    Cite
    Hemal Naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar (2024). BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes [Dataset]. http://doi.org/10.17617/3.JCZ9WK
    Available download formats: zip (65010277544), mp4 (403189785), zip (3287471192), zip (457749126), mp4 (130172114), zip (17011998466)
    Dataset provided by
    Edmond
    Authors
    Hemal Naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    The dataset contains UAV footage of wild antelopes (blackbucks) in grassland habitats. It can be used mainly for two tasks: multi-object tracking (MOT) and re-identification (Re-ID). We provide annotations for the position of animals in each frame, allowing us to offer very long videos (up to 3 min) completely annotated while maintaining the identity of each animal in the video. The Re-ID dataset offers two videos that capture the movement of some animals simultaneously from two different UAVs. The Re-ID task is to find the same individual in two videos taken simultaneously from slightly different perspectives. The relevant paper will be published in the NeurIPS 2024 Datasets and Benchmarks Track: https://nips.cc/virtual/2024/poster/97563

    Resolution: 5.4K
    MOT: 12 videos (MOT17 format)
    Re-ID: 6 sets (each with a pair of drones) (Custom)
    Detection: 320 images (COCO, YOLO)
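Since the tracking annotations are described as MOT17 format, a short parsing sketch may help readers see how per-frame positions and animal identities fit together. It assumes the usual MOTChallenge comma-separated column order (frame, track id, box left, box top, box width, box height, then further fields that are ignored here). Whether the BuckTales files follow exactly this order is an assumption, and the names `Box` and `parse_mot_lines` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    frame: int
    track_id: int
    left: float
    top: float
    width: float
    height: float


def parse_mot_lines(lines):
    """Group MOT-style annotation lines into per-identity tracks.

    Assumption: the first six comma-separated fields are
    frame, id, bb_left, bb_top, bb_width, bb_height.
    Any remaining fields are ignored.
    """
    tracks = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split(",")
        box = Box(int(fields[0]), int(fields[1]), *(float(v) for v in fields[2:6]))
        tracks[box.track_id].append(box)
    # Order every track by frame so an individual can be followed over time.
    for boxes in tracks.values():
        boxes.sort(key=lambda b: b.frame)
    return dict(tracks)


sample = [
    "2,1,12,22,30,40,1,1,1.0",
    "1,1,10,20,30,40,1,1,1.0",
    "1,2,100,200,30,40,1,1,1.0",
]
tracks = parse_mot_lines(sample)
```

Here `tracks[1]` holds the two boxes of identity 1 in frame order, which is the structure a tracking or re-identification evaluation would start from.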

2 scholarly articles cite the mmlongbench-doc-results dataset (View in Google Scholar)