7 datasets found

h
mmlongbench-doc-results
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IXCLab@Shanghai AI Lab, mmlongbench-doc-results [Dataset]. https://huggingface.co/datasets/OpenIXCLab/mmlongbench-doc-results
Explore at:
Dataset authored and provided by
IXCLab@Shanghai AI Lab
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
📊 MMLongBench-Doc Evaluation Results

Official evaluation results: GPT-4.1 (2025-04-14) & GPT-4o (2024-11-20) 📄 Paper: MMLongBench-Doc, NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)
Data from: Re-assembling the past: The RePAIR dataset and benchmark for real...
zenodo.org
data.niaid.nih.gov
+1more
txt, zip
Updated Nov 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Theodore Tsesmelis; Theodore Tsesmelis; Luca Palmieri; Luca Palmieri; Marina Khoroshiltseva; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue (2024). Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving [Dataset]. http://doi.org/10.5281/zenodo.13993089
Explore at:
zip, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13993089
Dataset updated
Nov 4, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Theodore Tsesmelis; Theodore Tsesmelis; Luca Palmieri; Luca Palmieri; Marina Khoroshiltseva; Marina Khoroshiltseva; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue; Adeela Islam; Gur Elkin; Ofir Shahar Itzhak; Gianluca Scarpellini; Stefano Fiorini; Yaniv Ohayon; Nadav Alali; Sinem Aslan; Pietro Morerio; Sebastiano Vascon; Elena Gravina; Maria Christina Napolitano; Giuseppe Scarpati; Gabriel Zuchtriegel; Alexandra Spühler; Michel E. Fuchs; Stuart James; Ohad Ben-Shahar; Marcello Pelillo; Alessio Del Bue
Description
Accepted by NeurIPS 2024 Datasets and Benchmarks Track

We introduce the RePair puzzle-solving dataset, a large-scale real world dataset of fractured frescoes from the archaelogical campus of Pompeii. Our dataset consists of over 1000 fractured frescoes. The RePAIR stands as a realistic computational challenge for methods for 2D and 3D puzzle solving, and serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding. Please visit our website for more dataset information, access to source code scripts and for an interactive gallery viewing of the dataset samples.

Access the entire dataset

We provide a compressed version of our dataset in two seperate files. One for the 2D version and one for the 3D version.

Our full dataset contains over one thousand individual fractured fragments divided into groups with its corresponding folder and all compressed into their individual sub-set format regarding whether they are 2D or 3D. Regarding the 2D dataset, each fragment is saved as a .PNG image and each group has the corresponding ground truth transformation to solve the puzzle as a .TXT file. Considering the 3D dataset, each fragment is saved as a mesh using the widely .OBJ format with the corresponding material (.MTL) and texture (.PNG) file. The meshes are already in the assembled position and orientation, so that no additional information is needed. All additional metadata information are given as .JSON files.

Important Note

Please be advised that downloading and reusing this dataset is permitted only upon acceptance of the following license terms.

The Istituto Italiano di Tecnologia (IIT) declares, and the user (“User”) acknowledges, that the "RePAIR puzzle-solving dataset" contains 3D scans, texture maps, rendered images and meta-data of fresco fragments acquired at the Archaeological Site of Pompeii. IIT is authorised to publish the RePAIR puzzle-solving dataset herein only for scientific and cultural purposes and in connection with an academic publication referenced as Tsemelis et al., "Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving", NeurIPS 2024. Use of the RePAIR puzzle-solving dataset by User is limited to downloading, viewing such images; comparing these with data or content in other datasets. User is not authorised to use, in particular explicitly excluding any commercial use nor in conjunction with the promotion of a commercial enterprise and/or its product(s) or service(s), reproduce, copy, distribute the RePAIR puzzle-solving dataset. User will not use the RePAIR puzzle-solving dataset in any way prohibited by applicable laws. RePAIR puzzle-solving dataset therein is being provided to User without warranty of any kind, either expressed or implied. User will be solely responsible for their use of such RePAIR puzzle-solving dataset. In no event shall IIT be liable for any damages arising from such use.
h
Semi-Truths
huggingface.co
Updated Dec 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semi Truths (2024). Semi-Truths [Dataset]. https://huggingface.co/datasets/semi-truths/Semi-Truths
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 3, 2024
Authors
Semi Truths
License
https://choosealicense.com/licenses/cc/https://choosealicense.com/licenses/cc/
Description
Semi Truths Dataset: A Large-Scale Dataset for Testing Robustness of AI-Generated Image Detectors (NeurIPS 2024 Track Datasets & Benchmarks Track)

Recent efforts have developed AI-generated image detectors claiming robustness against various augmentations, but their effectiveness remains unclear. Can these systems detect varying degrees of augmentation?

To address these questions, we introduce Semi-Truths, featuring 27, 600 real images, 223, 400 masks, and 1, 472, 700… See the full description on the dataset page: https://huggingface.co/datasets/semi-truths/Semi-Truths.

Chinese Harmful Meme Dataset

kaggle.com

zip

Updated Nov 15, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

DUT-lujunyu (2024). Chinese Harmful Meme Dataset [Dataset]. https://www.kaggle.com/datasets/ljy201788027/chinese-harmful-meme-dataset-toxicn-mm

Explore at:

zip(1435530 bytes)Available download formats

Dataset updated

Nov 15, 2024

Authors

DUT-lujunyu

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

The paper has been accepted in NeurIPS 2024 (Dataset & Benchmark Track). paper repo

☠️ Warning: The samples presented by this paper may be considered offensive or vulgar.

❗️ Ethics Statement

The opinions and findings contained in the samples of our presented dataset should not be interpreted as representing the views expressed or implied by the authors. We acknowledge the risk of malicious actors attempting to reverse-engineer memes. We sincerely hope that users will employ the dataset responsibly and appropriately, avoiding misuse or abuse. We believe the benefits of our proposed resources outweigh the associated risks. All resources are intended solely for scientific research and are prohibited from commercial use.

📜 Chinese Harmful Meme

To adapt to the Chinese online environment, we introduce the definition of Chinese harmful memes:

Chinese harmful memes are multimodal units consisting of an image and Chinese inline text that have the potential to cause harm to an individual, an organization, a community, a social group, or society as a whole. These memes can range from offense or joking that perpetuate harmful stereotypes towards specific social entities, to memes that are more subtle and general but still have the potential to cause harm. It is important to note that Chinese harmful memes can be created and spread intentionally or unintentionally. They often reflect and reinforce underlying negative values and cultural attitudes on the Chinese Internet, which are detrimental from legal or moral perspectives.

📜 ToxiCN MM

According to the definition, we identify the most common harmful types of memes on Chinese platforms, including targeted harmful, general offense, sexual innuendo, and dispirited culture. We focus on these harmful types when constructing the dataset.

During the annotation, we label memes from two aspects: harmful types (i.e., the above four types) and modality combination (i.e., analyzing toxicity through fused or independent features, including Text-Image Fusion, Harmful Text, and Harmful Image). Finally, we present the ToxiCN MM dataset, which contains 12,000 samples.

Considering the potential risk of abuse, please fill out the following form to request the datasets: https://forms.gle/UN61ZNfTgMZKfMrv7. After we get your request, we will send the dataset to your email as soon as possible. The dataset labels and captions generated by GPT-4V have been saved as train_data_discription.json and test_data_discription.json in the ./data/ directory. Here we simply describe each fine-grain label.

Label	Description
label	Identify if a meme is Harmful (1) or Non-harmful (0).
type	Non-harmful: 0, Targeted Harmful: 1, Sexual Innuendo: 2, General Offense: 3, Dispirited Culture: 4
modal	Non-harmful / Text-Image Fusion: [0, 0], Only Harmful Text: [1, 0], Only Harmful Image: [0, 1], Harmful Text & Image: [1, 1]

📜 Detector

We present a Multimodal Knowledge Enhancement Detector for effective detection. It incorporates contextual information of meme content to enhance the detector's understanding of Chinese memes generated by the LLM. The requirements.txt file lists the specific dependencies of the project.

❗️ Licenses

This work is licensed under a Creative Commons Attribution- NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

Poster

https://github.com/user-attachments/assets/c3cb7793-33f2-4e3e-ad72-e0d84530c658" alt="poster_original">

Cite

If you want to use the resources, please cite the following paper. The camera-ready version of the paper will be released after the conference: ~~~ @article{lu2024towards, title={Towards Comprehensive Detection of Chinese Harmful Memes}, author={Lu, Junyu and Xu, Bo and Zhang, Xiaokun and Wang, Hongbo and Zhu, Haohao and Zhang, Dongyu and Yang, Liang and Lin, Hongfei}, journal={arXiv preprint arXiv:2410.02378}, year={2024} } ~~~

Z
Data from: WikiDBs - A Large-Scale Corpus Of Relational Databases From...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Dec 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten (2024). WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11559813
Explore at:
Dataset updated
Dec 12, 2024
Authors
Vogel, Liane; Bodensohn, Jan-Micha; Binnig, Carsten
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.

WikiDBs was published as a spotlight paper at the Dataset & Benchmarks track at NeurIPS 2024.

WikiDBs contains the database schemas, as well as table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space are needed for the full corpus. We also provide a script to convert the databases into SQLite.
h
BLEnD
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nayeon Lee, BLEnD [Dataset]. https://huggingface.co/datasets/nayeon212/BLEnD
Explore at:
Authors
Nayeon Lee
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
BLEnD

This is the official repository of BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages (Submitted to NeurIPS 2024 Datasets and Benchmarks Track). 24/12/05: Updated translation errors25/05/02: Updated multiple choice questions file (v1.1)

About

Large language models (LLMs) often lack culture-specific everyday knowledge, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural… See the full description on the dataset page: https://huggingface.co/datasets/nayeon212/BLEnD.
E
BuckTales : A multi-UAV dataset for multi-object tracking and...
edmond.mpg.de
mp4, zip
Updated Dec 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar; Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar (2024). BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes [Dataset]. http://doi.org/10.17617/3.JCZ9WK
Explore at:
zip(65010277544), mp4(403189785), zip(3287471192), zip(457749126), mp4(130172114), zip(17011998466)Available download formats
Unique identifier
https://doi.org/10.17617/3.JCZ9WK
Dataset updated
Dec 19, 2024
Dataset provided by
Edmond
Authors
Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar; Hemal naik; Junran Yang; Dipin Das; Margaret Crofoot; Akanksha Rathore; Vivek Hari Sridhar
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The dataset contains UAV footage of wild antelopes (blackbucks) in grassland habitats. It can be mainly used for two tasks: Multi-object tracking (MOT) and Re-Identification (Re-ID). We provide annotations for the position of animals in each frame, allowing us to offer very long videos (up to 3 min) completely annotated while maintaining the identity of each animal in the video. The Re-ID dataset offers two videos, that capture the movement of some animals simultaneously from two different UAVs. The Re-ID task is to find the same individual in two videos taken simultaneously from a slightly different perspective. The relevant paper will be published in the NeurIPS 2024 Dataset and Benchmarking Track. https://nips.cc/virtual/2024/poster/97563 Resolution: 5.4 K MOT: 12 videos ( MOT17 Format) Re-ID: 6 sets (each with a pair of drones) (Custom) Detection: 320 Images (COCO, YOLO)
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

IXCLab@Shanghai AI Lab, mmlongbench-doc-results [Dataset]. https://huggingface.co/datasets/OpenIXCLab/mmlongbench-doc-results

mmlongbench-doc-results

OpenIXCLab/mmlongbench-doc-results

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Dataset authored and provided by

IXCLab@Shanghai AI Lab

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

📊 MMLongBench-Doc Evaluation Results

Official evaluation results: GPT-4.1 (2025-04-14) & GPT-4o (2024-11-20) 📄 Paper: MMLongBench-Doc, NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)

Clear search

Close search

Google apps

Main menu

mmlongbench-doc-results

Data from: Re-assembling the past: The RePAIR dataset and benchmark for real...

Access the entire dataset

Important Note

Semi-Truths

Chinese Harmful Meme Dataset

❗️ Ethics Statement

📜 Chinese Harmful Meme

📜 ToxiCN MM

📜 Detector

❗️ Licenses

Poster

Cite

Data from: WikiDBs - A Large-Scale Corpus Of Relational Databases From...

BLEnD

BuckTales : A multi-UAV dataset for multi-object tracking and...

mmlongbench-doc-results

OpenIXCLab/mmlongbench-doc-results