License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was gathered on September 17, 2020. It contains more than 5.4K Python repositories hosted on GitHub; see the file ManyTypes4PyDataset.spec for the repositories' URLs and their commit SHAs. The dataset is de-duplicated using the CD4Py tool, and the list of duplicate files is provided in duplicate_files.txt. All of its Python projects are processed into JSON-formatted files, which contain a seq2seq representation of each file, type-related hints, and information for machine learning models. The structure of the JSON-formatted files is described in JSONOutput.md. The dataset is split into train, validation, and test sets by source code files; the list of files and their corresponding set is provided in dataset_split.csv. Notable changes to each version of the dataset are documented in CHANGELOG.md.
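For instance, the split assignment can be inspected with pandas; a minimal sketch, assuming dataset_split.csv has two columns (file path and split name) and no header, which may differ from the actual file:

```python
# Minimal sketch: count source files per split.
# The column names "file" and "split" are assumptions; adjust to the actual
# layout of dataset_split.csv.
import pandas as pd

splits = pd.read_csv("dataset_split.csv", header=None, names=["file", "split"])
print(splits["split"].value_counts())  # counts of train / validation / test files
```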
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
SI-NLI (Slovene Natural Language Inference Dataset) contains 5,937 human-created Slovene sentence pairs (premise and hypothesis) that are manually labeled with the labels "entailment", "contradiction", and "neutral". We created the dataset using sentences that appear in the Slovenian reference corpus ccKres (http://hdl.handle.net/11356/1034). Annotators were tasked to modify the hypothesis in a candidate pair in a way that reflects one of the labels. The dataset is balanced since the annotators created three modifications (entailment, contradiction, neutral) for each candidate sentence pair. The dataset is split into train, validation, and test sets, with sizes of 4,392, 547, and 998. We used Slovenian pre-trained language models to create splits, thereby ensuring that difficult and easy instances are evenly distributed in all three subsets.
The dataset is released in a tabular TSV format. The README.txt file contains a description of the attributes. Only the hypothesis and premise are given in the test set (i.e. no annotations) since SI-NLI is integrated into the Slovene evaluation framework SloBENCH (https://slobench.cjvt.si/). If you use the dataset to train your models, please consider submitting the test set predictions to SloBENCH to get the evaluation score and see how it compares to others.
License: CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
By xnli (From Huggingface) [source]
The xnli Multilingual Natural Language Inference Dataset is a comprehensive collection of data specifically curated for training and evaluating natural language inference (NLI) models in various languages. It provides a diverse range of language splits, each containing examples in different languages such as Arabic, Bulgarian, Chinese, German, English, Greek, Spanish, French, Hindi, Indonesian, Italian, Japanese and many others.
With the goal of facilitating NLI tasks across multiple languages, this dataset includes separate CSV files for each language split. The available splits cover an extensive range of languages including widely spoken ones like English and Spanish as well as less commonly used ones like Urdu and Vietnamese.
Each CSV file consists of labeled examples that are essential for training and assessing the performance of NLI models. Each example contains two main components: the premise and the hypothesis. The premise is the initial sentence or text segment that forms the foundation of the NLI task, while the hypothesis is the second sentence or text segment; its comparison to the premise determines the logical relationship between them.
One crucial aspect for effective analysis is the label assigned to each example, which indicates its logical relationship to its premise. These labels fall into three categories: entailment (the hypothesis can be inferred from the premise), contradiction (the hypothesis contradicts the premise), or neutral (no logical relationship exists between them).
Moreover, to support development across different linguistic domains, this dataset also includes specific test splits dedicated to evaluating NLI models in individual languages, such as English (en_test.csv) and Urdu (ur_test.csv), among others.
Researchers and practitioners building multilingual NLI models can use this xnli dataset, with its many language variations and labeled examples, to train their models effectively and to assess how accurately they capture logical relationships between sentences across multiple linguistic contexts.
- Cross-lingual NLI Modeling: The xnli dataset provides an opportunity to train and test natural language inference models across multiple languages. Researchers can use this dataset to develop cross-lingual NLI models that can effectively understand the logical relationship between premises and hypotheses in different languages.
- Language Transfer Learning: By training on the xnli dataset, language models can learn to transfer their knowledge across different languages. This dataset can be used for pre-training models in one language and fine-tuning them for downstream tasks in another language, improving the performance of natural language understanding models in low-resource languages.
- Multilingual Evaluation Benchmarks: The xnli dataset serves as a benchmark for evaluating NLI models' performance across various languages. It allows researchers to compare the effectiveness of different models and techniques in handling diverse linguistic expressions, enabling advancements in multilingual understanding capabilities.
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: el_validation.csv
| Column name | Description |
|:---|:---|
| premise | The first sentence or text segment that serves as the basis for the natural language inference task. (Text) |
| hypothesis | The second sentence or text segment that is compared to the premise to determine the logical relationship between them. (Text) |
...
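As a quick illustration, one language split can be loaded with pandas; a minimal sketch in which the premise and hypothesis columns follow the table above, while the "label" column name is an assumption:

```python
# Minimal sketch: inspect one language split of the xnli CSV files.
# "premise" and "hypothesis" follow the column table above; the "label"
# column name is an assumption and may differ in the actual files.
import pandas as pd

df = pd.read_csv("el_validation.csv")
print(df[["premise", "hypothesis"]].head())
print(df["label"].value_counts())  # expected labels: entailment, contradiction, neutral
```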
License: BSD with Attribution, https://fedoraproject.org/wiki/Licensing/BSD_with_Attribution
Dataset containing scanned historical measurement table documents from ship logs and land measurement stations. The annotations provided in this dataset are designed to allow finer-grained table detection and table structure recognition models to be trained and tested. Annotations are region boundaries for tables, cells, headings, headers and captions.
This dataset release includes code to train models on a training split, to use trained model checkpoints for inference, and to evaluate inferred results on a test split. Pretrained models used in the published HIP-2021 paper are included in the dataset so results can be easily reproduced without training the model checkpoints yourself.
Instructions and code can be found in the linked GitHub repository: https://github.com/stuartemiddleton/glosat_table_dataset
A pre-print of the HIP-2021 paper can be found on the authors' website: https://www.southampton.ac.uk/~sem03/HIP_2021.pdf
Original images sourced with permission from the UK Met Office, US NOAA and weatherrescue.org (University of Reading).
This work is part of the GloSAT project https://www.glosat.org/ and supported by the Natural Environment Research Council (NE/S015604/1). The authors acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains the complete intersection Calabi-Yau four-folds (CICY4) configuration matrices and their four Hodge numbers, designed for the problem of machine learning the Hodge numbers using the configuration matrices as inputs to a neural network model.
The original data for CICY4 is from the paper "Topological Invariants and Fibration Structure of Complete Intersection Calabi-Yau Four-Folds", arXiv:1405.2073, and can be downloaded in either text or Mathematica format from https://www-thphys.physics.ox.ac.uk/projects/CalabiYau/Cicy4folds/index.html.
The full CICY4 data included with this dataset in npy format (conf.npy, hodge.npy, direct.npy) is created by running the script 'create_data.py' from https://github.com/robin-schneider/cicy-fourfolds. From this full data, two additional datasets at 72% and 80% training ratios were created.
At the 72% data split:
- The train dataset consists of the files (conf_Xtrain.npy, hodge_ytrain.npy)
- The validation dataset consists of the files (conf_Xvalid.npy, hodge_yvalid.npy)
- The test dataset consists of the files (conf_Xtest.npy, hodge_ytest.npy)
At the 80% data split, the three datasets are:
- (conf_Xtrain_80.npy, hodge_ytrain_80.npy)
- (conf_Xvalid.npy, hodge_yvalid.npy)
- (conf_Xtest_80.npy, hodge_ytest_80.npy)
The new train and test sets were formed from the old ones: the old test set is divided into two parts with ratios (0.6, 0.4); the 0.6 partition becomes the new test set, while the 0.4 partition is merged with the old train set to form the new train set (see the sketch below).
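The split construction described above can be mirrored with a short numpy sketch; the shuffling strategy and seed here are assumptions, and the released *_80.npy files were generated by the dataset authors, not by this snippet:

```python
# Sketch of deriving the 80% split from the 72% split: the old test set is cut
# 60/40, and the 40% part is merged into the old train set.
import numpy as np

conf_test = np.load("conf_Xtest.npy")
hodge_test = np.load("hodge_ytest.npy")
conf_train = np.load("conf_Xtrain.npy")
hodge_train = np.load("hodge_ytrain.npy")

rng = np.random.default_rng(0)             # seed is an assumption
idx = rng.permutation(len(conf_test))
cut = int(0.6 * len(conf_test))
test_idx, move_idx = idx[:cut], idx[cut:]  # 0.6 stays test, 0.4 moves to train

np.save("conf_Xtest_80.npy", conf_test[test_idx])
np.save("hodge_ytest_80.npy", hodge_test[test_idx])
np.save("conf_Xtrain_80.npy", np.concatenate([conf_train, conf_test[move_idx]]))
np.save("hodge_ytrain_80.npy", np.concatenate([hodge_train, hodge_test[move_idx]]))
```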
Trained neural network models and their training/validation losses:
- 12 models were trained on the 72% dataset; their checkpoints are stored in the folder 'trained_models'. The 12 csv files containing the train and validation losses of these models are stored in the folder 'train-validation-losses'.
- At the 80% data split, the top 3 performing models trained on the 72% dataset were retrained; their checkpoints are stored in 'trained_models_80pc_split', together with the 3 csv files containing the loss values during the training phase.
Inference notebook: The inference notebook using this dataset is https://www.kaggle.com/code/lorresprz/cicy4-training-results-inference-all-models
Publication: This dataset was created for the work: Deep Learning Calabi-Yau four folds with hybrid and recurrent neural network architectures, https://arxiv.org/abs/2405.17406
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Release of the experimental data from the paper Towards Linking Graph Topology to Model Performance for Biomedical Knowledge Graph Completion (accepted at Machine Learning for Life and Material Sciences workshop @ ICML2024).
For each test triple, the query (h,r,?) is scored against all entities in the KG, and we compute the rank of the score of the correct completion (h,r,t), after masking out scores of other (h,r,t') triples contained in the graph.
In experimental_data.zip, the following files are provided for each dataset:
- {dataset}_preprocessing.ipynb: a Jupyter notebook for downloading and preprocessing the dataset. In particular, this generates the custom label->ID mapping for entities and relations, and the numerical tensor of (h_ID, r_ID, t_ID) triples for all edges in the graph, which can be used to compute graph topological metrics (e.g., using kg-topology-toolbox) and compare them with the edge prediction accuracy.
- test_ranks.csv: csv table with columns ["h", "r", "t"] specifying the head, relation, and tail IDs of the test triples, and columns ["DistMult", "TransE", "RotatE", "TripleRE"] with the rank of the ground-truth tail in the ordered list of predictions made by the four models.
- entity_dict.csv: the list of entity labels, ordered by entity ID (as generated in the preprocessing notebook).
- relation_dict.csv: the list of relation labels, ordered by relation ID (as generated in the preprocessing notebook).
The separate top_100_tail_predictions.zip archive contains, for each of the test queries in the corresponding test_ranks.csv table, the IDs of the top-100 tail predictions made by each of the four KGE models, ordered by decreasing likelihood. The predictions are released in a .npz archive of numpy arrays (one array of shape (n_test_triples, 100) for each of the KGE models).
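For example, standard ranking metrics can be computed directly from test_ranks.csv; a minimal sketch using the documented column names:

```python
# Minimal sketch: mean reciprocal rank (MRR) and Hits@10 per KGE model,
# computed from the ground-truth tail ranks in test_ranks.csv.
import pandas as pd

ranks = pd.read_csv("test_ranks.csv")
for model in ["DistMult", "TransE", "RotatE", "TripleRE"]:
    mrr = (1.0 / ranks[model]).mean()
    hits10 = (ranks[model] <= 10).mean()
    print(f"{model}: MRR = {mrr:.3f}, Hits@10 = {hits10:.3f}")
```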
All experiments (training and inference) have been run on Graphcore IPU hardware using the BESS-KGE distribution framework.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains two parts: the original Stanford Natural Language Inference (SNLI) dataset with automatic translations to Czech, and, for a subset of items from SNLI, annotations of the Czech content and explanations.
The Czech SNLI data contain both Czech and English premise-hypothesis pairs. The SNLI split into train/test/dev is preserved.
The explanation dataset contains batches of premise-hypothesis pairs; each batch contains 1499 pairs. Each pair contains the CSNLI ID, the English premise, hypothesis, and gold label, the Czech premise, hypothesis, and gold label, and the explanation annotation (explanation-hypothesis, explanation-premise, explanation-relation).
Example record:
CSNLI ID: 4857558207.jpg#4r1e
English premise: A mother holds her newborn baby.
English hypothesis: A person holding a child.
English gold label: entailment
Czech premise: Matka drží své novorozené dítě.
Czech hypothesis: Osoba, která drží dítě.
Czech gold label: Entailment
Explanation-hypothesis: Matka
Explanation-premise: Osoba
Explanation-relation: generalization
Size of the explanations dataset:
- train: 159650
- dev: 2860
- test: 2880
Inter-Annotator Agreement (IAA): Packages 1 and 12 annotate the same data. The IAA measured by the kappa score is 0.67 (substantial agreement).
The translation was performed via the LINDAT translation service. The translated pairs were then manually checked (without access to the original English gold label), with the option of consulting the original pair.
Explanations were annotated as follows: - if there is a part of the premise or hypothesis that is relevant for the annotator's decision, it is marked - if there are two such parts and there exists a relation between them, the relation is marked
Possible relation types: - generalization: white long skirt - skirt - specification: dog - bulldog - similar: couch - sofa - independence: they have no instruments - they belong to the group - exclusion: man - woman
Original SNLI dataset: https://nlp.stanford.edu/projects/snli/ LINDAT Translation Service: https://lindat.mff.cuni.cz/services/translation/
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
SI-NLI-en is an English translation of the SI-NLI Slovene Natural Language Inference Dataset (http://hdl.handle.net/11356/1707). The English version was compiled by first using machine translation (DeepL) to translate all the premises and hypotheses from SI-NLI into English. The machine translations were then manually checked and corrected by a group of 7 students of translation at the University of Ljubljana. Each translator was given both the Slovene premise and all its hypotheses as well as the translations of both the premise and the hypotheses, so the translations were not checked in isolation, but as units to ensure maximum semantic coherence.
Just like SI-NLI, SI-NLI-en contains 5,937 sentence pairs (premise and hypothesis) that are manually labeled with the labels "entailment", "contradiction", and "neutral". The dataset is split into train, validation, and test sets, with sizes of 4,392, 547, and 998.
The dataset is released in a tabular TSV format. The 00README.txt file contains a description of the attributes. Only the hypothesis and premise are provided in the test set (with no annotations) since SI-NLI-en is integrated into the Slovene evaluation framework SloBENCH (https://slobench.cjvt.si/). If you use the dataset to train your models, please consider submitting the test set predictions to SloBENCH to get the evaluation score and see how it compares to others.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The scripts and the data provided in this repository demonstrate how to apply the approach described in the paper "Common to rare transfer learning (CORAL) enables inference and prediction for a quarter million rare Malagasy arthropods" by Ovaskainen et al. Here we summarize (1) how to use the software with a small, simulated dataset, with a running time of less than a minute on a typical laptop (Demo 1); (2) how to apply the analyses presented in the paper to a small subset of the data, with a running time of ca. one hour on a powerful laptop (Demo 2); and (3) how to reproduce the full analyses presented in the paper, with running times of up to several days, depending on the computational resources (Demo 3). Demos 1 and 2 are intended as user-friendly starting points for understanding and testing how to implement CORAL. Demo 3 is included mainly for reproducibility.
System requirements
· The software can be used in any operating system where R can be installed.
· We have developed and tested the software in a Windows environment with R version 4.3.1.
· Demo 1 requires the R-packages phytools (2.1-1), MASS (7.3-60), Hmsc (3.3-3), pROC (1.18.5) and MCMCpack (1.7-0).
· Demo 2 requires the R-packages phytools (2.1-1), MASS (7.3-60), Hmsc (3.3-3), pROC (1.18.5) and MCMCpack (1.7-0).
· Demo 3 requires the R-packages phytools (2.1-1), MASS (7.3-60), Hmsc (3.3-3), pROC (1.18.5) and MCMCpack (1.7-0), jsonify (1.2.2), buildmer (2.11), colorspace (2.1-0), matlib (0.9.6), vioplot (0.4.0), MLmetrics (1.1.3) and ggplot2 (3.5.0).
· The use of the software does not require any non-standard hardware.
Installation guide
· The CORAL functions are implemented in Hmsc (3.3-3). The software that applies them is presented as an R pipeline and thus does not require any installation other than installing R and the packages listed above.
Demo 1: Software demo with simulated data
The software demonstration consists of two R-markdown files:
· D01_software_demo_simulate_data. This script creates a simulated dataset of 100 species on 200 sampling units. The species occurrences are simulated with a probit model that assumes phylogenetically structured responses to two environmental predictors. The pipeline saves all the data needed for the data analysis in the file allDataDemo.RData: XData (the first predictor; the second one is not provided in the dataset, as it is assumed to remain unknown to the user), Y (species occurrence data), phy (phylogenetic tree), and studyDesign (list of sampling units). Additionally, the true values used for data generation are saved in the file trueValuesDemo.RData: LF (the second environmental predictor, which will be estimated through a latent factor approach) and beta (species responses to the environmental predictors).
· D02_software_demo_apply_CORAL. This script loads the data generated by the script D01 and applies the CORAL approach to it. The script demonstrates the informativeness of the CORAL priors, the higher predictive power of CORAL models than baseline models, and the ability of CORAL to estimate the true values used for data generation.
Both markdown files provide more detailed information and illustrations. The provided html file shows the expected output. The running time of the demonstration is very short, from a few seconds to at most one minute.
Demo 2: Software demo with a small subset of the data used in the paper
The software demonstration consists of one R-markdown file:
MA_small_demo. This script uses the CORAL functions in Hmsc to analyze a small subset of the Malagasy arthropod data. In this demo, we define rare species as those with prevalence of at least 40 and less than 50, and common species as those with prevalence of at least 200. This leaves 51 species for the backbone model and 460 rare species modelled through the CORAL approach. The script assesses model fit for CORAL priors, CORAL posteriors, and null models. It further visualizes the responses of both the common and the rare species to the included predictors.
Scripts and data for reproducing the results presented in the paper (Demo 3)
The input data for the script pipeline is the file "allData.RData". This file includes the metadata (meta), the response matrix (Y), and the taxonomic information (taxonomy). Each script in the pipeline below depends on the outputs of the previous ones, so they must be run in order. The first six scripts are used for fitting the backbone HMSC model and calculating the parameters of the CORAL prior:
· S01_define_Hmsc_model - defines the initial HMSC model with fixed effects and sample- and site-level random effects.
· S02_export_Hmsc_model - prepares the initial model for HPC sampling for fitting with Hmsc-HPC. Fitting of the model can be then done in an HPC environment with the bash file generated by the script. Computationally intensive.
· S03_import_posterior – imports the posterior distributions sampled by the initial model.
· S04_define_second_stage_Hmsc_model - extracts latent factors from the initial model and defines the backbone model. This is then sampled using the same S02 export + S03 import scripts. Computationally intensive.
· S05_visualize_backbone_model – checks backbone model quality with visual/numerical summaries. Generates Fig. 2 of the paper.
· S06_construct_coral_priors – calculates the CORAL prior parameters.
The remaining scripts evaluate the model:
· S07_evaluate_prior_predictionss – uses the CORAL prior to predict rare species presence/absences and evaluates the predictions in terms of AUC. Generates Fig. 3 of the paper.
· S08_make_training_test_split – generates train/test splits for cross-validation, ensuring that at least 40% of positive samples are in each partition.
· S09_cross-validate – fits CORAL and the baseline model to the train/test splits and calculates performance summaries. Note: we ran this once with the initial train/test split and then again on the inverse split (i.e., training = !training in the code, see comment). The paper presents the average results across these two splits. Computationally intensive.
· S10_show_cross-validation_results – makes plots visualizing the AUC/Tjur's R2 produced by cross-validation. Generates Fig. 4 of the paper.
· S11a_fit_coral_models – fits the CORAL model to all 250k rare species. Computationally intensive.
· S11b_fit_baseline_models – fits the baseline model to all 250k rare species. Computationally intensive.
· S12_compare_posterior_inference – compares posterior climate predictions using CORAL and baseline models on selected species, as well as the variance reduction for all species. Generates Fig. 5 of the paper.
Pre-processing scripts:
· P01_preprocess_sequence_data.R – Reads in the outputs of the bioinformatics pipeline and converts them into R-objects.
· P02_download_climatic_data.R – Downloads the climatic data from "sis-biodiversity-era5-global" and adds it to the metadata.
· P03_construct_Y_matrix.R – Converts the response matrix from a sparse data format to a regular matrix. Saves "allData.RData", which includes the metadata (meta), the response matrix (Y), and the taxonomic information (taxonomy).
Computationally intensive files had runtimes of 5-24 hours on high-performance machines. Preliminary testing suggests runtimes of over 100 hours on a standard laptop.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
AutoGEO-Researchy Dataset
This dataset contains multiple configurations for different tasks. Use the dropdown menu above to select a specific configuration to view.
- main: Contains the primary train and test splits.
- rule_candidate: Data for rule candidate generation.
- cold_start: Data for cold-start finetuning.
- inference: Data for inference tasks.
- grpo_input: Input data for GRPO.
- grpo_eval: Evaluation data for GRPO.
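A hypothetical loading sketch with the Hugging Face datasets library; the repository id below is a placeholder, and the configuration names follow the list above:

```python
# Hypothetical sketch: load one of the configurations listed above by name.
# Replace "<org>/AutoGEO-Researchy" with the actual repository id.
from datasets import load_dataset

ds = load_dataset("<org>/AutoGEO-Researchy", name="main")  # or "rule_candidate", "cold_start", ...
print(ds)  # shows the available splits of the selected configuration
```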
Researchy-GEO Dataset
This dataset contains multiple configurations for different tasks. Use the dropdown menu above to select a specific configuration to view.
- main: Contains the primary train and test splits.
- rule_candidate: Data for rule candidate generation.
- cold_start: Data for cold-start finetuning.
- inference: Data for inference tasks.
- grpo_input: Input data for GRPO.
- grpo_eval: Evaluation data for GRPO.
License: GNU GPL 2.0, http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Recognizing lexical inference is an essential component in natural language understanding. In question answering, for instance, identifying that broadcast and air are synonymous enables answering the question "When was 'Friends' first aired?" given the text "'Friends' was first broadcast in 1994". Semantic relations such as synonymy (tall, high) and hypernymy (cat, pet) are used to infer the meaning of one term from another, in order to overcome lexical variability. This inference should typically be performed within a given context, considering both the term meanings in context and the specific semantic relation that holds between the terms.
This dataset provides annotations for fine-grained lexical inferences in context. The dataset consists of 3,750 term pairs, each given within a context sentence, built upon a subset of terms from PPDB. Each term pair is annotated with the semantic relation that holds between the terms in the given contexts.
Files:
File Structure: comma-separated file
Fields:
If you use this dataset, please cite the following paper:
Adding Context to Semantic Data-Driven Paraphrasing.
Vered Shwartz and Ido Dagan. *SEM 2016.
I hope that this dataset will motivate the development of context-sensitive lexical inference methods, which have been relatively overlooked, although they are crucial for applications.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of bounding boxes for each sedimentary structure across training and test sets in Split-I. This table highlights class representation balance to ensure effective model training and evaluation.
Note: To better find the files to download, select "Change View: Tree".
The dataset contains:
- 80 video sequences from conventional pig farming with multi-object tracking annotations, together with a 'split.txt' file containing the predefined training, validation and test splits
- The original mp4 videos of the 80 video sequences
- A visualization of the annotated bounding boxes for all 80 videos
- Model weights of MOTRv2 and MOTIP trained for pig tracking
- Pre-computed bounding box priors that can be used to train MOTRv2
A thorough explanation of all files contained in this data repository can be found in ReadMe.txt. The github repository associated with this dataset can be found at https://github.com/jonaden94/PigBench. It includes commands to automatically download the files from this data repository that are required for model training, evaluation, and inference.
License: CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
The Response Score Dataset (RSD) is the first comprehensive multimodal response quality dataset specifically designed for training and evaluating Vision-Language Model (VLM) routers in edge-cloud collaborative systems. This dataset enables scenario-aware routing between large cloud models and small edge models, optimizing the trade-off between response quality, inference latency, and computational cost.
📦 Total Samples: ~22,700 image-text pairs
🤖 Models Evaluated: 8 VLMs (2 large + 3 medium + 2 small)
📚 Source Benchmarks: 7 public VLM datasets
⭐ Score Range: 1-10 (LLM-as-a-Judge)
✅ Human Validation: 200 samples (r=0.88 correlation)
💰 Construction Cost: ~$1,000 USD
| Dataset | Samples | Difficulty | Task Type |
|---|---|---|---|
| ChartQA | 2,500 | Easy | Chart understanding & arithmetic |
| WildVision | 500 | Easy | Real-world open-ended VQA |
| GQA | 12,000 | Medium | Compositional spatial reasoning |
| VizWiz | 4,319 | Medium | Blind-assistance with noise |
| MMVet | 218 | Medium | Multi-ability composite tasks |
| MMMU-Pro | 1,730 | Hard | Professional domain knowledge |
| MMStar | 1,500 | Hard | Leak-resistant fine-grained eval |
| Total | ~22,700 | Mixed | Diverse multimodal tasks |
Large Models (LVLM - Cloud Deployment):
- Gemma 3-27B
- InternVL3-38B
Small Models (SVLM - Edge Deployment):
- InternVL3-8B
- Phi-4-Multimodal-5.6B
- Qwen2.5-VL-7B
- InternVL2.5-2B
- InternVL2.5-1B
- SmolVLM-256M
vlm_evaluation_dataset/
├── images/ # Original images for each sub-dataset
│ ├── MMVet/
│ ├── ChartQA_TEST/
│ ├── GQA_TestDev_Balanced/
│ ├── MMMU/
│ ├── MMStar/
│ ├── VizWiz/
│ └── WildVision/
│
├── metadata/ # Metadata files for each dataset (TSV format)
│ ├── MMVet.tsv
│ ├── ChartQA_TEST.tsv
│ ├── GQA_TestDev_Balanced.tsv
│ ├── MMMU.tsv
│ ├── MMStar.tsv
│ ├── VizWiz.tsv
│ └── WildVision.tsv
│
├── scoring_results/ # Model prediction and scoring results
│ ├── MMVet/
│ │ ├── InternVL3-8B/
│ │ │ └── single/
│ │ │ ├── results.csv # Aggregated scoring results
│ │ │ ├── details.json # Detailed reasoning and scoring records
│ │ │ └── log.json # Model inference logs (optional)
│ │ └── OtherModel/
│ │ └── single/
│ ├── ChartQA_TEST/
│ │ └── ...
│ └── ...
│
├── statistics.json # Dataset statistics summary (sample counts, category distribution, etc.)
└── README.md # Overall dataset documentation
| Field Name | Description |
|---|---|
| index | Unique sample ID |
| image | Image path or Base64 encoding |
| question | Input question text |
| answer | Reference answer |
| category | Question category (e.g., visual reasoning, chart understanding, etc.) |
| Field Name | Description |
|---|---|
| question_id | Question ID (corresponds to metadata.index) |
| question | Question text |
| reference_answer | Ground truth answer |
| prediction | Model predicted answer |
| score | LLM score (range 0-10) |
| reasoning | Scoring rationale (text description) |
| model_name | Model name (e.g., InternVL3-8B) |
| category | Question category |
| dataset_type | Dataset name (e.g., MMVet) |
| inference_time | Model inference time (unit: seconds) |
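As an illustration, the per-model scoring files can be aggregated with pandas; a minimal sketch following the directory layout and field table above:

```python
# Minimal sketch: summarize one model's results on one sub-dataset.
# Paths and column names follow the structure and field descriptions above.
import pandas as pd

results = pd.read_csv("scoring_results/MMVet/InternVL3-8B/single/results.csv")
print("mean score:", results["score"].mean())
print("mean inference time (s):", results["inference_time"].mean())
print(results.groupby("category")["score"].mean())  # per-category quality
```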
Overall (all models, all samples):
├── Mean: 5.58
├── Median: 6.00
├── Std Dev: 2.15
├── Min: 1.00
├── Max: 10.00
└── Mode: 6.00
By Model Type:
├── Large Models (LVLM): Mean = 5.81
└── Small Models (SVLM): Mean = 5.47
By Difficulty:
├── Easy: Mean = 7.13
├── Medium: Mean = 5.80
└── Hard: Mean = 3.36
Overall:
├── Mean: 1.31s
├── Median: 0.60s
├── P75: 1.17s
├── P90: 2.52s
└── P99: 5.45s
By Model:
├── SmolVLM-256M: 0.62s (fastest)
├── InternVL2.5-1B: 0.71s
├── InternVL2.5-2B: 0.81s
├── InternVL3-8B: 0.92s
├── Phi-4-5.6B: 1.80s
├── Qwen2.5-VL-7B: 0.90s
├── InternVL3-38B: 2.47s
└── Gemma3-27B: 2.56s (slowest)
```mermaid
graph TD
    A[7 Public Benchmarks] --> B[Sample Collection ~22k]
    B --> C[8 VLM Inference]
    C --> D[Response Generation]
    D --> E[LLM-as-a-Judge Scoring]
    E --> F[Human Validation 200 samples]
    F --> G[Quality Check r>0.85]
    G --> H[MES-based Labeling]
    H --> I[Stratified Train/Val/Test Split]
    I --> J[Final RSD Dataset]
```
...
License: CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
Paper: link
The ANLI Adversarial Natural Language Inference dataset is a new, large-scale NLI benchmark dataset. The dataset is collected via an iterative, adversarial human-and-model-in-the-loop procedure. ANLI is much more difficult than its predecessors such as SNLI and MNLI. It contains three rounds. Each round has train/dev/test splits. The data fields are the same among all splits.
ANLI provides a unique challenge for natural language understanding models. The dataset is collected via an iterative, adversarial human-and-model-in-the-loop procedure that makes it much more difficult than its predecessors such as SNLI and MNLI. This makes ANLI a great benchmark for assessing the progress of NLI models.
To use the ANLI dataset, download the train_r1.csv file, which contains the first round of training data; the dev_r1.csv file, which contains the first round of development data; and the test_r1.csv file, which contains the first round of test data.
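A minimal loading sketch, assuming the column layout documented in the file tables below:

```python
# Minimal sketch: load the round-1 files and inspect the label distribution.
# Columns (premise, hypothesis, label, reason) follow the file tables below.
import pandas as pd

train_r1 = pd.read_csv("train_r1.csv")
dev_r1 = pd.read_csv("dev_r1.csv")
test_r1 = pd.read_csv("test_r1.csv")
print(len(train_r1), len(dev_r1), len(test_r1))
print(train_r1["label"].value_counts())
```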
- The ANLI Adversarial Natural Language Inference dataset can be used to train models to better understand natural language.
- The dataset can be used to develop models that are more robust to adversarial examples.
- The dataset can be used to improve the accuracy of NLI systems.
The dataset was originally published on Huggingface Hub
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: dev_r2.csv
| Column name | Description |
|:---|:---|
| premise | The premise of the sentence. (String) |
| hypothesis | The hypothesis of the sentence. (String) |
| label | The label of the sentence. (String) |
| reason | The reason for the label. (String) |

File: test_r2.csv
| Column name | Description |
|:---|:---|
| premise | The premise of the sentence. (String) |
| hypothesis | The hypothesis of the sentence. (String) |
| label | The label of the sentence. (String) |
| reason | The reason for the label. (String) |

File: train_r3.csv
| Column name | Description |
|:---|:---|
| premise | The premise of the sentence. (String) |
| hypothesis | The hypothesis of the sentence. (String) |
| label | The label of the sentence. (String) |
| reason | The reason for the label. (String) |

File: dev_r3.csv
| Column name | Description |
|:---|:---|
| premise | The premise of the sentence. (String) |
| hypothesis | The hypothesis of the sentence. (String) |
| label | The label of the sentence. (String) |
| reason | The reason for the label. (String) |

File: test_r3.csv
| Column name | Description |
|:---|:---|
| premise | The premise of the sentence. (String) |
| hypothesis | The hypothesis of the sentence. (String) |
| label | The label of the sentence. (String) |
| reason | The reason for the label. (String) |

File: train_r2.csv
| Column name | Description |
|:---|:---|
| premise | The premise of the sentence. (String) |
| hypothesis | The hypothesis of the sentence. (String) |
| label | The label of the sentence. (String) |
| reason | The reason for the label. (String) |

File: train_r1.csv
| Column name | Description |
|:---|:---|
| premise | The premise of the sentence. (String) |
| hypothesis | The hypothesis of the...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of bounding boxes for each sedimentary structure across training and test sets in Split-III. It confirms that all classes are represented, supporting fair performance evaluation despite observed precision drops.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 164K images.
This is the original version from 2014, made available here for easy access in Kaggle because it no longer seems to be available on the COCO Dataset website. It has been retrieved from the mirror that Joseph Redmon set up on his own website.
The 2014 version of the COCO dataset is an excellent object detection dataset with 80 classes, 82,783 training images and 40,504 validation images. This dataset contains all this imagery in two folders, as well as the annotations with the class and location (bounding box) of the objects contained in each image.
The initial split provides training (83K), validation (41K) and test (41K) sets. Since the split between training and validation was not optimal in the original dataset, there are also two text (.part) files with a new split that uses only 5,000 images for validation and the rest for training. The test set has no labels and can be used for visual validation or pseudo-labelling.
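A minimal sketch of using the re-split lists; the .part file names below are assumptions (use the two .part files shipped with this dataset):

```python
# Sketch: read the new-split image lists (one image path per line).
def read_image_list(part_file):
    """Return the image paths listed in a .part split file."""
    with open(part_file) as f:
        return [line.strip() for line in f if line.strip()]

# File names are assumptions based on common YOLO-style COCO setups.
train_images = read_image_list("trainvalno5k.part")  # all images except the 5k validation set
val_images = read_image_list("5k.part")              # the 5,000-image validation split
print(len(train_images), len(val_images))
```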
This is mostly inspired by Erik Linder-Norén and [Joseph Redmon](https://pjreddie.com/darknet/yolo).
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains 1,784 drone images of various construction types and nature. It is split into the three sets needed for machine learning and deep learning tasks, namely train, validation, and test splits. The structure of the data is as follows:
ROOT
train
valid
test
There are 1,427, 179, and 178 images in the train, validation, and test folders, respectively. The train and validation folders have specific classes, but the test set images have no classes and must be predicted using an AI model during inference.
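A minimal PyTorch sketch for loading the labelled splits, assuming train/ and valid/ contain one subfolder per class while test/ holds unlabeled images, as described above:

```python
# Minimal sketch: load the class-labelled splits with torchvision's ImageFolder.
# The image size and transforms are assumptions, not part of the dataset.
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("ROOT/train", transform=tfm)
valid_set = datasets.ImageFolder("ROOT/valid", transform=tfm)
print(len(train_set), "train images,", len(valid_set), "validation images")
print(train_set.classes)  # class names inferred from the folder names
```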
AI-Powered Acne Detection using Ensemble Deep Learning
This project presents an AI-based acne detection and severity assessment system that combines two deep learning models to analyze facial images. The approach integrates a classification model (ResNet50) and a localization model (YOLOv5) to provide both an overall severity prediction and detailed detection of individual acne lesions.
The classification module predicts the severity of acne on the entire face and estimates the number of lesions present in each image. It uses KL divergence loss to improve training stability and outputs confidence scores for each prediction. The localization module, based on YOLOv5, detects the exact positions of acne lesions and classifies them into six types: comedones, papules, pustules, nodules, cysts, and scars. It uses bounding boxes with configurable confidence thresholds and supports real-time detection.
The dataset used in this project contains 920 high-resolution facial images, annotated with 2,847 total lesions. The dataset is split into 637 training images, 194 validation images, and 89 test images. Annotations follow the YOLO format, and the class distribution is balanced across all acne types.
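For reference, YOLO-format annotation files store one lesion per line as "class_id x_center y_center width height", with coordinates normalized to the image size. A minimal parsing sketch; the file name and the class-index order are assumptions:

```python
# Sketch: parse one YOLO-format label file into lesion boxes.
# The mapping from class index to lesion type is an assumption.
LESION_TYPES = ["comedones", "papules", "pustules", "nodules", "cysts", "scars"]

def parse_yolo_labels(path):
    """Return (lesion type, x_center, y_center, width, height) tuples from a YOLO label file."""
    boxes = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            cls, xc, yc, w, h = line.split()
            boxes.append((LESION_TYPES[int(cls)], float(xc), float(yc), float(w), float(h)))
    return boxes

print(parse_yolo_labels("labels/example_face_001.txt"))  # hypothetical label file
```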
The system evaluates performance using standard classification metrics such as accuracy, precision, recall, and F1-score, and uses mAP and IoU for object detection. The ensemble results are generated by combining the outputs of both models, which helps improve accuracy and reliability.
This solution is implemented using Python 3.8 with the PyTorch framework. It requires YOLOv5 for detection and ResNet50 for classification. GPU acceleration with CUDA is recommended for training and inference.
The codebase includes a complete pipeline for training, validation, testing, and inference. It is modular and easy to extend. Jupyter notebook examples are provided for quick experimentation and visualization.
This project is suitable for various use cases including dermatology research, telemedicine, skincare applications, and educational tools. It demonstrates the value of combining classification and object detection models for practical medical image analysis.