Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Downloading the Options IV SP500 Dataset
This document will guide you through the steps to download the Options IV SP500 dataset from Hugging Face Datasets. The dataset includes data on S&P 500 options, including implied volatility. To start, you'll need to install Hugging Face's datasets library if you haven't done so already. You can do this with the following pip command:
!pip install datasets
Here's the Python code to load the Options IV SP500 dataset from Hugging… See the full description on the dataset page: https://huggingface.co/datasets/gauss314/options-IV-SP500.
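As a minimal sketch of that loading step (the dataset ID comes from the page URL above; the "train" split and the use of pandas are assumptions to verify against the returned object):

from datasets import load_dataset

# Load the dataset by the ID shown in the page URL.
dataset = load_dataset("gauss314/options-IV-SP500")
print(dataset)                      # shows splits, features, and row counts
df = dataset["train"].to_pandas()   # assumes a "train" split exists
print(df.head())                    # peek at the option records and IV columns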
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Microbiome Immunity Project: Protein Universe
~200,000 predicted structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life, annotated functionally on a per-residue basis.
Quickstart Usage
Install HuggingFace Datasets package
Each subset can be loaded into Python using the Huggingface datasets library. First, from the command line, install the datasets library:
$ pip install datasets
Optionally set the… See the full description on the dataset page: https://huggingface.co/datasets/RosettaCommons/MIP.
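A hedged sketch of the quickstart above; the subset names are not shown in this excerpt, so they are discovered at runtime rather than assumed:

import datasets

# Discover the available subsets rather than assuming any particular name.
subsets = datasets.get_dataset_config_names("RosettaCommons/MIP")
print(subsets)

# Load one subset; replace subsets[0] with the subset you need.
mip = datasets.load_dataset("RosettaCommons/MIP", name=subsets[0])
print(mip)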
Quickstart Usage
This dataset can be loaded into Python using the Huggingface datasets library. First, install the datasets library via the command line:
$ pip install datasets
With datasets installed, the user should then import it into their python script / environment:
import datasets
The user can then load the CF-MS_Homo_sapiens_PPI dataset using datasets.load_dataset(...). There are two configurations, or 'views' for the set. The user can choose between them via the name… See the full description on the dataset page: https://huggingface.co/datasets/viridono/CF-MS_Homo_sapiens_PPI.
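Since the two view names are cut off above, a minimal sketch can list them at runtime and then load one by name:

import datasets

# List the available configurations ('views') instead of guessing their names.
views = datasets.get_dataset_config_names("viridono/CF-MS_Homo_sapiens_PPI")
print(views)  # expect two entries

# Load the chosen view by passing its name to load_dataset.
ppi = datasets.load_dataset("viridono/CF-MS_Homo_sapiens_PPI", name=views[0])
print(ppi)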
GNU General Public License v2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Persian Question Answering (PersianQA) Dataset is a reading comprehension dataset on Persian Wikipedia. The crowd-sourced dataset consists of more than 9,000 entries. Each entry is either an impossible-to-answer question or a question with one or more answer spans in the passage (the context) from which the question was posed. Much like the SQuAD2.0 dataset, the impossible or unanswerable questions can be utilized to create a system which "knows that it doesn't know the answer".
Moreover, around 900 test examples are available. On top of that, the very first models trained on the dataset, based on Transformers, are available online.
All the crowdworkers of the dataset are native Persian speakers. It is also worth mentioning that the contexts are collected from all categories of the Persian Wikipedia (history, religion, geography, science, etc.).
At the moment, each context has 7 question-answer pairs and 3 impossible questions.
You can find the dataset under the dataset directory and use it as shown below:
from read_ds import read_qa  # read_ds.py is available under src/
train_ds = read_qa('pqa_train.json')
test_ds = read_qa('pqa_test.json')
Alternatively, you can also access the data through the HuggingFace🤗 datasets library. For that, you need to install datasets using this command in your terminal:
pip install -q datasets
Afterwards, load the persian_qa dataset using load_dataset:
from datasets import load_dataset
dataset = load_dataset("SajjadAyoubi/persian_qa")
| Split | # of instances | # of unanswerables | avg. question length | avg. paragraph length | avg. answer length |
|---|---|---|---|---|---|
| Train | 9,000 | 2,700 | 8.39 | 224.58 | 9.61 |
| Test | 938 | 280 | 8.02 | 220.18 | 5.99 |
The lengths are on the token level.
To learn more about the data and see more examples, take a look here.
Currently, two baseline models on the HuggingFace🤗 model hub use the dataset. The models are listed in the table below.
As of yet, we haven't published any papers on this work.
However, if you use the dataset, please cite us with an entry like the one below.
@misc{PersianQA,
  author = {Ayoubi, Sajjad and Davoodeh, Mohammad Yasin},
  title = {PersianQA: a dataset for Persian Question Answering},
  year = 2021,
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/SajjjadAyobi/PersianQA}},
}
🛠️ Requirements and Installation
git clone https://github.com/Yofuria/ICE.git
cd ICE
conda create -n ICE python=3.10
conda activate ICE
pip install -r requirements.txt
In lines 32 and 33 of examples/run_knowedit_llama2.py, you need to download the punkt package.
If your Internet speed is fast enough, you can run the code directly from the command line.
if __name__ == "__main__":  # If you have a slow Internet connection and… See the full description on the dataset page: https://huggingface.co/datasets/kailinjiang/punkt.
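A minimal sketch of that download step, assuming the punkt package here refers to NLTK's punkt tokenizer data:

import nltk

if __name__ == "__main__":
    # With a fast connection, fetch the punkt tokenizer data directly.
    # With a slow connection, download the files from this dataset instead
    # and point NLTK at the local copy, e.g. nltk.data.path.append("<local_punkt_dir>").
    nltk.download("punkt")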
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Attentive Skin
To Predict Skin Corrosion/Irritation Potentials of Chemicals via Explainable Machine Learning Methods
Download: https://github.com/BeeBeeWong/AttentiveSkin/releases/tag/v1.0
Quickstart Usage
Load a dataset in python
Each subset can be loaded into Python using the Huggingface datasets library. First, from the command line, install the datasets library:
$ pip install datasets
Then, from within Python, load the datasets library:
import datasets… See the full description on the dataset page: https://huggingface.co/datasets/maomlab/AttentiveSkin.
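A hedged sketch of that quickstart; the default subset and split layout are assumptions, so the returned object is printed to see what actually exists:

import datasets

# If the dataset defines several subsets, pass name=<subset>; the available
# names can be listed with datasets.get_dataset_config_names("maomlab/AttentiveSkin").
skin = datasets.load_dataset("maomlab/AttentiveSkin")
print(skin)

# Peek at a few raw records from the first available split.
split = next(iter(skin))
for record in skin[split].select(range(3)):
    print(record)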
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the complete code, models, and datasets for the article ESNLIR: Expanding Spanish NLI Benchmarks with Multi-genre and Causal Annotation.
In case you cannot access the article, this preprint is available: ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships.
Portela, J.R., Pérez-Terán, N., Manrique, R. (2026). ESNLIR: Expanding Spanish NLI Benchmarks with Multi-genre and Causal Annotation. In: Florez, H., Peluffo-Ordoñez, D. (eds) Applied Informatics. ICAI 2025. Communications in Computer and Information Science, vol 2667. Springer, Cham. https://doi.org/10.1007/978-3-032-07175-0_23
If you still want to use the Zenodo repository, follow the steps below. But once again, it is way easier to work with the links above.
----------------------------------------------------------------------------------------------
This repository is a poetry project, which means that it can be installed easily by executing the following command from a shell in the repository folder:
poetry install
As this repository is script based, the README.md file contains all the commands executed to generate the dataset and train models.
----------------------------------------------------------------------------------------------
The core code used for all the experiments is in the folder auto-nli, and all the calls to the core code with the requested parameters are found in README.md.
----------------------------------------------------------------------------------------------
All the parameters to create datasets and train models with the core code are found in the folder parameters.
----------------------------------------------------------------------------------------------
For BERT-based models, all in PyTorch, two types of models from Hugging Face were used for training; they are also required to load a dataset because of the tokenizer:
The model folder contains all the trained models for the paper. There are three types of models:
Models with the suffix _annot are trained with the premise (first sentence) only. Apart from the PyTorch model folder, each model result folder (ex: ) contains the test results for the test set and the stress test sets (ex: )
Models are found in the folder model, and all of them are PyTorch models that can be loaded with the Hugging Face interface:
from transformers import AutoModel
model = AutoModel.from_pretrained('<model_folder>')  # placeholder path; the original value is truncated in this excerpt
----------------------------------------------------------------------------------------------
This file is included outside the ZIP containing all other files, and it contains the final test dataset with 974 examples selected by human majority label matching the original linking phrase label.
The datasets can be found in the folder data, which is divided into the following folders:
The splits to train, validate, and test the models.
Train-val-test splits extracted for each corpus. They are used to generate base_dataset.
Pairs of sentences found in each corpus. They are used to generate splits_data.
This repository contains the splits that resulted from the research project "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships". All the splits are in JSONL format and have the same fields per example:
Example:
{"sentence_1":"sefior Bcajavides no es moderado, tampoco lo convertirse e\u00f1 declarada divergencia de miras polileido en griego","sentence_2":"era mayor claricomentarios, as\u00ed de los peri\u00f3dicos como de los homes dado \u00e1 la voluntad de los hombres, sin que sobreticas","connector":"por consiguiente,","connector_type":"reasoning","extraction_strategy":"linking_phrase","distance":1.0,"sentence_1_paragraph":4,"sentence_1_position":86,"sentence_2_paragraph":4,"sentence_2_position":87,"id":"esnews_spanish_pd_news_531537","dataset":"esnews_spanish_pd_news","genre":"news","domain":"spanish_public_domain_news"}
To load a dataset/split as a PyTorch object for training, validation, and testing, you must use the custom dataset class:
from auto_nli.model.bert_based.dataset import BERTDataset
dataset = BERTDataset(os.path.join(dataset_folder, <split_file>),
                      max_len=<max_len>,
                      model_type=<model_type>,
                      only_premise=<only_premise>,
                      max_samples=<max_samples>)
# The argument values are truncated in this excerpt; the <...> entries are placeholders.
----------------------------------------------------------------------------------------------
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for VLM4Bio
Instructions for downloading the dataset
Install Git LFS, then clone the VLM4Bio repository to download all metadata and associated files. Run the following commands in a terminal:
git clone https://huggingface.co/datasets/imageomics/VLM4Bio
cd VLM4Bio
Downloading and processing bird images
To download the bird images, run the following command:
bash download_bird_images.sh
This should download the bird images inside datasets/Bird/images… See the full description on the dataset page: https://huggingface.co/datasets/imageomics/VLM4Bio.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
The Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered, publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, which is the context length (in tokens) it can support.
The model has undergone a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety measures. When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3 Mini-4K-Instruct showcased robust, state-of-the-art performance among models with fewer than 13 billion parameters.
Resources and Technical Documentation:
Primary use cases
The model is intended for commercial and research use in English. It is suitable for applications that require:
1) Memory/compute constrained environments
2) Latency bound scenarios
3) Strong reasoning (especially code, math, and logic)
Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features.
Use case considerations
Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using them within a specific downstream use case, particularly in high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case.
Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.
Phi-3 Mini-4K-Instruct has been integrated in the development version (4.41.0.dev0) of transformers. Until the official version is released through pip, ensure that you are doing one of the following:
When loading the model, ensure that trust_remote_code=True is passed as an argument of the from_pretrained() function.
Update your local transformers to the development version: pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers. The previous command is an alternative to cloning and installing from the source.
The current transformers version can be verified with: pip list | grep transformers.
Phi-3 Mini-4K-Instruct is also available in HuggingChat.
Phi-3 Mini-4K-Instruct supports a vocabulary size of up to 32064 tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.
Given the nature of the training data, the Phi-3 Mini-4K-Instruct model is best suited for prompts using the chat format as follows.
You can provide the prompt as a question with a generic template as follows:
<|user|>
Question <|end|>
<|assistant|>
For example:
<|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|>
where the model generates the text after <|assistant|>. For a few-shot prompt, the prompt can be formatted as follows:
<|user|>
I am going to Paris, what should I see?<|end|>
<|assistant|>
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:
1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic...
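Putting the loading notes and the chat format above together, a minimal sketch (the dtype, device, and generation settings are illustrative assumptions):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"

# trust_remote_code=True follows the loading note above; dtype and device
# placement are illustrative and can be adjusted to your hardware.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "How to explain Internet for a medieval knight?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens (the assistant's reply).
reply = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(reply)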
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Simeonov2008
The Simeonov2008 dataset contains 7,152 compounds in the train set, with high-throughput screening (HTS) results recorded in the "Activity Outcome" column.
Quickstart Usage
Load a dataset in python
Each subset can be loaded into Python using the Huggingface datasets library. First, from the command line, install the datasets library:
$ pip install datasets
Then, from within Python, load the datasets library:
import datasets
and load the… See the full description on the dataset page: https://huggingface.co/datasets/haneulpark/Simeonov2008.
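A hedged sketch of that quickstart; the "train" split and the "Activity Outcome" column come from the description above, while the subset handling is an assumption:

import datasets

# If the dataset defines several subsets, pass name=<subset>; names can be
# listed with datasets.get_dataset_config_names("haneulpark/Simeonov2008").
sim = datasets.load_dataset("haneulpark/Simeonov2008")
train = sim["train"]
print(len(train))                      # expected: 7,152 compounds
print(train["Activity Outcome"][:5])   # peek at the HTS outcomes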
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
How to use
Make sure your environment is set up:
pip install datasets
Run the following command to download and load your data:
from datasets import load_dataset
dataset = load_dataset("aidenpan/s_clips-v1.0-safe")
Print it out
print(dataset["val"]["identifier"])
['137720:2', '221257:7', '159943:2', '124745:14', '179035:9'... ]
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multimodal Pragmatic Jailbreak on Text-to-image Models
Project page | Paper | Code
The Multimodal Pragmatic Unsafe Prompts (MPUP) is a dataset designed to assess multimodal pragmatic safety in Text-to-Image (T2I) models. It comprises two key sections: image_prompt and text_prompt.
Dataset Usage
Downloading the Data
To download the dataset, install Huggingface Datasets and then use the following command:
from datasets import load_dataset
dataset =… See the full description on the dataset page: https://huggingface.co/datasets/tongliuphysics/multimodalpragmatic.
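A hedged completion of the truncated command, using the dataset ID from the page URL (configuration and split names are not shown in this excerpt):

from datasets import load_dataset

# Load the dataset and inspect its splits and the image_prompt / text_prompt fields.
dataset = load_dataset("tongliuphysics/multimodalpragmatic")
print(dataset)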
License: https://choosealicense.com/licenses/cc/
How to download
Set up environment
pip install datasets tqdm
wget https://raw.githubusercontent.com/bytedance/coconut_cvpr2024/main/download_coconut.py
Use the download script to download the COCONut dataset splits.
python download_coconut.py # default split: relabeled_coco_val
The above command should print your download status; if the download succeeds, you will see the results below:
Download other COCONut dataset splits.
If you want to download the other splits… See the full description on the dataset page: https://huggingface.co/datasets/xdeng77/relabeled_coco_val.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Preparing OpenMLPerf dataset
To process the semi-raw MLPerf data into the OpenMLPerf dataset, run the following command:
bzip2 -d semi-raw-mlperf-data.tar.bz2
tar xvf semi-raw-mlperf-data.tar
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python process.py
The processed dataset will be saved both as… See the full description on the dataset page: https://huggingface.co/datasets/gfursin/OpenMLPerf.
If you want your Termux/terminal to run very smoothly and without any problems, these commands are for you.
Install in one click!
sh install.sh / bash install.sh
Or follow the manual installation process below.
Termux Installation Commands
| PKG Command | PIP Command |
|---|---|
| termux-change-repo | pip install requests |
| pkg update | pip2 install requests |
| pkg upgrade | pip3 install requests |
| pkg install python | pip install mechanize… |

See the full description on the dataset page: https://huggingface.co/datasets/poisk-ls/jade-cmd.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
SUPPORT ME ON PATREON
https://www.patreon.com/c/Rombodawg
Prerequisites:
Python: https://www.python.org/downloads/
Git: https://git-scm.com/downloads
Instructions:
Make sure Python and Git are installed. Open a command prompt in your local folder. In the command prompt, run:
git lfs install
then git clone https://huggingface.co/datasets/Rombo-Org/Easy_Galore_8bit_training_With_Native_Windows_Support
then cd Easy_Galore_8bit_training_With_Native_Windows_Support
Now… See the full description on the dataset page: https://huggingface.co/datasets/Rombo-Org/Easy_Galore_8bit_training_With_Native_Windows_Support.
ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
📷 This is the code and dataset for the paper: ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models. ACM MM 2024.
Preparation steps: environment installation
(1) Environment installation command:
pip install -r requirements.txt
(2) Please fill in the API information in the file:… See the full description on the dataset page: https://huggingface.co/datasets/BRZ911/ViTCoT.
Dataset
Download Data
UNPC_EN_ZH
You may download EN.txt and ZH.txt manually, or use Git:
git lfs install
git clone https://huggingface.co/datasets/ZkiZkiZki/UNPC_EN_ZH
Make sure the dataset paths are correct:
data/UNPC_EN_ZH/EN.txt
data/UNPC_EN_ZH/ZH.txt
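A minimal sketch for reading the corpus, assuming EN.txt and ZH.txt are line-aligned (line i of one file is the translation of line i of the other):

# Read the parallel corpus into (English, Chinese) sentence pairs.
with open("data/UNPC_EN_ZH/EN.txt", encoding="utf-8") as en_file, \
     open("data/UNPC_EN_ZH/ZH.txt", encoding="utf-8") as zh_file:
    pairs = [
        (en.rstrip("\n"), zh.rstrip("\n"))
        for en, zh in zip(en_file, zh_file)
    ]

print(len(pairs))   # number of sentence pairs
print(pairs[0])     # first (English, Chinese) pair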
Reference
Ziemski, M., Junczys-Dowmunt, M., and Pouliquen, B., (2016), The United Nations Parallel Corpus, Language Resources and Evaluation (LREC’16), Portorož, Slovenia, May 2016.
mPLUG/DocStruct4M reformatted for VSFT with TRL's SFT Trainer. Referenced the format of HuggingFaceH4/llava-instruct-mix-vsft. The dataset uses image paths instead of embedding actual images. To access the images, you'll need to download them from the original mPLUG/DocStruct4M dataset. To download the original images, use the following commands:
pip install -U "huggingface_hub[cli]"
huggingface-cli download mPLUG/DocStruct4M --repo-type dataset
As specified in the official repo, extract the… See the full description on the dataset page: https://huggingface.co/datasets/Ryoo72/DocStruct4M_ip.
Natural Reasoning Embeddings
This is a dataset containing the embeddings for the Natural Reasoning dataset, computed with the same embedding model as the original paper. The code that created these embeddings is below.