100+ datasets found

VideoMAEv2-TAL-Features
huggingface.co
Updated Apr 20, 2025
Cite
OpenGVLab (2025). VideoMAEv2-TAL-Features [Dataset]. https://huggingface.co/datasets/OpenGVLab/VideoMAEv2-TAL-Features
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 20, 2025
Dataset authored and provided by
OpenGVLab
Description
OpenGVLab/VideoMAEv2-TAL-Features dataset hosted on Hugging Face and contributed by the HF Datasets community
options-IV-SP500
huggingface.co
Updated Oct 14, 2019
Cite
Juan Pablo (2019). options-IV-SP500 [Dataset]. https://huggingface.co/datasets/gauss314/options-IV-SP500
Explore at:
Croissant
Dataset updated
Oct 14, 2019
Authors
Juan Pablo
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Downloading the Options IV SP500 Dataset

This document will guide you through the steps to download the Options IV SP500 dataset from Hugging Face Datasets. This dataset includes data on the options of the S&P 500, including implied volatility. To start, you'll need to install Hugging Face's datasets library if you haven't done so already. You can do this using the following pip command: !pip install datasets

Here's the Python code to load the Options IV SP500 dataset from Hugging… See the full description on the dataset page: https://huggingface.co/datasets/gauss314/options-IV-SP500.
CoAID dataset with multiple extracted features (both sparse and dense)
data.niaid.nih.gov
zenodo.org
Updated Jun 10, 2022
Cite
Guillaume Bernard (2022). CoAID dataset with multiple extracted features (both sparse and dense) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6630404
Explore at:
Dataset updated
Jun 10, 2022
Dataset authored and provided by
Guillaume Bernard
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is a publication of the CoAID dataset, originally dedicated to fake news detection. Here we repurpose it for event tracking in press documents.

Cui, Limeng, and Dongwon Lee. 2020. "CoAID: COVID-19 Healthcare Misinformation Dataset." arXiv:2006.00885 [cs], November. http://arxiv.org/abs/2006.00885.

In this dataset, we provide multiple features extracted from the text itself. Please note the text is missing from the dataset published in the CSV format for copyright reasons. You can download the original datasets and manually add the missing texts from the original publications.

Features are extracted using:

A corpus of reference articles in multiple languages for TF-IDF weighting (features_news) [1]

A corpus of tweets reporting news for TF-IDF weighting (features_tweets) [1]

An S-BERT model [2] that uses distiluse-base-multilingual-cased-v1 (called features_use)

An S-BERT model [2] that uses paraphrase-multilingual-mpnet-base-v2 (called features_mpnet)
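
To make the sparse features concrete, here is a minimal sketch of TF-IDF weighting in which the document frequencies come from a separate reference corpus, as the list above describes. The whitespace tokenizer, the smoothed IDF formula, the function names, and the sample sentences are all illustrative assumptions; this is not the pipeline that produced features_news or features_tweets.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def fit_idf(reference_corpus):
    """Compute smoothed inverse document frequencies from a reference corpus."""
    n_docs = len(reference_corpus)
    doc_freq = Counter()
    for doc in reference_corpus:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    # One common smoothed variant: log((1 + N) / (1 + df)) + 1.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Return a sparse {term: weight} mapping; terms absent from the reference corpus are dropped."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


# Made-up reference corpus and document.
reference = [
    "vaccine trial results published",
    "new vaccine approved",
    "election results announced",
]
idf = fit_idf(reference)
vector = tfidf_vector("vaccine results delayed", idf)
```

Fitting the IDF table on a fixed reference corpus keeps the weights comparable across documents processed at different times, which is one plausible reason to use separate corpora for news and tweets.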

References:

[1]: Guillaume Bernard. (2022). Resources to compute TF-IDF weightings on press articles and tweets (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6610406

[2]: Reimers, Nils, and Iryna Gurevych. 2019. "Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3982‑92. Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410.
features
huggingface.co
Updated Aug 7, 2023
Cite
alicorp_datathon (2023). features [Dataset]. https://huggingface.co/datasets/AnalitycsCrew/features
Explore at:
Croissant
Dataset updated
Aug 7, 2023
Dataset authored and provided by
alicorp_datathon
Description
AnalitycsCrew/features dataset hosted on Hugging Face and contributed by the HF Datasets community
Metfaces Image Dataset
kaggle.com
Updated Dec 6, 2023
Cite
The Devastator (2023). Metfaces Image Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/metfaces-image-dataset
Explore at:
Croissant
Dataset updated
Dec 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Metfaces Image Dataset

Metropolitan Museum of Art Faces Image Dataset

By huggan (From Huggingface) [source]

About this dataset

Researchers and developers can leverage this dataset to explore and analyze facial representations depicted in different artistic styles throughout history. These images represent a rich tapestry of human expressions, cultural diversity, and artistic interpretations, providing ample opportunities for leveraging computer vision techniques.

By utilizing this extensive dataset during model training, machine learning practitioners can enhance their algorithms' ability to recognize and interpret facial elements accurately. This is particularly beneficial in applications such as face recognition systems, emotion detection algorithms, portrait analysis tools, or even historical research endeavors focusing on portraiture.

How to use the dataset

Downloading the Dataset:

Start by downloading the dataset from Kaggle's website. The dataset file is named train.csv, which contains the necessary image data for training your models.

Exploring the Data:

Once you have downloaded and extracted the dataset, it's time to explore its contents. Load the train.csv file into your preferred programming environment or data analysis tool to get an overview of its structure and columns.

Understanding the Columns:

The main column of interest in this dataset is called image. This column contains links or references to specific images in the Metropolitan Museum of Art's collection, showcasing different faces captured within them.

Accessing Images from URLs or References:

To access each image associated with their respective URLs or references, you can write code or use libraries that support web scraping or download functionality. Each row under the image column will provide you with a URL or reference that can be used to fetch and download that particular image.
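
As a rough sketch of this step, the snippet below reads the image column from a CSV file object and separates URL-style entries from other references. The helper names and the sample rows are hypothetical, and the real train.csv may store its image data differently, so check the file before relying on this.

```python
import csv
import io


def image_references(csv_file, column="image"):
    """Yield non-empty values from the given column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            yield value


def is_url(reference):
    # Treat only http(s) references as downloadable URLs.
    return reference.startswith(("http://", "https://"))


# Stand-in for open("train.csv", newline="", encoding="utf-8"); the rows are made up.
sample = io.StringIO(
    "image\n"
    "https://example.org/face_001.png\n"
    "\n"
    "images/face_002.png\n"
)
refs = list(image_references(sample))
urls = [r for r in refs if is_url(r)]
```

Each entry in urls could then be fetched with a standard downloader such as urllib.request.urlretrieve, subject to the hosting site's terms of use.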

Preprocessing and Data Augmentation (Optional):

Depending on your use case, you might need to perform various preprocessing techniques on these images before using them as input for your machine learning models. Preprocessing steps may include resizing, cropping, normalization, color space conversions, etc.
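
Two of the steps named above, cropping and normalization, can be sketched in plain Python on a tiny grayscale image stored as a list of pixel rows. The function names and the sample image are assumptions made for illustration; a real project would more likely use an imaging library.

```python
def center_crop(image, size):
    """Crop a size x size window from the middle of a 2D list of pixel rows."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def normalize(image, max_value=255):
    """Scale integer pixel values into the range [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


# A made-up 4x4 grayscale image with values in 0..255.
image = [
    [0, 10, 20, 30],
    [40, 255, 128, 50],
    [60, 64, 0, 70],
    [80, 90, 100, 110],
]
patch = normalize(center_crop(image, 2))
```

Cropping before normalizing keeps the arithmetic limited to the pixels that are actually used.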

Training Machine Learning Models:

Once you have preprocessed any necessary data, it's time to start training your machine learning models using this image dataset as training samples.

Analysis and Evaluation:

After training your model(s), evaluate their performance on a validation dataset, if one is available. You can also make predictions on unseen images, measure accuracy, and analyze the results to gain insights or adjust your models accordingly.
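
Accuracy, the metric mentioned above, is the fraction of predictions that match the true labels. The sketch below uses a hypothetical helper and made-up labels; the dataset description does not define any label set.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions equal to the corresponding labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up predictions and ground truth for four held-out images.
predicted = ["portrait", "sketch", "portrait", "statue"]
actual = ["portrait", "sketch", "statue", "statue"]
score = accuracy(predicted, actual)
```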

Additional Considerations:

Remember to give appropriate credit to the Metropolitan Museum of Art for providing this image dataset when using it in research papers or other publications. Additionally, be aware of any licensing restrictions or terms of use associated with the images themselves.

Research Ideas

Facial recognition: This dataset can be used to train machine learning models for facial recognition systems. By using the various images of faces from the Metropolitan Museum of Art, the models can learn to identify and differentiate between different individuals based on their facial features.

Emotion detection: The images in this dataset can be utilized for training models that can detect emotions on human faces. This could be valuable in applications such as market research, where understanding customer emotional responses to products or advertisements is crucial.

Cultural analysis: With a diverse range of historical faces from different times and regions, this dataset could be employed for cultural analysis and exploration. Machine learning algorithms can identify common visual patterns or differences among different cultures, shedding light on the evolution of human appearances across time and geography.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description ...
Features
huggingface.co
Updated Feb 24, 2024
+ more versions
Cite
Andrey56565656 (2024). Features [Dataset]. https://huggingface.co/datasets/andrey56200702/Features
Explore at:
Dataset updated
Feb 24, 2024
Authors
Andrey56565656
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
andrey56200702/Features dataset hosted on Hugging Face and contributed by the HF Datasets community
COCO
huggingface.co
datasets.activeloop.ai
Updated Feb 6, 2023
+ more versions
Cite
HuggingFaceM4 (2023). COCO [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/COCO
Explore at:
Dataset updated
Feb 6, 2023
Dataset authored and provided by
HuggingFaceM4
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
IndoorOutdoorNet-20K
kaggle.com
huggingface.co
Updated Apr 23, 2025
Cite
PRITHIV SAKTHI U R (2025). IndoorOutdoorNet-20K [Dataset]. http://doi.org/10.34740/kaggle/dsv/11530480
Explore at:
Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/11530480
Dataset updated
Apr 23, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
PRITHIV SAKTHI U R
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description

IndoorOutdoorNet-20K

IndoorOutdoorNet-20K is a labeled image dataset designed for the task of image classification, particularly focused on distinguishing between indoor and outdoor scenes. The dataset is publicly available on Hugging Face Datasets and is useful for scene understanding, transfer learning, and model benchmarking.

Dataset Summary

Task: Image Classification

Modalities: Image

Labels: Indoor, Outdoor (2 classes)

Total Images: 19,998

Split: Train (100%)

Languages: English (metadata)

Size: ~451 MB

License: Apache-2.0

Features

Column   Type    Description
image    Image   Input image file
label    Class   Scene label: Indoor or Outdoor

Example

Image Label
Indoor
Outdoor

Note: For full visualization, visit the dataset viewer on Hugging Face.

Usage

You can use this dataset directly with the datasets library:

    from datasets import load_dataset

    dataset = load_dataset("prithivMLmods/IndoorOutdoorNet-20K")

To visualize a sample:

    import matplotlib.pyplot as plt

    sample = dataset['train'][0]
    plt.imshow(sample['image'])
    plt.title(sample['label'])
    plt.axis('off')
    plt.show()

Applications

Scene classification

Image context recognition

Smart surveillance

Autonomous navigation

Indoor-outdoor transition detection in robotics

Citation

If you use this dataset in your research or project, please cite it appropriately. (You can include a BibTeX entry here if available.)

License

This dataset is licensed under the Apache 2.0 License.

Curated & Maintained by @prithivMLmods.
FaceCaption-15M
huggingface.co
Updated Aug 20, 2024
Cite
ddw2AIGROUP-CQUPT (2024). FaceCaption-15M [Dataset]. https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M
Explore at:
Croissant
Dataset updated
Aug 20, 2024
Dataset authored and provided by
ddw2AIGROUP-CQUPT
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
FaceCaption-15M

FaceCaption-15M is a large-scale, diverse, and high-quality dataset of facial images accompanied by natural language descriptions (facial image-to-text). It aims to facilitate the study of face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and corresponding natural language descriptions of facial features, making it the largest facial image caption dataset to date.

News and Updates 🔥🔥🔥：

**[25/01/01]… See the full description on the dataset page: https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M.
ztf-dr3-m31-features
huggingface.co
Updated May 16, 2024
Cite
SNAD (2024). ztf-dr3-m31-features [Dataset]. https://huggingface.co/datasets/snad-space/ztf-dr3-m31-features
Explore at:
Croissant
Dataset updated
May 16, 2024
Dataset authored and provided by
SNAD
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
snad-space/ztf-dr3-m31-features dataset hosted on Hugging Face and contributed by the HF Datasets community
MDPE_Dataset
huggingface.co
Updated Aug 3, 2024
Cite
MDPE (2024). MDPE_Dataset [Dataset]. https://huggingface.co/datasets/MDPEdataset/MDPE_Dataset
Explore at:
Dataset updated
Aug 3, 2024
Authors
MDPE
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
MDPE Dataset

MDPE is a multimodal deception dataset. Besides deception features, it also includes individual differences information in personality and emotional expression characteristics. MDPE not only supports deception detection, but also provides conditions for tasks such as personality recognition and emotion recognition, and can even study the relationships between them. Github Repo

Dataset Download

The data are passcode protected. Please download and send the… See the full description on the dataset page: https://huggingface.co/datasets/MDPEdataset/MDPE_Dataset.
camelyon16-features
huggingface.co
Updated Nov 15, 2023
Cite
Owkin (2023). camelyon16-features [Dataset]. https://huggingface.co/datasets/owkin/camelyon16-features
Explore at:
Croissant
Dataset updated
Nov 15, 2023
Dataset authored and provided by
Owkin (https://owkin.com/)
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for Camelyon16-features

Dataset Summary

The Camelyon16 dataset is a very popular benchmark dataset used in the field of cancer classification.

The dataset we've uploaded here is the result of features extracted from the Camelyon16 dataset using the Phikon model, which is also openly available on Hugging Face.

Dataset Creation Initial Data Collection and Normalization

The initial collection of the Camelyon16 Whole Slide Images… See the full description on the dataset page: https://huggingface.co/datasets/owkin/camelyon16-features.
WELFake
huggingface.co
Updated May 30, 2025
Cite
Daniel van Strien (2025). WELFake [Dataset]. https://huggingface.co/datasets/davanstrien/WELFake
Explore at:
Croissant
Dataset updated
May 30, 2025
Authors
Daniel van Strien
Description
Dataset Card for Dataset Name

Dataset Summary

We designed a larger and more generic Word Embedding over Linguistic Features for Fake News Detection (WELFake) dataset of 72,134 news articles with 35,028 real and 37,106 fake news. For this, we merged four popular news datasets (i.e. Kaggle, McIntire, Reuters, BuzzFeed Political) to prevent over-fitting of classifiers and to provide more text data for better ML training. Dataset contains four columns: Serial number (starting… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/WELFake.
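
The class counts quoted in this summary can be sanity-checked with a few lines of arithmetic. The variable names are illustrative; the three numbers are taken directly from the description above.

```python
# Counts as stated in the WELFake summary.
real, fake, total = 35_028, 37_106, 72_134

# The two classes should account for every article.
assert real + fake == total

# Share of each class; the larger share is what a majority-class baseline would score.
real_share = real / total
fake_share = fake / total
majority_baseline = max(real_share, fake_share)
```

The classes are close to balanced (roughly 48.6% real and 51.4% fake), so a classifier must clear about 51.4% accuracy to beat the trivial strategy of always predicting the majority class.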
common-accent-all-features
huggingface.co
Updated Apr 26, 2025
Cite
Tong Zhou (2025). common-accent-all-features [Dataset]. https://huggingface.co/datasets/ZZZtong/common-accent-all-features
Explore at:
Dataset updated
Apr 26, 2025
Authors
Tong Zhou
Description
ZZZtong/common-accent-all-features dataset hosted on Hugging Face and contributed by the HF Datasets community
AuroraCap-trainset
huggingface.co
Updated Nov 22, 2023
Cite
Wenhao Chai (2023). AuroraCap-trainset [Dataset]. https://huggingface.co/datasets/wchai/AuroraCap-trainset
Explore at:
Croissant
Dataset updated
Nov 22, 2023
Authors
Wenhao Chai
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
AuroraCap Trainset

Resources

Website arXiv: Paper GitHub: Code Huggingface: AuroraCap Model Huggingface: VDC Benchmark Huggingface: Trainset

Features

We use over 20 million high-quality image/video-text pairs to train AuroraCap in three stages. Pretraining stage. We first align visual features with the word embedding space of LLMs. To achieve this, we freeze the pretrained ViT and LLM, training solely the vision-language connector. Vision stage. We… See the full description on the dataset page: https://huggingface.co/datasets/wchai/AuroraCap-trainset.
marketing_social_media
huggingface.co
Updated Aug 22, 2024
Cite
Rafael Montanez (2024). marketing_social_media [Dataset]. https://huggingface.co/datasets/RafaM97/marketing_social_media
Explore at:
Croissant
Dataset updated
Aug 22, 2024
Authors
Rafael Montanez
Description
Marketing Campaigns Dataset

This repository contains a dataset specifically designed for generating marketing content. The dataset includes various features that are crucial for crafting effective marketing strategies, such as industry, channel, objective, and more. This dataset is ideal for use in machine learning models, AI-powered marketing tools, and data-driven marketing analyses.

Dataset Overview

The dataset consists of multiple entries, each representing a specific… See the full description on the dataset page: https://huggingface.co/datasets/RafaM97/marketing_social_media.
Features
huggingface.co
Updated Dec 15, 2024
+ more versions
Cite
Bouzidi (2024). Features [Dataset]. https://huggingface.co/datasets/MariamBM/Features
Explore at:
Dataset updated
Dec 15, 2024
Authors
Bouzidi
Description
MariamBM/Features dataset hosted on Hugging Face and contributed by the HF Datasets community
instruction-dataset
huggingface.co
opendatalab.com
Updated Mar 8, 2023
Cite
Hugging Face H4 (2023). instruction-dataset [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/instruction-dataset
Explore at:
Croissant
Dataset updated
Mar 8, 2023
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face H4
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This is the blind eval dataset of high-quality, diverse, human-written instructions with demonstrations. We will be using this for step 3 evaluations in our RLHF pipeline.
Amazon-Reviews-2023
huggingface.co
Updated Sep 15, 2023
Cite
McAuley-Lab (2023). Amazon-Reviews-2023 [Dataset]. https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023
Explore at:
Dataset updated
Sep 15, 2023
Dataset authored and provided by
McAuley-Lab
Description
Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. This dataset mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to the previous versions, the 2023 version features larger size, newer reviews (up to Sep 2023), richer and cleaner metadata, and finer-grained timestamps (from day to millisecond).
SoccerNet-ActionSpotting-Features
huggingface.co
Updated Nov 5, 2024
Cite
OpenSportsLab (2024). SoccerNet-ActionSpotting-Features [Dataset]. https://huggingface.co/datasets/OpenSportsLab/SoccerNet-ActionSpotting-Features
Explore at:
Croissant
Dataset updated
Nov 5, 2024
Dataset authored and provided by
OpenSportsLab
License
https://choosealicense.com/licenses/gpl-3.0/
Description
OpenSportsLab/SoccerNet-ActionSpotting-Features dataset hosted on Hugging Face and contributed by the HF Datasets community
