OpenGVLab/VideoMAEv2-TAL-Features dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Downloading the Options IV SP500 Dataset
This document guides you through downloading the Options IV SP500 dataset from Hugging Face Datasets. The dataset covers options on the S&P 500, including their implied volatility. To start, install Hugging Face's datasets library if you haven't done so already. You can do this with the following pip command (the leading ! is notebook syntax; omit it in a regular shell): !pip install datasets
Here's the Python code to load the Options IV SP500 dataset from Hugging… See the full description on the dataset page: https://huggingface.co/datasets/gauss314/options-IV-SP500.
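The loading step can be sketched as follows. This is a minimal sketch, assuming the datasets library is installed and that the repository identifier is the one in the dataset page URL (gauss314/options-IV-SP500). The helper name load_options_iv is hypothetical, and since the splits and columns are not listed here, the sketch simply returns whatever load_dataset provides.

```python
DATASET_ID = "gauss314/options-IV-SP500"  # taken from the dataset page URL


def load_options_iv(dataset_id: str = DATASET_ID):
    """Download the dataset from the Hugging Face Hub (hypothetical helper).

    The import is deferred so that this module can be imported even when
    the `datasets` package has not been installed yet.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id)
```

Calling load_options_iv() and printing the result should list the available splits and their columns, which is a reasonable first check before any analysis.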
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This is a publication of the CoAID dataset, originally dedicated to fake news detection. Here we repurpose the dataset for event tracking in press documents.
Cui, Limeng, and Dongwon Lee. 2020. "CoAID: COVID-19 Healthcare Misinformation Dataset." arXiv:2006.00885 [cs], November. http://arxiv.org/abs/2006.00885.
In this dataset, we provide multiple features extracted from the text itself. Please note the text is missing from the dataset published in the CSV format for copyright reasons. You can download the original datasets and manually add the missing texts from the original publications.
Features are extracted using:
A corpus of reference articles in multiple languages for TF-IDF weighting. (features_news) [1]
A corpus of tweets reporting news for TF-IDF weighting. (features_tweets) [1]
An S-BERT model [2] that uses distiluse-base-multilingual-cased-v1 (called features_use)
An S-BERT model [2] that uses paraphrase-multilingual-mpnet-base-v2 (called features_mpnet)
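To illustrate what TF-IDF weighting against a reference corpus means, here is a minimal, dependency-free sketch. The tokenizer, the smoothing formula, and the function names are assumptions made for illustration; the exact settings used to build features_news and features_tweets are not described here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def inverse_document_frequencies(reference_corpus):
    """Compute smoothed IDF weights from a reference corpus of documents."""
    n_docs = len(reference_corpus)
    doc_freq = Counter()
    for doc in reference_corpus:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    # Add-one smoothing keeps the weight finite for every term.
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(text, idf):
    """Weight the term frequencies of `text` by IDF values learned elsewhere."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    # Terms unseen in the reference corpus are skipped in this sketch.
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


# Made-up reference corpus and document, for illustration only.
reference = ["covid vaccine news", "covid cases rise", "sports news today"]
idf = inverse_document_frequencies(reference)
vector = tfidf_vector("covid vaccine vaccine unknownword", idf)
```

The point of the separate reference corpus is that the IDF weights are learned once and then reused for every document, so rare terms such as "vaccine" receive a higher weight than common ones such as "covid".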
References:
[1]: Guillaume Bernard. (2022). Resources to compute TF-IDF weightings on press articles and tweets (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6610406
[2]: Reimers, Nils, and Iryna Gurevych. 2019. "Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3982-92. Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410.
AnalitycsCrew/features dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
By huggan (from Hugging Face)
Researchers and developers can leverage this dataset to explore and analyze facial representations depicted in different artistic styles throughout history. These images represent a rich tapestry of human expressions, cultural diversity, and artistic interpretations, providing ample opportunities for leveraging computer vision techniques.
By utilizing this extensive dataset during model training, machine learning practitioners can enhance their algorithms' ability to recognize and interpret facial elements accurately. This is particularly beneficial in applications such as face recognition systems, emotion detection algorithms, portrait analysis tools, or even historical research endeavors focusing on portraiture.
Downloading the Dataset:
Start by downloading the dataset from Kaggle's website. The dataset file is named train.csv, which contains the necessary image data for training your models.
Exploring the Data:
Once you have downloaded and extracted the dataset, it's time to explore its contents. Load the train.csv file into your preferred programming environment or data analysis tool to get an overview of its structure and columns.
Understanding the Columns:
The main column of interest in this dataset is called image. This column contains links or references to specific images in the Metropolitan Museum of Art's collection, showcasing different faces captured within them.
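Reading the image column can be sketched with Python's standard csv module. This is a minimal sketch: the helper name read_image_column is hypothetical, and the in-memory rows below are made up to stand in for train.csv, whose exact contents are not shown here.

```python
import csv
import io


def read_image_column(csv_file, column="image"):
    """Return the non-empty values of `column` from an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    return [row[column] for row in reader if row[column]]


# Tiny in-memory stand-in for train.csv (the rows are made up for illustration).
sample_csv = io.StringIO(
    "image\nhttps://example.org/a.jpg\nhttps://example.org/b.jpg\n"
)
image_refs = read_image_column(sample_csv)
print(len(image_refs), "image references")
```

With the real file, the same helper would be called on open("train.csv", newline="", encoding="utf-8").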
Accessing Images from URLs or References:
To access each image associated with their respective URLs or references, you can write code or use libraries that support web scraping or download functionality. Each row under the image column will provide you with a URL or reference that can be used to fetch and download that particular image.
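If the image column does hold plain HTTP(S) URLs (an assumption; it may instead hold other kinds of references), fetching one image could look like the sketch below. It uses only the standard library, and the helper names and the images directory are illustrative choices.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default="image.jpg"):
    """Derive a local file name from the last path segment of a URL."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_image(url, dest_dir="images", timeout=30):
    """Fetch one image and save it under `dest_dir`; returns the saved path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename_from_url(url))
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)
    return path
```

When downloading many images, it is worth pausing between requests and catching urllib.error.URLError so that one broken link does not stop the whole run.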
Preprocessing and Data Augmentation (Optional):
Depending on your use case, you might need to perform various preprocessing techniques on these images before using them as input for your machine learning models. Preprocessing steps may include resizing, cropping, normalization, color space conversions, etc.
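Two of these steps, cropping and normalization, can be sketched without any dependencies by treating a grayscale image as a list of pixel rows. This is only an illustration of the operations; in practice an imaging library would normally handle them, and the 4x4 image below is made up.

```python
def center_crop(image, size):
    """Crop a size x size window from the middle of a 2-D grid of pixels."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def normalize(image, max_value=255):
    """Scale integer pixel intensities into the range [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


# A made-up 4x4 grayscale image with 8-bit intensities.
image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 255],
]
patch = normalize(center_crop(image, 2))
```

Cropping before normalizing keeps the arithmetic to the pixels that are actually kept.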
Training Machine Learning Models:
Once you have preprocessed any necessary data, it's time to start training your machine learning models using this image dataset as training samples.
Analysis and Evaluation:
After successfully training your model(s), evaluate their performance using validation datasets, if available. You can also make predictions on unseen images, measure accuracy, and analyze the results to gain insights or adjust your models accordingly.
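When no validation set is provided, one can be held out from the training data. The sketch below shows a seeded hold-out split and a plain accuracy measure; the function names and the 20% default are illustrative assumptions, not part of the dataset.

```python
import random


def train_validation_split(items, validation_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and split off a validation subset."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predictions, targets):
    """Fraction of predictions that match their targets exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)
```

Fixing the seed makes the split reproducible, so accuracy figures from different training runs are measured on the same held-out images.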
Additional Considerations:
Remember to give appropriate credit to the Metropolitan Museum of Art for providing this image dataset when using it in research papers or other publications. Additionally, be aware of any licensing restrictions or terms of use associated with the images themselves.
- Facial recognition: This dataset can be used to train machine learning models for facial recognition systems. By using the various images of faces from the Metropolitan Museum of Art, the models can learn to identify and differentiate between different individuals based on their facial features.
- Emotion detection: The images in this dataset can be utilized for training models that can detect emotions on human faces. This could be valuable in applications such as market research, where understanding customer emotional responses to products or advertisements is crucial.
- Cultural analysis: With a diverse range of historical faces from different times and regions, this dataset could be employed for cultural analysis and exploration. Machine learning algorithms can identify common visual patterns or differences among cultures, shedding light on how human appearances are depicted across time and geography.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: train.csv | Column name | Description ...
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
andrey56200702/Features dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
IndoorOutdoorNet-20K is a labeled image dataset designed for the task of image classification, particularly focused on distinguishing between indoor and outdoor scenes. The dataset is publicly available on Hugging Face Datasets and is useful for scene understanding, transfer learning, and model benchmarking.
| Column | Type | Description |
|---|---|---|
| image | Image | Input image file |
| label | Class | Scene label: Indoor or Outdoor |
Sample images (not reproduced here) are shown for each label: Indoor and Outdoor.
Note: For full visualization, visit the dataset viewer on Hugging Face.
You can use this dataset directly with the datasets library:
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/IndoorOutdoorNet-20K")
To visualize a sample:
import matplotlib.pyplot as plt
sample = dataset['train'][0]
plt.imshow(sample['image'])
plt.title(sample['label'])
plt.axis('off')
plt.show()
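Because the label column is listed above as a class type, sample['label'] may be an integer index rather than a readable name; that is an assumption about how the data is stored. The hypothetical helper below falls back gracefully in either case, using int2str only when the label feature provides it (as datasets.ClassLabel does).

```python
def label_to_name(label, label_feature=None):
    """Return a readable class name for a sample's label.

    `label_feature` is expected to be the dataset's label feature, for
    example `dataset["train"].features["label"]`. When it offers `int2str`,
    integer labels are mapped to their class names.
    """
    if isinstance(label, str):
        return label  # already a readable name
    if label_feature is not None and hasattr(label_feature, "int2str"):
        return label_feature.int2str(label)
    return str(label)  # no mapping available: show the raw value
```

In the visualization snippet, the title could then be set with plt.title(label_to_name(sample['label'], dataset['train'].features['label'])).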
If you use this dataset in your research or project, please cite it appropriately.
This dataset is licensed under the Apache 2.0 License.
Curated & Maintained by @prithivMLmods.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
FaceCaption-15M
FaceCaption-15M is a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). It aims to facilitate the study of face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features, making it the largest facial image caption dataset to date.
News and Updates 🔥🔥🔥:
**[25/01/01]… See the full description on the dataset page: https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
snad-space/ztf-dr3-m31-features dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
MDPE Dataset
MDPE is a multimodal deception dataset. Besides deception features, it also includes information on individual differences in personality and emotional expression. MDPE therefore supports not only deception detection but also tasks such as personality recognition and emotion recognition, and it makes it possible to study the relationships between them.
Dataset Download
The data are passcode protected. Please download and send the… See the full description on the dataset page: https://huggingface.co/datasets/MDPEdataset/MDPE_Dataset.
https://choosealicense.com/licenses/other/
Dataset Card for Camelyon16-features
Dataset Summary
The Camelyon16 dataset is a very popular benchmark dataset used in the field of cancer classification.
The dataset we've uploaded here is the result of features extracted from the Camelyon16 dataset using the Phikon model, which is also openly available on Hugging Face.
Dataset Creation
Initial Data Collection and Normalization
The initial collection of the Camelyon16 Whole Slide Images… See the full description on the dataset page: https://huggingface.co/datasets/owkin/camelyon16-features.
Dataset Card for WELFake
Dataset Summary
We designed a larger and more generic Word Embedding over Linguistic Features for Fake News Detection (WELFake) dataset of 72,134 news articles with 35,028 real and 37,106 fake news. For this, we merged four popular news datasets (i.e. Kaggle, McIntire, Reuters, BuzzFeed Political) to prevent over-fitting of classifiers and to provide more text data for better ML training. Dataset contains four columns: Serial number (starting… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/WELFake.
ZZZtong/common-accent-all-features dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
AuroraCap Trainset
Resources
Website arXiv: Paper GitHub: Code Huggingface: AuroraCap Model Huggingface: VDC Benchmark Huggingface: Trainset
Features
We use over 20 million high-quality image/video-text pairs to train AuroraCap in three stages. Pretraining stage. We first align visual features with the word embedding space of LLMs. To achieve this, we freeze the pretrained ViT and LLM, training solely the vision-language connector. Vision stage. We… See the full description on the dataset page: https://huggingface.co/datasets/wchai/AuroraCap-trainset.
Marketing Campaigns Dataset
This repository contains a dataset specifically designed for generating marketing content. The dataset includes various features that are crucial for crafting effective marketing strategies, such as industry, channel, objective, and more. This dataset is ideal for use in machine learning models, AI-powered marketing tools, and data-driven marketing analyses.
Dataset Overview
The dataset consists of multiple entries, each representing a specific… See the full description on the dataset page: https://huggingface.co/datasets/RafaM97/marketing_social_media.
MariamBM/Features dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This is the blind eval dataset of high-quality, diverse, human-written instructions with demonstrations. We will be using this for step 3 evaluations in our RLHF pipeline.
Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. This dataset mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to the previous versions, the 2023 version features a larger size, newer reviews (up to Sep 2023), richer and cleaner metadata, and finer-grained timestamps (from day to millisecond).
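Millisecond-resolution timestamps are often stored as integer milliseconds since the Unix epoch; whether this dataset uses that encoding is an assumption, not something stated above. Under that assumption, converting a value to a readable UTC datetime could look like this, with a made-up timestamp and a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix_millis(millis):
    """Convert integer milliseconds since the Unix epoch to a UTC datetime."""
    # Integer arithmetic via timedelta avoids float rounding of the
    # millisecond part.
    return EPOCH + timedelta(milliseconds=millis)


stamp = from_unix_millis(1_700_000_000_123)  # made-up value for illustration
print(stamp.isoformat())
```

Day-level timestamps, as in earlier versions, would lose everything after the date part of this value.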
https://choosealicense.com/licenses/gpl-3.0/
OpenSportsLab/SoccerNet-ActionSpotting-Features dataset hosted on Hugging Face and contributed by the HF Datasets community