Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
julien-c/random-ai-sheets dataset hosted on Hugging Face and contributed by the HF Datasets community
dvilasuero/sheets-mcp dataset hosted on Hugging Face and contributed by the HF Datasets community
aisheets/Medical_Challenge_Questions dataset hosted on Hugging Face and contributed by the HF Datasets community
Image2Struct - Music Sheet
Paper | Website | Datasets (Webpages, Latex, Music sheets) | Leaderboard | HELM repo | Image2Struct repo
License: Apache License Version 2.0, January 2004
Dataset description
Image2Struct is a benchmark for evaluating vision-language models on practical tasks of extracting structured information from images. This subdataset focuses on music sheets. The model is given an image of the expected output with the prompt: Please generate the Lilypond… See the full description on the dataset page: https://huggingface.co/datasets/stanford-crfm/image2struct-musicsheet-v1.
Built with https://huggingface.co/spaces/aisheets/sheets and this config:
columns:
  object_name:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Generate the name of a common day to day object
    prompt: >
      You are a rigorous text-generation engine. Generate only the requested output format, with no explanations following the user instruction and avoiding repetition of the existing responses at the end of the prompt.
      # User…
See the full description on the dataset page: https://huggingface.co/datasets/aisheets/Day_to_Day_Objects_isometric_skeumorphic_3d_bnb.
aisheets/Womens_Lives_Across_Centuries dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Dataset Name
Dataset Details
Experimental composition of 76 cartoon art-style video game character spritesheets. Resized to 512x512, mixed variation of animation styles.
Dataset Description
All images were edited using the Tiled image editing software, as most assets are typically downloaded individually and not in sequence. I compiled each animation sequence into one image to display animations frame-by-frame evenly distributed across some common… See the full description on the dataset page: https://huggingface.co/datasets/mgane/2D_Video_Game_Cartoon_Character_Sprite-Sheets.
aisheets/vibench dataset hosted on Hugging Face and contributed by the HF Datasets community
davanstrien/loc-nineteenth-century-song-sheets dataset hosted on Hugging Face and contributed by the HF Datasets community
aisheets/Day_to_Day_Objects_Isometric_Logos dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Crypto Charts
This dataset is a collection of a sample of images from tweets that I scraped using my Discord bot that keeps track of financial influencers on Twitter. The data consists mainly of images that are cryptocurrency charts. This dataset can be used for a wide variety of tasks, such as image classification or feature extraction.
FinTwit Charts Collection
This dataset is part of a larger collection of datasets, scraped from Twitter and labeled by a human (me).… See the full description on the dataset page: https://huggingface.co/datasets/StephanAkkerman/crypto-charts.
https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a dataset repository of Awesome ChatGPT Prompts. View all prompts on GitHub.
License
CC-0
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Stock Charts
This dataset is a collection of a sample of images from tweets that I scraped using my Discord bot that keeps track of financial influencers on Twitter. The data consists of images that were part of tweets that mentioned a stock. This dataset can be used for a wide variety of tasks, such as image classification or feature extraction.
FinTwit Charts Collection
This dataset is part of a larger collection of datasets, scraped from Twitter and labeled by a… See the full description on the dataset page: https://huggingface.co/datasets/StephanAkkerman/stock-charts.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Image2Struct - Latex
Paper | Website | Datasets (Webpages, Latex, Music sheets) | Leaderboard | HELM repo | Image2Struct repo
License: Apache License Version 2.0, January 2004
Dataset description
Image2Struct is a benchmark for evaluating vision-language models on practical tasks of extracting structured information from images. This subdataset focuses on LaTeX code. The model is given an image of the expected output with the prompt: Please provide the LaTex code used to… See the full description on the dataset page: https://huggingface.co/datasets/stanford-crfm/image2struct-latex-v1.
https://choosealicense.com/licenses/gpl-2.0/
Safety Data Sheets Gloves Classification
This dataset contains Safety Data Sheets (SDS) sourced from Kaggle, consisting of over 200,000 documents. SDS are detailed documents providing essential information on the properties and hazards of chemicals, ensuring user safety and compliance with regulatory standards. A subset of these documents was pre-processed, cleaned, and annotated to classify whether protective gloves are required when handling materials. The labels were extracted… See the full description on the dataset page: https://huggingface.co/datasets/BASF-AI/SDS-Gloves-Classification.
Large-scale Multi-modality Models Evaluation Suite
Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval
🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets
This Dataset
This is a formatted version of ChartQA. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @article{masry2022chartqa, title={ChartQA: A benchmark for question answering about charts with visual and… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/ChartQA.
https://choosealicense.com/licenses/etalab-2.0/
🇫🇷 Service-Public.fr practical sheets Dataset (Administrative Procedures)
This dataset is derived from the official Service-Public.fr platform and contains practical information sheets and resources targeting both individuals (Particuliers) and entrepreneurs (Entreprendre). The purpose of these sheets is to provide information on administrative procedures relating to a number of themes. The data is publicly available on data.gouv.fr and has been processed and chunked for… See the full description on the dataset page: https://huggingface.co/datasets/AgentPublic/service-public.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The datasets contain human evaluations of retrieved chunks from agriculture documents for actual user queries. Each chunk is marked as relevant or irrelevant. The relevant and irrelevant portions of the chunks are listed in separate columns. The dataset consists of multiple XLS files, and each XLS file has multiple sheets corresponding to the content for the value chain. The queries are taken from actual user questions on farmer.chat prototype bots. For each… See the full description on the dataset page: https://huggingface.co/datasets/CGIAR/RAG-Chunk-Analysis.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
[ Related Paper ] [ Website ] [Models 🤗(Hugging Face)]
ChartX & ChartVLM
Recently, many versatile Multi-modal Large Language Models (MLLMs) have emerged. However, their capacity to query information depicted in visual charts and engage in reasoning based on the queried contents remains under-explored. In this paper, to comprehensively and rigorously benchmark the ability… See the full description on the dataset page: https://huggingface.co/datasets/U4R/ChartX.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
We introduce WikiMT-X, an enhanced version of WikiMusicText (WikiMT) with audio recordings, richer text annotations, and improved genre labels. Explore it here: WikiMT-X on Hugging Face.
Dataset Summary
In CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval, we introduce WikiMusicText (WikiMT), a new dataset for the evaluation of semantic search and music classification. It includes 1010 lead sheets in ABC notation sourced from… See the full description on the dataset page: https://huggingface.co/datasets/sander-wood/wikimusictext.