DS-1000 is a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas. It employs multi-criteria evaluation metrics, including functional correctness and surface-form constraints, resulting in a high-quality dataset with only 1.8% incorrect solutions among accepted Codex-002 predictions.

Usage:

```python
import datasets

queries = datasets.load_dataset("embedding-benchmark/DS1000", "queries")
documents = …
```

See the full description on the dataset page: https://huggingface.co/datasets/embedding-benchmark/DS1000.
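A hypothetical sketch (not the official DS-1000 harness) of the functional-correctness check described above: execute a candidate completion against a problem's setup code and compare its result to the reference solution. The problem text, reference, and candidate below are made up for illustration; the real benchmark additionally applies surface-form constraints, e.g. requiring or banning particular API calls in the candidate string.

```python
import numpy as np

# Illustrative problem: setup code, a reference solution, and a model output.
problem_context = "import numpy as np\na = np.array([3, 1, 2])\n"
reference_solution = "result = np.sort(a)"
candidate = "result = a[np.argsort(a)]"  # hypothetical model completion

def run(snippet):
    ns = {}
    exec(problem_context + snippet, ns)  # untrusted code: sandbox in practice
    return ns["result"]

# Functional correctness: the candidate passes if its result matches the reference.
passed = bool(np.array_equal(run(candidate), run(reference_solution)))
```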
shanjay/ds1000-s dataset hosted on Hugging Face and contributed by the HF Datasets community
National transport models are based on structural data from 2010. This dataset contains some models for 2040. Other models are available for 2040, as well as for 2010, 2020 and 2030 in the other datasets of the project. For an overview of the data structure, you can consult the document: (Read me first) Projektbeschreibung Verkehrsmodellierung im UVEK D/F/I.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DS-bench: Code Generation Benchmark for Data Science Code
GitHub repo
Abstract
We introduce DS-bench, a new benchmark designed to evaluate large language models (LLMs) on complicated data science code generation tasks. Existing benchmarks, such as DS-1000, often consist of overly simple code snippets, imprecise problem descriptions, and inadequate testing. DS-bench sources 1,000 realistic problems from GitHub across ten widely used Python data science libraries, offering… See the full description on the dataset page: https://huggingface.co/datasets/LaPluma077/DS_bench.
dnanper/basemodel-qwen2-7B-eval-ds1000 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Objdet Ds is a dataset for object detection tasks - it contains Objects annotations for 1,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
## Overview
DS PG7 MinneApple is a dataset for object detection tasks - it contains Apple annotations for 1,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan EPI: W: EEP: ECD: DS: Rectifiers data was reported at 1.200 Per 1000 in Dec 2016, unchanged from 1.200 Per 1000 in Nov 2016. The series is updated monthly, averaging 1.200 Per 1000 (median) from Jan 1995 to Dec 2016, with 264 observations. The data reached an all-time high of 1.200 Per 1000 in Dec 2016 and a record low of 1.200 Per 1000 in Dec 2016. The series remains active in CEIC and is reported by the Bank of Japan. The data is categorized under the Global Database's Japan – Table JP.I160: Export Price Index: 2010=100: Weight.
This data set combines vegetation datasets from three mapping project areas in the Sacramento Valley and riparian areas of the San Joaquin Valley to facilitate regional planning, conservation, and enhancement of biological resources by state and local agencies, project partners, and regional stakeholders. The dataset meets the National Vegetation Classification Standard and the California Vegetation Classification and Mapping Standards. Vegetation is mapped to the Alliance level with a 1-acre minimum mapping unit. Polygons are also attributed with total bird's-eye cover of trees, shrubs, and herbs. Detailed reports on the classification and mapping standards can be downloaded (see summary for links).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan EPI: W: EEP: ECD: Discrete Semiconductors (DS) data was reported at 4.400 Per 1000 in Apr 2022, unchanged from 4.400 Per 1000 in Mar 2022. The series is updated monthly, averaging 4.400 Per 1000 (median) from Jan 2015 to Apr 2022, with 88 observations. The data reached an all-time high of 4.400 Per 1000 in Apr 2022 and a record low of 4.400 Per 1000 in Apr 2022. The series remains active in CEIC and is reported by the Bank of Japan. The data is categorized under the Global Database's Japan – Table JP.I143: Export Price Index: 2015=100: Weight.
This dataset is Preprocessed⚙️, Compressed🗜️, and Streamable📶!
The goal of this benchmark is to train models which can look at images of food items and detect the individual food items present in them. We use a novel dataset of food images collected through the MyFoodRepo app, where numerous volunteer Swiss users provide images of their daily food intake in the context of a digital cohort called Food & You. This growing data set has been annotated - or automatic annotations have been verified - with respect to segmentation, classification (mapping the individual food items onto an ontology of Swiss Food items), and weight/volume estimation.
Finding annotated food images is difficult. There are some databases with some annotations, but they tend to be limited in important ways. To put it bluntly: most food images on the internet are a lie. Search for any dish, and you'll find beautiful stock photography of that particular dish. The same goes for social media: we share photos of dishes with our friends when the image is exceptionally beautiful. But algorithms need to work on real-world images. In addition, annotations are generally missing - ideally, food images would be annotated with proper segmentation, classification, and volume/weight estimates. With this 2022 iteration of the Food Recognition Benchmark, AIcrowd released v2.0 of the MyFoodRepo dataset, containing a training set of 39,962 food images with 76,491 annotations.
- raw_data/public_training_set_release_2.0.tar.gz: Training Set - 39,962 food images (RGB), 76,491 annotations, 498 food classes
- raw_data/public_validation_set_2.0.tar.gz: Validation Set - 1,000 food images (RGB), 1,830 annotations, 498 food classes
- raw_data/public_test_release_2.0.tar.gz: Public Test Set - Food Recognition Benchmark 2022
Kaggle Notebook - https://www.kaggle.com/sainikhileshreddy/how-to-use-the-dataset
```python
import hub

# Load from the local Kaggle copy of the dataset…
ds = hub.dataset('/kaggle/input/food-recognition-2022/hub/train/')

# …or stream it directly from Hub:
ds = hub.dataset('hub://sainikhileshreddy/food-recognition-2022-train/')

# Wrap the dataset for training (transform and batch_size defined elsewhere):
dataloader = ds.pytorch(num_workers=2, shuffle=True, transform=transform, batch_size=batch_size)
ds_tensorflow = ds.tensorflow()
```
The benchmark uses the official detection evaluation metrics used by COCO. The primary evaluation metric is AP @ IoU=0.50:0.05:0.95. The secondary evaluation metric is AR @ IoU=0.50:0.05:0.95. A further discussion of the evaluation metric can be found here.
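A sketch of the IoU matching that underlies AP@[IoU=0.50:0.05:0.95]: COCO evaluates at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 and averages over them, and a detection matches a ground-truth box at threshold t only when their IoU reaches t. The boxes below are made-up examples.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

thresholds = [0.50 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95
iou = box_iou((0, 0, 10, 10), (2, 0, 12, 10))      # 80 / 120, about 0.667
matched_at = [t for t in thresholds if iou >= t]   # matches only for t <= 0.65
```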
The dataset has been taken from the Food Recognition Benchmark 2022. More details about the challenge are available at https://www.aicrowd.com/challenges/food-recognition-benchmark-2022
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
nn-auto-bench-ds
nn-auto-bench-ds is a dataset designed for key information extraction (KIE) and serves as a benchmark dataset for nn-auto-bench.
Dataset Overview
The dataset comprises 1,000 documents, categorized into the following types:

- Invoice
- Receipt
- Passport
- Bank Statement
The documents are primarily available in English, with some also in German and Arabic. Each document is annotated for key information extraction and specific tasks. The dataset can be used to… See the full description on the dataset page: https://huggingface.co/datasets/nanonets/nn-auto-bench-ds.
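A hypothetical per-field scoring sketch for key information extraction (the field names and the exact-match criterion here are illustrative, not the official nn-auto-bench protocol): compare predicted values against a document's annotations, field by field.

```python
def field_accuracy(pred, gold):
    """Fraction of annotated fields whose predicted value matches exactly."""
    return sum(pred.get(k) == v for k, v in gold.items()) / len(gold)

# Made-up invoice annotation and model prediction:
gold = {"invoice_number": "INV-001", "total": "42.00", "currency": "EUR"}
pred = {"invoice_number": "INV-001", "total": "42.00", "currency": "USD"}
accuracy = field_accuracy(pred, gold)  # 2 of 3 fields match
```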
BLiMP is a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars.
To use this dataset:
```python
import tensorflow_datasets as tfds

ds = tfds.load('blimp', split='train')
for ex in ds.take(4):
  print(ex)
```
See the guide for more information on tensorflow_datasets.
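A sketch of how BLiMP accuracy is typically computed (an assumed setup, not an official harness): each example pairs a grammatical sentence with a minimally different ungrammatical one, and a model is credited when it scores the grammatical sentence higher. The `toy_score` below is a stand-in for an LM's total log-probability; the two pairs are made up.

```python
def blimp_accuracy(pairs, score):
    """Fraction of minimal pairs where the grammatical sentence scores higher."""
    hits = sum(score(p["sentence_good"]) > score(p["sentence_bad"]) for p in pairs)
    return hits / len(pairs)

pairs = [
    {"sentence_good": "The cats sleep.", "sentence_bad": "The cats sleeps."},
    {"sentence_good": "She has arrived.", "sentence_bad": "She have arrived."},
]
toy_score = lambda s: -s.count("sleeps") - s.count(" have ")  # stand-in, not an LM
accuracy = blimp_accuracy(pairs, toy_score)
```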
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan PPI: W: ECD: ED: DS: Diodes data was reported at 0.400 Per 1000 in Dec 2016, unchanged from 0.400 Per 1000 in Nov 2016. The series is updated monthly, averaging 0.400 Per 1000 (median) from Jan 1980 to Dec 2016, with 444 observations. The data reached an all-time high of 0.400 Per 1000 in Dec 2016 and a record low of 0.400 Per 1000 in Dec 2016. The series remains active in CEIC and is reported by the Bank of Japan. The data is categorized under the Global Database's Japan – Table JP.I097: Producer Price Index: 2010=100: Weight.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Changes in the degree of substitution (DS) of infusion solution 0.42.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan EPI: W: EEP: ECD: DS: Transistors data was reported at 3.100 Per 1000 in Apr 2022, unchanged from 3.100 Per 1000 in Mar 2022. The series is updated monthly, averaging 3.100 Per 1000 (median) from Jan 1980 to Apr 2022, with 508 observations. The data reached an all-time high of 3.100 Per 1000 in Apr 2022 and a record low of 3.100 Per 1000 in Apr 2022. The series remains active in CEIC and is reported by the Bank of Japan. The data is categorized under the Global Database's Japan – Table JP.I143: Export Price Index: 2015=100: Weight.
Data licence Germany – Attribution – Version 2.0https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
The information system provides a highly generalized overview of the distribution of raw material deposits in North Rhine-Westphalia. The map series shows energy resources (lignite and hard coal, natural gas and mine gas) and non-energy raw material deposits (unconsolidated and solid rock, rock salt), as well as the ore and industrial mineral districts in NRW.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan CGPI: W: ED: DE: DS: Transistors data was reported at 1.500 Per 1000 in May 2012, unchanged from 1.500 Per 1000 in Apr 2012. The series is updated monthly, averaging 1.500 Per 1000 (median) from Jan 2005 to May 2012, with 89 observations. The data reached an all-time high of 1.500 Per 1000 in May 2012 and a record low of 1.500 Per 1000 in May 2012. The series remains active in CEIC and is reported by the Bank of Japan. The data is categorized under the Global Database's Japan – Table JP.I289: Corporate Goods Price Index: 2005=100: Weight.
Information about the dataset:

- Pieces: 11 pieces (white & black), 32 variations each, 1,000 photos per variation: 352,000 photos
- Board: 18 boards, 5,000 photos of empty squares including all variations

Accessing the dataset:

```python
from datasets import load_dataset

ds = load_dataset("HardlySalty/chess_piece_and_empty_square_training")
print(ds["train"].features["label"].names)
```