Dataset Card for STL-10
Dataset Details
Dataset Description
The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, and self-taught learning algorithms. It is inspired by the CIFAR-10 dataset, with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image models prior to supervised training.… See the full description on the dataset page: https://huggingface.co/datasets/randall-lab/stl10.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) analyse models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for the downstream tasks mentioned above.
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from STL10. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains the raw model zoos as collections of models (file names beginning with "cifar_"). Zoos are trained with small and large CNN models in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). Because of their large file size, the preprocessed datasets are hosted in a separate repository. The index_dict.json files contain information on how to read the vectorized models.
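As a rough illustration of how an index file can describe a vectorized model, the sketch below splits a flat weight vector back into named layers. The index layout used here (a layer name mapped to a `start` offset and a `shape`) and the layer names are assumptions made for this example only; the actual schema of index_dict.json is not described here and may differ, so inspect the files or the code at www.modelzoos.cc before relying on it.

```python
import json

# Hypothetical index layout: layer name -> start offset and shape in the flat vector.
# This is NOT the confirmed index_dict.json schema; it only illustrates the idea.
EXAMPLE_INDEX_JSON = """
{
  "conv1.weight": {"start": 0, "shape": [2, 3]},
  "conv1.bias":   {"start": 6, "shape": [2]}
}
"""


def numel(shape):
    """Number of scalar entries in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def reshape(flat, shape):
    """Reshape a flat list into nested lists with the given shape (row-major)."""
    if len(shape) == 1:
        return list(flat)
    step = numel(shape[1:])
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def unflatten(vector, index):
    """Cut the flat vector into per-layer arrays using the index."""
    layers = {}
    for name, entry in index.items():
        start, shape = entry["start"], entry["shape"]
        layers[name] = reshape(vector[start:start + numel(shape)], shape)
    return layers


index = json.loads(EXAMPLE_INDEX_JSON)
vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1.0, 2.0]  # made-up weights
layers = unflatten(vector, index)
```

With this toy index, `layers["conv1.weight"]` is a 2x3 nested list and `layers["conv1.bias"]` holds the last two values of the vector.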
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
The National Oceanic and Atmospheric Administration (NOAA) has the statutory mandate to collect hydrographic data in support of nautical chart compilation for safe navigation and to provide background data for engineering, scientific, and other commercial and industrial activities. Hydrographic survey data primarily consist of water depths, but may also include features (e.g. rocks, wrecks), navigation aids, shoreline identification, and bottom type information. NOAA is responsible for archiving and distributing the source data as described in this metadata record.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news items whose titles include St. Louis. It has 10 columns, including classification, entities, keywords, news link, and news title. The preview is ordered by publication date (descending).
https://www.etalab.gouv.fr/licence-ouverte-open-licence
This dataset presents the combined total of the 10 highest salaries at Saint-Louis Agglomération from 2020 onward. Context: To strengthen transparency and fairness in top civil-service pay, ministerial departments and the largest local authorities (more than 80,000 inhabitants) must publish on their websites the ten highest salaries of the staff under their remit. The Government will submit an annual report to Parliament on these ten highest salaries, also stating how many women and men are among them. In this respect, Article 37 contributes to transparency in public life while giving a better understanding of the gaps with private-sector pay for comparable senior management and executive positions. This dataset complies 100% with the data schema proposed by Etalab (content and structure). Data contacts: Namik Scherzl - Service SIG (GIS) - Open Data - 03.89.70.46.67; Eric Zinger - Service des Ressources Humaines (Human Resources) - 03.89.70.90.76
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news items whose titles include Saint Louis University. It has 10 columns, including classification, entities, keywords, news link, and news title. The preview is ordered by publication date (descending).
GPX file for the 10K course at GO! St. Louis Marathon Weekend
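For readers unfamiliar with the format, a GPX file stores a course as a sequence of track points with latitude and longitude. The sketch below is a minimal illustration that parses GPX 1.1 track points with the Python standard library and sums great-circle (haversine) distances between consecutive points. The sample coordinates are made up for the example and are not taken from the actual course file, and the code assumes the file uses the GPX 1.1 namespace and `trkpt` elements.

```python
import math
import xml.etree.ElementTree as ET

# GPX 1.1 namespace; assumed here, GPX 1.0 files use a different one.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample track, NOT the real 10K course.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="38.6270" lon="-90.1994"/>
    <trkpt lat="38.6280" lon="-90.1994"/>
    <trkpt lat="38.6290" lon="-90.1994"/>
  </trkseg></trk>
</gpx>
"""

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def track_points(gpx_text):
    """Return (lat, lon) tuples for every track point, in document order."""
    root = ET.fromstring(gpx_text)
    return [(float(pt.get("lat")), float(pt.get("lon")))
            for pt in root.iterfind(".//gpx:trkpt", GPX_NS)]


def track_length_m(points):
    """Sum the distances between consecutive track points."""
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


points = track_points(SAMPLE_GPX)
total_m = track_length_m(points)
```

To use it on a downloaded file, read the file contents into a string and pass it to `track_points`. Elevation changes are ignored, so the result is a flat-map approximation of the course length.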
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Values of difference in structural dimensions in the different groups according to the compression software and online transmission tool used in the scanning models of permanent dentition.