Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The TravisTorrent treasure trove data sets from the publication: Beller, Moritz, Georgios Gousios, and Andy Zaidman. "TravisTorrent: Synthesizing Travis CI and GitHub for full-stack research on continuous integration." 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 2017.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Due to the cost of developing and training deep learning models from scratch, machine learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks. PTM registries known as “model hubs” support engineers in distributing and reusing deep learning models. PTM packages include pre-trained weights, documentation, model architectures, datasets, and metadata. Mining the information in PTM packages will enable the discovery of engineering phenomena and tools to support software engineers. However, accessing this information is difficult: there are many PTM registries, and both the registries and the individual packages may impose rate limits on data access.
We present an open-source dataset, PTMTorrent, to facilitate the evaluation and understanding of PTM packages. This paper describes the creation, structure, usage, and limitations of the dataset. The dataset includes a snapshot of 5 model hubs and a total of 15,913 PTM packages. These packages are represented in a uniform data schema for cross-hub mining. We describe prior uses of this data and suggest research opportunities for mining using our dataset.
We provide links to the PTM Dataset and PTM Torrent Source Code.