protein roles (in terms of their cellular functions from gene ontology) in various protein-protein interaction (PPI) graphs, with each graph corresponding to a different human tissue [41]. Positional gene sets, motif gene sets and immunological signatures are used as features, and gene ontology sets as labels (121 in total), collected from the Molecular Signatures Database [34]. The average graph contains 2373 nodes, with an average degree of 28.8.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Protein-Protein Interactions (PPI) dataset from the ATOM3D project. This upload includes three zipped data directories:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information about protein-protein interactions (PPI) and protein-protein complex interactions (PCI) in humans. It was obtained by querying the IntAct database for interactions where the organism is human and the IntAct MI score, which characterises the confidence level of each interaction, is ≥ 0.45. The result was downloaded from IntAct molecular interaction database version 4.2.6 https://www.ebi.ac.uk/intact/.
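The confidence filter described above can be sketched in a few lines of Python. This is only an illustration: the `Interaction` record, its field names, and the toy protein identifiers are assumptions and do not reflect IntAct's actual export format.

```python
from typing import Iterable, List, NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one interaction; field names are assumptions."""
    protein_a: str
    protein_b: str
    mi_score: float


def filter_by_mi_score(interactions: Iterable[Interaction],
                       threshold: float = 0.45) -> List[Interaction]:
    """Keep interactions whose MI score is at or above the threshold."""
    return [i for i in interactions if i.mi_score >= threshold]


# Toy records with placeholder identifiers (not real IntAct data).
records = [
    Interaction("PROT_A", "PROT_B", 0.30),
    Interaction("PROT_A", "PROT_C", 0.45),
    Interaction("PROT_B", "PROT_D", 0.72),
]
kept = filter_by_mi_score(records)
```

The comparison is inclusive, matching the "≥ 0.45" criterion stated in the description.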
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SGD
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A comprehensive selection of data on input and output indices. Contains producer price indices of materials and fuels purchased and output of manufacturing industry by broad sector.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gold Standard Dataset for sequence-based PPI prediction:
Big dataset: 163,192 training points (Intra-1), 59,260 validation points (Intra-0), 52,048 test points (Intra-2), plus corresponding protein sequences from Swissprot
No direct data leakage: proteins from training are not contained in validation or test, proteins from validation are not in training or test, proteins from test are not in validation or training
Minimized sequence similarity between training, validation, and test, because the whole human proteome was split with KaHIP such that sequence similarities are minimized w.r.t. length-normalized bitscores
Redundancy reduction with CD-HIT: inside of the datasets, no proteins with >40% pairwise sequence similarity
New version: added sequence of Q96PU5 to the human_swissprot_oneliner
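The "no direct data leakage" property can be checked mechanically. The sketch below assumes each split is available as a list of protein-ID pairs. The function names and toy IDs are hypothetical and are not part of the dataset's own tooling.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def proteins_in(pairs: Iterable[Pair]) -> Set[str]:
    """Collect every protein ID that appears in any pair of a split."""
    return {protein for pair in pairs for protein in pair}


def find_leakage(splits: Dict[str, Iterable[Pair]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the proteins shared by each pair of splits (empty dict if none)."""
    protein_sets = {name: proteins_in(pairs) for name, pairs in splits.items()}
    leaks = {}
    for a, b in combinations(protein_sets, 2):
        shared = protein_sets[a] & protein_sets[b]
        if shared:
            leaks[(a, b)] = shared
    return leaks


# Toy splits with placeholder IDs: no protein occurs in more than one split.
clean_splits = {
    "train": [("A", "B"), ("B", "C")],
    "val": [("D", "E")],
    "test": [("F", "G")],
}
clean_report = find_leakage(clean_splits)

# Same splits, but the test set now reuses protein "A" from training.
leaky_splits = dict(clean_splits, test=[("A", "H")])
leaky_report = find_leakage(leaky_splits)
```

An empty report means the splits are protein-disjoint. It says nothing about sequence similarity between splits, which the dataset addresses separately with KaHIP and CD-HIT.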
Dataset Card for PPI
Summary
The PPI dataset is part of the LUCAONE downstream tasks collection for biomolecular interaction prediction. It is structured for binary classification and includes standard splits for training (train.csv), validation (dev.csv → val), and test (test.csv).
Dataset Structure
This dataset includes three splits:
train, val (converted from dev.csv), test
Each split is in CSV format.
Task
Binary classification of interactions… See the full description on the dataset page: https://huggingface.co/datasets/vladak/PPI.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
IPPI-DB is a database of modulators of protein-protein interactions. It contains exclusively small molecules and therefore no peptides. The data are retrieved from the literature, either peer-reviewed scientific articles or world patents. A large variety of data is stored within IPPI-DB: structural, pharmacological, binding and activity profile, pharmacokinetic and cytotoxicity when available, as well as some data about the PPI targets themselves.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for the Producer Price Index as reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
PINDER PPI dataset
PINDER, the Protein INteraction Dataset and Evaluation Resource, is a high-quality compilation of positive protein-protein interactions. Of particular note, the train, valid, and test splits are deduplicated and heavily trimmed based on sequence and structure similarity. For more information on the original dataset compilation, please read their paper, GitHub, or docs.
Differences between this version and the official version
We further processed… See the full description on the dataset page: https://huggingface.co/datasets/Synthyra/PINDER.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a subset of the expert-curated PPI dataset, restricted to proteins associated with Alzheimer’s disease, available from the IntAct molecular interaction database https://www.ebi.ac.uk/intact/. The confidence level of each interaction is characterised by the IntAct MI score. The dataset was downloaded from IntAct database version 4.2.6.
A look at the producer price index for transportation and its components as a measure of inflation faced by consumers.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Producer Price Index by Commodity: All Commodities (PPIACO) from Jan 1913 to May 2025 about commodities, PPI, inflation, price index, indexes, price, and USA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Producer Prices in the United States increased to 148.07 points in May from 147.88 points in April of 2025. This dataset provides the latest reported value for - United States Producer Prices - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for RAPPPID, a method for the Regularised Automative Prediction of Protein-Protein Interactions using Deep Learning.
These datasets are in a format that RAPPPID is ready to read.
Comparatives Dataset
These datasets were derived from the STRING v11 H. sapiens dataset, according to the C1, C2, and C3 procedures outlined by Park and Marcotte, 2012. Negative samples are sampled randomly from the space of proteins not known to interact. See Szymborski & Emad for details.
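As a rough illustration of the setup described above, the sketch below labels a test pair by how many of its proteins also occur in the training set (both, one, or none, which is the usual reading of Park and Marcotte's C1, C2, and C3 classes) and draws random negatives from pairs that are not known positives. All function names and toy protein IDs are hypothetical. This is not RAPPPID's actual code, and the exhaustive enumeration of candidate pairs is only practical for small protein sets.

```python
import random
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def pair_class(pair: Pair, train_proteins: Set[str]) -> str:
    """Label a test pair C1, C2, or C3 by how many members were seen in training."""
    seen = sum(protein in train_proteins for protein in pair)
    return {2: "C1", 1: "C2", 0: "C3"}[seen]


def sample_negatives(proteins: Iterable[str], positives: Iterable[Pair],
                     n: int, seed: int = 0) -> List[Pair]:
    """Draw n unordered protein pairs that are not among the known positives."""
    known = {frozenset(pair) for pair in positives}
    candidates = [pair for pair in combinations(sorted(proteins), 2)
                  if frozenset(pair) not in known]
    # random.Random(seed) keeps the draw repeatable for a given seed.
    return random.Random(seed).sample(candidates, n)


train_proteins = {"A", "B", "C"}
labels = [pair_class(p, train_proteins) for p in [("A", "B"), ("A", "X"), ("X", "Y")]]

positives = [("A", "B"), ("C", "D")]
negatives = sample_negatives({"A", "B", "C", "D"}, positives, n=2, seed=1)
```

Changing `seed` yields a different but reproducible set of negatives.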
Repeatability Datasets
The following datasets are all derived from STRING in the same manner as the comparatives dataset, but three different random seeds are used for drawing proteins.
References
Park, Y. and Marcotte, E. M. (2012). Flaws in evaluation schemes for pair-input computational predictions. Nat Methods, 9, 1134–1136.
Szklarczyk, D., Gable, A. L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N. T., Morris, J. H., Bork, P., Jensen, L. J., and Mering, C. (2019). STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.
Szymborski, J. and Emad, A. (2021). RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks. bioRxiv. https://doi.org/10.1101/2021.08.13.456309
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the NeurIPS 2025 submission titled "Face to Face with Proteins: Contrastive Surface Learning for Protein–Protein Interaction Prediction."
It contains homology-aware and random train/val/test splits.
All files were generated using the pipeline included in the supplementary code repository.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset for protein-protein interaction prediction across bacteria (Protein sequences)
A dataset of 10,533 bacterial genomes across 6,956 species with protein-protein interaction (PPI) scores for each genome. The genome protein sequences and PPI scores have been extracted from STRING DB. Each row contains a set of protein sequences from a genome, ordered by their location on the chromosome and plasmids and a set of associated PPI scores. The PPI scores have been extracted using the… See the full description on the dataset page: https://huggingface.co/datasets/macwiatrak/bacbench-ppi-stringdb-protein-sequences.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The enclosed archive holds all the datasets used in the INTREPPPID manuscript. See the INTREPPPID documentation for details on the format of the HDF5 files.
Files are organised as follows:
[FORMAT]/seed_[SEED]/[TAXON]/[DATASET_NAME].h5
Where:
FORMAT is whether the HDF5 is in the RAPPPID or INTREPPPID format.
SEED is the random seed used to generate the dataset. They are all phone numbers found in songs.
TAXON is the NCBI Taxon ID of the organism from which the dataset was generated.
DATASET_NAME is the name of the dataset.
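The path layout above can be built and parsed with a small helper. The sketch below is illustrative only. The directory spelling `intrepppid`, the seed value, and the dataset name `example` are made-up placeholders, while 9606 is the NCBI Taxon ID for Homo sapiens.

```python
from pathlib import PurePosixPath
from typing import Dict


def dataset_path(fmt: str, seed: int, taxon: int, name: str) -> PurePosixPath:
    """Build [FORMAT]/seed_[SEED]/[TAXON]/[DATASET_NAME].h5 as a relative path."""
    return PurePosixPath(fmt) / f"seed_{seed}" / str(taxon) / f"{name}.h5"


def parse_dataset_path(path: str) -> Dict[str, str]:
    """Split a path of the documented layout back into its four components."""
    p = PurePosixPath(path)
    # Only the last four path components matter; any archive prefix is ignored.
    fmt, seed_dir, taxon, _filename = p.parts[-4:]
    if not seed_dir.startswith("seed_") or p.suffix != ".h5":
        raise ValueError(f"unexpected dataset path layout: {path}")
    return {
        "format": fmt,
        "seed": seed_dir[len("seed_"):],
        "taxon": taxon,
        "dataset_name": p.stem,
    }


# Placeholder values; the real archive's directory names may be spelled differently.
example = dataset_path("intrepppid", 8675309, 9606, "example")
parsed = parse_dataset_path(str(example))
```

All parsed components are returned as strings, so callers that need numeric seeds or taxon IDs would convert them explicitly.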
In the manuscript, we use the INTREPPPID format to train the model on human data, and then test the model using datasets in the RAPPPID format. INTREPPPID can only be trained on datasets with orthology data, but can be tested on datasets without it, since the orthologous locality loss is only used during training.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Producer Price Inflation MoM in the United States increased to 0.10 percent in May from -0.20 percent in April of 2025. This dataset includes a chart with historical data for the United States Producer Price Inflation MoM.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Producer Prices in the United States increased 2.60 percent in May of 2025 over the same month in the previous year. This dataset provides - United States Producer Prices Change - actual values, historical data, forecast, chart, statistics, economic calendar and news.