Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset was created to implement a content-based recommender system for the Open Research Knowledge Graph (ORKG). The recommender system takes a research paper's title and abstract as input and recommends existing ORKG templates that are semantically relevant to that paper.
Two approaches were trained on this dataset as part of a master's thesis (https://doi.org/10.15488/11834): a supervised Natural Language Inference (NLI) approach based on SciBERT embeddings and an unsupervised approach based on ElasticSearch.
The publication therefore consists of one general dataset, two training sets (one per approach), a validation set for the supervised approach, and a test set shared by both approaches.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
dataset.json
The main JSON object consists of a list of templates and a list of neutral papers.
Each template object has an ID, a label, a list of research fields, a list of properties, and a list of papers that use the template. Each paper object has an ID, a label, a DOI, a research field, and an abstract.
Each neutral paper object follows the same schema as a paper object listed under a template.
See an example instance below.
{
"templates": [
{
"id": "R138668",
"label": "Psychiatric Disorders AI Overview",
"research_fields": [
{
"id": "http://orkg.org/orkg/resource/R133",
"label": "Artificial Intelligence"
}
...
],
"properties": [
"Study cohort",
...
],
"papers": [
{
"id": "R138698",
"label": "Application of Autoencoder in Depression Diagnosis",
"doi": "10.12783/dtcse/csma2017/17335",
"research_field": {
"id": "R104",
"label": "Bioinformatics"
},
"abstract": "Major depressive disorder (MDD) is a mental disorder characterized by at least two weeks of low mood which is present across most situations. Diagnosis of MDD using rest-state functional magnetic resonance imaging (fMRI) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability. No method can automatically extract discriminative features from the origin time series in fMRI images for MDD diagnosis. In this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge. An autoencoder was used to learn pre-training parameters of a dimensionality reduction process using 3-D convolution network. Through comparison with the other three feature extraction methods, our method achieved the best classification performance. This method can be used not only in MDD diagnosis, but also other similar disorders."
},
...
]
},
...
],
"neutral_papers": [
{
"id": "R109377",
"label": "Structural basis of SARS-CoV-2 3CLpro and anti-COVID-19 drug discovery from medicinal plants",
"doi": "10.1016/j.jpha.2020.03.009",
"research_field": {
"id": "R104",
"label": "Bioinformatics"
},
"abstract": "Abstract The recent outbreak of coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 in December 2019 raised global health concerns. The viral 3-chymotrypsin-like cysteine protease (3CLpro) enzyme controls coronavirus replication and is essential for its life cycle. 3CLpro is a proven drug discovery target in the case of severe acute respiratory syndrome coronavirus (SARS-CoV) and middle east respiratory syndrome coronavirus (MERS-CoV). Recent studies revealed that the genome sequence of SARS-CoV-2 is very similar to that of SARS-CoV. Therefore, herein, we analysed the 3CLpro sequence, constructed its 3D homology model, and screened it against a medicinal plant library containing 32,297 potential anti-viral phytochemicals/traditional Chinese medicinal compounds. Our analyses revealed that the top nine hits might serve as potential anti- SARS-CoV-2 lead molecules for further optimisation and drug development process to combat COVID-19."
},
...
]
}
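The structure above could be traversed in Python roughly as follows. This is a minimal sketch that assumes only the keys shown in the example (`templates`, `papers`, `neutral_papers`, `id`); the helper names `template_paper_pairs` and `neutral_paper_ids` and the abbreviated inline sample are illustrative and not part of the dataset.

```python
import json

# Tiny inline sample mirroring the schema shown above (text values abbreviated).
SAMPLE = json.loads("""
{
  "templates": [
    {
      "id": "R138668",
      "label": "Psychiatric Disorders AI Overview",
      "research_fields": [
        {"id": "http://orkg.org/orkg/resource/R133", "label": "Artificial Intelligence"}
      ],
      "properties": ["Study cohort"],
      "papers": [
        {
          "id": "R138698",
          "label": "Application of Autoencoder in Depression Diagnosis",
          "doi": "10.12783/dtcse/csma2017/17335",
          "research_field": {"id": "R104", "label": "Bioinformatics"},
          "abstract": "Major depressive disorder (MDD) is a mental disorder ..."
        }
      ]
    }
  ],
  "neutral_papers": [
    {
      "id": "R109377",
      "label": "Structural basis of SARS-CoV-2 3CLpro ...",
      "doi": "10.1016/j.jpha.2020.03.009",
      "research_field": {"id": "R104", "label": "Bioinformatics"},
      "abstract": "Abstract The recent outbreak of coronavirus disease 2019 ..."
    }
  ]
}
""")


def template_paper_pairs(dataset):
    """Return (template_id, paper_id) for every paper listed under a template."""
    return [
        (template["id"], paper["id"])
        for template in dataset["templates"]
        for paper in template["papers"]
    ]


def neutral_paper_ids(dataset):
    """Return the IDs of the papers listed under "neutral_papers"."""
    return [paper["id"] for paper in dataset["neutral_papers"]]


# For the real file, load it instead of using SAMPLE:
#   with open("dataset.json", encoding="utf-8") as f:
#       dataset = json.load(f)
print(template_paper_pairs(SAMPLE))
print(neutral_paper_ids(SAMPLE))
```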
All other files
The main JSON object consists of a list of entailments, a list of contradictions and a list of neutrals.
Each object in these lists has the same schema: an instance_id created by concatenating the template_id (when it exists) with the paper_id, a template_id, a paper_id, a premise, a hypothesis, a sequence that concatenates the two, and the target class. In the example below, the premise holds the template's label followed by its property labels, the hypothesis holds the paper's title followed by its abstract, and the sequence has the form "[CLS] premise [SEP] hypothesis [SEP]".
See an example instance below.
{
"entailments": [
{
"instance_id": "R138668xR138698",
"template_id": "R138668",
"paper_id": "R138698",
"premise": "psychiatric disorders ai overview study cohort outcome assessment aims performance findings used models data",
"hypothesis": "application of autoencoder in depression diagnosis major depressive disorder (mdd) is a mental disorder characterized by at least two weeks of low mood which is present across most situations diagnosis of mdd using rest state functional magnetic resonance imaging (fmri) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability no method can automatically extract discriminative features from the origin time series in fmri images for mdd diagnosis in this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge an autoencoder was used to learn pre training parameters of a dimensionality reduction process using 3 d convolution network through comparison with the other three feature extraction methods, our method achieved the best classification performance this method can be used not only in mdd diagnosis, but also other similar disorders",
"sequence": "[CLS] psychiatric disorders ai overview study cohort outcome assessment aims performance findings used models data [SEP] application of autoencoder in depression diagnosis major depressive disorder (mdd) is a mental disorder characterized by at least two weeks of low mood which is present across most situations diagnosis of mdd using rest state functional magnetic resonance imaging (fmri) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability no method can automatically extract discriminative features from the origin time series in fmri images for mdd diagnosis in this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge an autoencoder was used to learn pre training parameters of a dimensionality reduction process using 3 d convolution network through comparison with the other three feature extraction methods, our method achieved the best classification performance this method can be used not only in mdd diagnosis, but also other similar disorders [SEP]",
"target": "entailment"
},
...
],
"contradictions": [ ... ],
"neutrals": [ ... ]
}
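The way the fields of one instance relate to each other can be sketched in Python. This is a hedged illustration inferred from the single example above, not the thesis code: the function names `build_sequence` and `build_instance` are hypothetical, the text normalization applied to the premise and hypothesis is not reproduced, and the instance_id format for instances without a template is not shown in the source, so the sketch covers only the case where both IDs exist.

```python
def build_sequence(premise, hypothesis):
    """Join premise and hypothesis with the BERT-style markers seen in the example."""
    return f"[CLS] {premise} [SEP] {hypothesis} [SEP]"


def build_instance(template_id, paper_id, premise, hypothesis, target):
    """Assemble one instance with the fields listed in the schema above.

    Assumption: the IDs are joined with a lowercase "x", as in "R138668xR138698".
    """
    return {
        "instance_id": f"{template_id}x{paper_id}",
        "template_id": template_id,
        "paper_id": paper_id,
        "premise": premise,
        "hypothesis": hypothesis,
        "sequence": build_sequence(premise, hypothesis),
        "target": target,
    }


# Shortened premise and hypothesis; the real texts are much longer.
instance = build_instance(
    "R138668",
    "R138698",
    "psychiatric disorders ai overview study cohort",
    "application of autoencoder in depression diagnosis",
    "entailment",
)
print(instance["instance_id"])
print(instance["sequence"])
```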
Statistics
Class | Training (supervised) | Validation (supervised) | Training (unsupervised) | Test |
Entailment | 180 | 20 | 200 | 52 |
Neutral | 180 | 20 | 200 | 64 |
Contradiction | 736 | 84 | 0 | 0 |
Total | 1096 | 124 | 400 | 116 |
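As a quick consistency check, the totals in the table can be recomputed from the per-class counts. The sketch below copies the numbers from the table; the names `STATS`, `split_totals` and `class_shares` are illustrative only.

```python
# Per-class counts copied from the statistics table.
STATS = {
    "Training (supervised)": {"Entailment": 180, "Neutral": 180, "Contradiction": 736},
    "Validation (supervised)": {"Entailment": 20, "Neutral": 20, "Contradiction": 84},
    "Training (unsupervised)": {"Entailment": 200, "Neutral": 200, "Contradiction": 0},
    "Test": {"Entailment": 52, "Neutral": 64, "Contradiction": 0},
}


def split_totals(stats):
    """Sum the class counts of every split."""
    return {split: sum(counts.values()) for split, counts in stats.items()}


def class_shares(counts):
    """Return each class's fraction of the instances in one split."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


totals = split_totals(STATS)
shares = class_shares(STATS["Training (supervised)"])
print(totals)
print({label: round(share, 3) for label, share in shares.items()})
```

The recomputed totals match the last row of the table, and the shares show that contradictions make up roughly two thirds of the supervised training set.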