22 datasets found

P
https://www.aminer.cn/cosnet Dataset
paperswithcode.com
Updated Nov 23, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). https://www.aminer.cn/cosnet Dataset [Dataset]. https://paperswithcode.com/dataset/https-www-aminer-cn-cosnet
Explore at:
Dataset updated
Nov 23, 2024
Description
Click to add a brief description of the dataset (Markdown and LaTeX enabled).

Provide:

a high-level explanation of the dataset characteristics explain motivations and summary of its content potential use cases of the dataset
O
AMiner
opendatalab.com
paperswithcode.com
+1more
zip
Updated Sep 22, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tsinghua University (2022). AMiner [Dataset]. https://opendatalab.com/OpenDataLab/AMiner
Explore at:
zipAvailable download formats
Dataset updated
Sep 22, 2022
Dataset provided by
Tsinghua University
IBM
Description
The AMiner Dataset is a collection of different relational datasets. It consists of a set of relational networks such as citation networks, academic social networks or topic-paper-autor networks among others.
AMiner-534K: Knowledge Graph of AMiner benchmark for Author Name...
zenodo.org
zip
Updated Nov 12, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cristian Santini; Cristian Santini; Mehwish Alam; Mehwish Alam; Gesese. Genet Asefa; Silvio Peroni; Silvio Peroni; Aldo Gangemi; Aldo Gangemi; Harald Sack; Harald Sack; Gesese. Genet Asefa (2021). AMiner-534K: Knowledge Graph of AMiner benchmark for Author Name Disambiguation [Dataset]. http://doi.org/10.5281/zenodo.5675801
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5675801
Dataset updated
Nov 12, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Cristian Santini; Cristian Santini; Mehwish Alam; Mehwish Alam; Gesese. Genet Asefa; Silvio Peroni; Silvio Peroni; Aldo Gangemi; Aldo Gangemi; Harald Sack; Harald Sack; Gesese. Genet Asefa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is a knowledge graph extracted from a AMiner benchmark for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graph are split into training, testing and validation for applying representation learning methods. Textual literals and numeric literals were stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored into sentence embeddings and a numeric matrix respectively in the files textual_literals.npy and numeric_literals.npy. The file and_eval.json contains the evaluation dataset used for evaluating our AND architecture. For the script used to gather this dataset see the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/aminer.
AMiner-534K - Dataset
zenodo.org
data.niaid.nih.gov
bin, json, txt
Updated Oct 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cristian Santini; Cristian Santini (2021). AMiner-534K - Dataset [Dataset]. http://doi.org/10.5281/zenodo.5565220
Explore at:
txt, bin, jsonAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5565220
Dataset updated
Oct 14, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Cristian Santini; Cristian Santini
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is a knowledge graph extracted from a AMiner benchmark for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graph are split into training, testing and validation for applying representation learning methods. Textual literals and numeric literals were stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored into sentence embeddings and a numeric matrix respectively in the files textual_literals.npy and numeric_literals.npy. For the script used to gather this dataset see the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/aminer.
S
AMiner
scidb.cn
Updated Sep 29, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang (2020). AMiner [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00004
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.11922/sciencedb.j00104.00004
Dataset updated
Sep 29, 2020
Dataset provided by
Science Data Bank
Authors
Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AMiner (aminer.org) aims to provide comprehensive search and mining services for researcher social networks. The system focuses on: (1) creating a semantic-based profile for each researcher by extracting information from the distributed Web; (2) integrating academic data (e.g., the bibliographic data and the researcher profiles) from multiple sources; (3) accurately searching the heterogeneous network; (4) analyzing and discovering interesting patterns from the built researcher social network. The main search and analysis functions in AMiner include: profile search, expert finding, conference analysis, course search, sub-graph search, topic browser, academic ranks, and user management.
S
Processing of disambiguation data by Aminer authors
scidb.cn
Updated Apr 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xinzhe; Lu (2024). Processing of disambiguation data by Aminer authors [Dataset]. http://doi.org/10.57760/sciencedb.18075
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.18075
Dataset updated
Apr 17, 2024
Dataset provided by
Science Data Bank
Authors
Xinzhe; Lu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data is derived from Aminer's publicly available disambiguation dataset (https://open.aminer.cn/article?id=55af4228dabfae1ce3ed1253), on the basis of which it explores how the characterization of research collaborations affects the novelty and impact of knowledge outcomes
S
Data from: AMiner: Search and Mining of Academic Social Networks
scidb.cn
Updated Oct 15, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang (2020). AMiner: Search and Mining of Academic Social Networks [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00021
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.11922/sciencedb.j00104.00021
Dataset updated
Oct 15, 2020
Dataset provided by
Science Data Bank
Authors
Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
8 figures of the paper. Figure 1 presents the architecture of AMiner. Figure 2 shows the schema of the researcher profile. Figure 3 is an example of researcher profile. Figure 4 is an overview of the name disambiguation framework in AMiner. Figure 5 is graphical representation of the three Author-Conference-Topic (ACT) models. Figure 6 shows an example result of experts found for “Data Mining”. Figure 7 is a model framework of DeepInf. Figure 8 shows an example of researcher ranking by sociability index.
Dataset for paper: "Knowledge Graph Embeddings based Approach for Author...
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Apr 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cristian Santini; Cristian Santini; Mehwish Alam; Mehwish Alam; Genet Asefa Gesese; Silvio Peroni; Silvio Peroni; Aldo Gangemi; Aldo Gangemi; Harald Sack; Harald Sack; Genet Asefa Gesese (2022). Dataset for paper: "Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals" [Dataset]. http://doi.org/10.5281/zenodo.6309855
Explore at:
zip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6309855
Dataset updated
Apr 24, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Cristian Santini; Cristian Santini; Mehwish Alam; Mehwish Alam; Genet Asefa Gesese; Silvio Peroni; Silvio Peroni; Aldo Gangemi; Aldo Gangemi; Harald Sack; Harald Sack; Genet Asefa Gesese
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists in two distinct scholarly knowledge graph created from two publicly available bibliographic datasets: 1) a triplestore covering information about the journal Scientometrics provided by OpenCitations (available here), and 2) the AMiner AND benchmark from 2018 available here. This KG was extracted for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graphs are split into training, testing and validation for applying representation learning methods. Textual literals and numeric literals were stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored into sentence embeddings and a numeric matrix respectively in the files textual_literals.npy and numeric_literals.npy in order to simplify the representation learning task. The file and_eval.json of each KG contains the evaluation dataset used for evaluating our AND architecture. For the script used to gather this dataset see https://github.com/sntcristian/and-kge/tree/main/src/AMiner-534K and https://github.com/sntcristian/and-kge/tree/main/src/OC-782K.
OC-782K: Knowledge Graph of "Scientometrics" modelled according to the...
zenodo.org
explore.openaire.eu
+1more
zip
Updated Apr 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cristian Santini; Cristian Santini; Mehwish Alam; Mehwish Alam; Genet Asefa Gesese; Silvio Peroni; Silvio Peroni; Aldo Gangemi; Aldo Gangemi; Harald Sack; Harald Sack; Genet Asefa Gesese (2022). OC-782K: Knowledge Graph of "Scientometrics" modelled according to the OpenCitations Data Model [Dataset]. http://doi.org/10.5281/zenodo.5675787
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5675787
Dataset updated
Apr 24, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Cristian Santini; Cristian Santini; Mehwish Alam; Mehwish Alam; Genet Asefa Gesese; Silvio Peroni; Silvio Peroni; Aldo Gangemi; Aldo Gangemi; Harald Sack; Harald Sack; Genet Asefa Gesese
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is a knowledge graph extracted from a triplestore covering information about the journal Scientometrics and modelled according to the OpenCitations Data Model. The original triplestore is available here. This KG was extracted for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graph are split into training, testing and validation for applying representation learning methods. Textual literals and numeric literals were stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored into sentence embeddings and a numeric matrix respectively in the files textual_literals.npy and numeric_literals.npy. The file and_eval.json contains the evaluation dataset used for evaluating our AND architecture. For the script used to gather this dataset see the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/open-citations.
h
DBLP Publications Network
data.hellenicdataservice.gr
Updated Jun 20, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). DBLP Publications Network [Dataset]. https://data.hellenicdataservice.gr/dataset/1e3f40d4-3b15-4a69-9bf1-cce77a4a6e80
Explore at:
Dataset updated
Jun 20, 2019
License
Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
This dataset contains information about academic articles, their authors and venues of publication. The dataset has the form of a graph. It has been produced by the SmartDataLake project (https://smartdatalake.eu), using data collected from Aminer (https://aminer.org).
a
Open Academic Graph 2019
academictorrents.com
bittorrent
Updated May 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
None (2021). Open Academic Graph 2019 [Dataset]. https://academictorrents.com/details/4398ab05f1f39d12942f0d5e9ddbdd21beea87a7
Explore at:
bittorrent(96053755904)Available download formats
Dataset updated
May 20, 2021
Authors
None
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
A copy of the "Open Academic Graph v2" (OAGv2) corpus published by aminer.org and Microsoft Academic Graph in early 2019. Contains roughly 90 GB (compressed) of bibliographic metadata for hundreds of millions of publications. Related publications include: Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008). pp.990-998. Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246.
OAGT Paper Topic Dataset
zenodo.org
explore.openaire.eu
+1more
zip
Updated May 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erion Çano; Erion Çano (2022). OAGT Paper Topic Dataset [Dataset]. http://doi.org/10.5281/zenodo.6560535
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6560535
Dataset updated
May 24, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Erion Çano; Erion Çano
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OAGT is a paper topic dataset consisting of 6942930 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last two fields of each record are the topic id from a taxonomy of 27 topics created from the entire collection and the 20 most significant topic words. Each dataset record (sample) is stored as a JSON line in the text file.

The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released
under ODC-BY license.

This data (OAGT Paper Topic Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).

If using it, please cite the following paper:

Erion Çano, Benjamin Roth: Topic Segmentation of Research Article Collections. ArXiv 2022, CoRR abs/2205.11249, https://doi.org/10.48550/arXiv.2205.11249
f
Comparison of this paper’s method with ten baseline methods on the Aminer...
plos.figshare.com
xls
Updated Feb 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ge Wang; Zikai Sun; Weiyang HU; MengHuan Cai (2025). Comparison of this paper’s method with ten baseline methods on the Aminer dataset for three types of metrics. [Dataset]. http://doi.org/10.1371/journal.pone.0310992.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0310992.t002
Dataset updated
Feb 26, 2025
Dataset provided by
PLOS ONE
Authors
Ge Wang; Zikai Sun; Weiyang HU; MengHuan Cai
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of this paper’s method with ten baseline methods on the Aminer dataset for three types of metrics.
DBLP-Citation-network V13
kaggle.com
Updated Nov 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nikita Mineev (2022). DBLP-Citation-network V13 [Dataset]. https://www.kaggle.com/datasets/nikitamineev/dblpcitationnetwork-v13
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 9, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Nikita Mineev
Description
Taken from here https://www.aminer.org/citation and converted to csv (but why)
c
Computer Science (1970-2014)
datacatalogue.cessda.eu
search.gesis.org
Updated Dec 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lietz, Haiko (2023). Computer Science (1970-2014) [Dataset]. http://doi.org/10.7802/2642
Explore at:
Unique identifier
https://doi.org/10.7802/2642
Dataset updated
Dec 14, 2023
Dataset provided by
GESIS - Leibniz-Institut für Sozialwissenschaften
Authors
Lietz, Haiko
Measurement technique
Computer-based observation
Description
DBLP (https://dblp.org/) is a comprehensive collection of computer science publications from major and minor journals and conference proceedings. From this dump, we remove arXiv preprints. Our dataset consists of 1.9 million publications from 1970 to 2014 that are authored by 1.1 million authors. We have added citations among publications by combining DBLP with the AMiner dataset (https://www.aminer.org/citation) via publication titles and years. There are 6.6 million citations among publications. Author names in DBLP are disambiguated. To infer the gender of authors, we have used a method that combines the results of name-based and image-based gender detection services. Since the accuracy is very low for Chinese and Korean names, we label their gender as unknown to reduce noise in our analysis.
E
OAGL Paper Length Dataset
live.european-language-grid.eu
binary format
Updated Jun 29, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2020). OAGL Paper Length Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1306
Explore at:
binary formatAvailable download formats
Dataset updated
Jun 29, 2020
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OAGL is a paper length prediction dataset consisting of 17528680 records which comprise various scientific publication metadata like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license. This data (OAGL Paper Length Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).
f
Comparison of ablation experiments.
plos.figshare.com
xls
Updated Feb 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ge Wang; Zikai Sun; Weiyang HU; MengHuan Cai (2025). Comparison of ablation experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0310992.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0310992.t004
Dataset updated
Feb 26, 2025
Dataset provided by
PLOS ONE
Authors
Ge Wang; Zikai Sun; Weiyang HU; MengHuan Cai
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the dramatic increase in the number of published papers and the continuous progress of deep learning technology, the research on name disambiguation is at a historic peak, the number of paper authors is increasing every year, and the situation of authors with the same name is intensifying, therefore, it is a great challenge to accurately assign the newly published papers to their respective authors. The current mainstream methods for author disambiguation are mainly divided into two methods: feature-based clustering and connection-based clustering, but none of the current mainstream methods can efficiently deal with the author name disambiguation problem, For this reason, this paper proposes the author name ablation method based on the relational graph heterogeneous attention neural network, first extract the semantic and relational information of the paper, use the constructed graph convolutional embedding module to train the splicing to get a better feature representation, and input the constructed network to get the vector representation. As the existing graph heterogeneous neural network can not learn different types of nodes and edge interaction, add multiple attention, design ablation experiments to verify its impact on the network. Finally improve the traditional hierarchical clustering method, combined with the graph relationship and topology, using training vectors instead of distance calculation, can automatically determine the optimal k-value, improve the accuracy and efficiency of clustering. The experimental results show that the average F1 value of this paper’s method on the Aminer dataset is 0.834, which is higher than other mainstream methods.
PERSON Dataset V2
figshare.com
zip
Updated Nov 1, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shayan A. Tabrizi; Azadeh Shakery; Mohammad Ali Tavallaei; Masoud Asadpour (2018). PERSON Dataset V2 [Dataset]. http://doi.org/10.6084/m9.figshare.6958514.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.6958514.v1
Dataset updated
Nov 1, 2018
Dataset provided by
Figsharehttp://figshare.com/
Authors
Shayan A. Tabrizi; Azadeh Shakery; Mohammad Ali Tavallaei; Masoud Asadpour
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PERSON Dataset V2:Dataset created for paper "Search Personalization Based on Social-Network-Based Interestedness Measures." Please cite the paper for any usage.The dataset is produced by data cleaning of AMiner's citation network V2 dataset (https://aminer.org/citation). Anyone who wants to use PERSON V2 dataset must cite Aminer's dataset (as explained in its homepage: Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2008). pp.990-998) as well as the aforementioned paper.It includes two files: 1- authors_giant.txt: the information of authors and their co-authors. The format is as follows: author ID author name
the list of coauthors delimited by "," (Each entry contains the ID of the coauthor followed by the number of times they co-authored a paper) ... 2- papers_giant.txt: the information of papers and references. The format is as follows: paper ID Is paper merged (See the first paper for details) original paper ID (in Aminer's dataset) blank blank blank blank title abstract time (only the year part is important) blank references to papers out of the PERSON dataset (indicated by Aminer's IDs) references to papers inside the PERSON dataset (indicated by PERSON's IDs) author IDs ...
l
OAGL Paper Metadata Dataset
lindat.cz
explore.openaire.eu
Updated Jun 30, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erion Çano (2020). OAGL Paper Metadata Dataset [Dataset]. https://lindat.cz/repository/xmlui/handle/11234/1-3257
Explore at:
Dataset updated
Jun 30, 2020
Authors
Erion Çano
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license. This data (OAGL Paper Metadata Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper:

Çano Erion, Bojar Ondřej: How Many Pages? Paper Length Prediction from the Metadata. NLPIR 2020, Proceedings of the the 4th International Conference on Natural Language Processing and Information Retrieval, Seoul, Korea, December 2020.
SeminalSurveyDBLP Dataset for Classification of Seminal and Survey...
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christin Katharina Kreutz; Christin Katharina Kreutz; Premtim Sahitaj; Premtim Sahitaj; Ralf Schenkel; Ralf Schenkel (2020). SeminalSurveyDBLP Dataset for Classification of Seminal and Survey Publications [Dataset]. http://doi.org/10.5281/zenodo.3258164
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3258164
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Christin Katharina Kreutz; Christin Katharina Kreutz; Premtim Sahitaj; Premtim Sahitaj; Ralf Schenkel; Ralf Schenkel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains citation network data for 1320 publications from dblp (https://dblp.uni-trier.de/) enriched with data from AMiner (https://aminer.org/) for classification of seminal and survey publications.

Citations and references are contained for every publication. For each of the 121,084 papers, dblp key, publication year as well as stemmed and unstemmed concatenations of its title and abstract are given. Seminal papers come from A* conferences, surveys were extracted from venues specialized in publishing reviews.

For details, see Revaluating Semantometrics from Computer Science Publications, Christin Katharina Kreutz, Premtim Sahitaj, and Ralf Schenkel, 2019, submitted to BIRNDL@SIGIR.