Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AMiner (aminer.org) aims to provide comprehensive search and mining services for researcher social networks. The system focuses on: (1) creating a semantic-based profile for each researcher by extracting information from the distributed Web; (2) integrating academic data (e.g., the bibliographic data and the researcher profiles) from multiple sources; (3) accurately searching the heterogeneous network; (4) analyzing and discovering interesting patterns from the built researcher social network. The main search and analysis functions in AMiner include: profile search, expert finding, conference analysis, course search, sub-graph search, topic browser, academic ranks, and user management.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The data was collected and prepared for non-commercial research use by AMiner (https://aminer.org). These files serve as small downloads of some of the datasets for exploration.
Dataset Details
Dataset Description
Contains text pairs from the AMiner citation dataset v14 (https://www.aminer.org/citation). Similarity scores are calculated with the Jaccard index.
See the full description on the dataset page: https://huggingface.co/datasets/ppxscal/aminer-citation-graphv14-jaccard.
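The Jaccard index named in the description is the size of the intersection of two sets divided by the size of their union. The sketch below computes it over lowercased whitespace tokens. The tokenisation, the function name `jaccard`, and the sample strings are assumptions made for illustration, since the dataset card does not say how the texts were turned into sets.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard index of the token sets of two texts (tokenisation is assumed)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # convention chosen here for two empty texts
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical text pair: 2 shared tokens out of 4 distinct tokens.
score = jaccard("graph neural networks", "deep neural networks")
```

A score of 1.0 means the two token sets are identical and 0.0 means they share nothing.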
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a knowledge graph extracted from an AMiner benchmark for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graph are split into training, testing and validation sets for applying representation learning methods. Textual literals and numeric literals are stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored as sentence embeddings and a numeric matrix, respectively, in the files textual_literals.npy and numeric_literals.npy. For the script used to gather this dataset see the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/aminer.
AMiner (aminer.org) aims to provide comprehensive search and mining services for researcher social networks. In this system, we focus on: (1) creating a semantic-based profile for each researcher by extracting information from the distributed Web; (2) integrating academic data (e.g., the bibliographic data and the researcher profiles) from multiple sources; (3) accurately searching the heterogeneous network; (4) analyzing and discovering interesting patterns from the built researcher social network.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset for name disambiguation is composed of 100 author names extracted from the AMiner database, covering 12789 authors and 70258 documents.
https://networkrepository.com/policy.php
Collaboration Networks
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The eight figures of the paper. Figure 1 presents the architecture of AMiner. Figure 2 shows the schema of the researcher profile. Figure 3 is an example of a researcher profile. Figure 4 is an overview of the name disambiguation framework in AMiner. Figure 5 is a graphical representation of the three Author-Conference-Topic (ACT) models. Figure 6 shows an example result of experts found for “Data Mining”. Figure 7 is the model framework of DeepInf. Figure 8 shows an example of researcher ranking by sociability index.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A copy of the dataset used for researcher profile extraction by Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. This dataset was downloaded from https://aminer.org/lab-datasets/profiling/
Taken from https://www.aminer.org/citation and converted to CSV (but why)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OAGT is a paper topic dataset consisting of 6942930 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last two fields of each record are the topic id from a taxonomy of 27 topics created from the entire collection and the 20 most significant topic words. Each dataset record (sample) is stored as a JSON line in the text file.
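Because each record is one JSON line, the file can be streamed record by record rather than loaded whole, which matters at roughly 6.9 million records. The following is a minimal sketch of that pattern; the field names `title`, `year` and `topic_id` and the two sample records are hypothetical, as the actual key names are not given in this description.

```python
import io
import json
from collections import Counter
from typing import Iterator, TextIO


def iter_records(lines: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty JSON line, without loading the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record sample standing in for an opened dataset file.
sample = io.StringIO(
    '{"title": "a", "year": 2001, "topic_id": 3}\n'
    '{"title": "b", "year": 2005, "topic_id": 3}\n'
)

# Count records per topic id (the key name is an assumption).
topic_counts = Counter(record["topic_id"] for record in iter_records(sample))
```

For the real file, the `io.StringIO` sample would be replaced by `open(path, encoding="utf-8")`, with `path` pointing at the downloaded text file.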
The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license.
This data (OAGT Paper Topic Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).
If using it, please cite the following paper:
Erion Çano, Benjamin Roth: Topic Segmentation of Research Article Collections. ArXiv 2022, CoRR abs/2205.11249, https://doi.org/10.48550/arXiv.2205.11249
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PERSON Dataset V2: Dataset created for the paper "Search Personalization Based on Social-Network-Based Interestedness Measures." Please cite the paper for any usage.
The dataset is produced by cleaning AMiner's citation network V2 dataset (https://aminer.org/citation). Anyone who wants to use the PERSON V2 dataset must cite AMiner's dataset, as explained on its homepage (Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2008), pp. 990-998), as well as the aforementioned paper.
It includes two files:
1- authors_giant.txt: the information of authors and their co-authors. The format is as follows:
author ID
author name
the list of coauthors delimited by "," (each entry contains the ID of the coauthor followed by the number of times they co-authored a paper)
...
2- papers_giant.txt: the information of papers and references. The format is as follows:
paper ID
Is paper merged (see the first paper for details)
original paper ID (in AMiner's dataset)
blank
blank
blank
blank
title
abstract
time (only the year part is important)
blank
references to papers out of the PERSON dataset (indicated by AMiner's IDs)
references to papers inside the PERSON dataset (indicated by PERSON's IDs)
author IDs
...
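The coauthor field of authors_giant.txt can be turned into a mapping from coauthor ID to number of joint papers. The sketch below assumes that entries are separated by "," as stated, and, as an assumption not confirmed by the description, that the ID and the count inside an entry are separated by whitespace and are both integers. The function name `parse_coauthors` and the sample string are hypothetical.

```python
from typing import Dict


def parse_coauthors(field: str) -> Dict[int, int]:
    """Parse a coauthor list into {coauthor_id: number_of_joint_papers}.

    Assumes entries look like "<id> <count>" and are joined by commas.
    """
    coauthors: Dict[int, int] = {}
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty field or a trailing comma
        coauthor_id, count = entry.split()
        coauthors[int(coauthor_id)] = int(count)
    return coauthors


# Hypothetical field: author 17 co-authored 3 papers, author 42 co-authored 1.
parsed = parse_coauthors("17 3,42 1")
```

If the real file uses a different separator inside each entry, only the `entry.split()` call needs to change.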
This dataset is a knowledge graph extracted from an AMiner benchmark for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graph are split into training, testing and validation sets for applying representation learning methods. Textual literals and numeric literals are stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored as sentence embeddings and a numeric matrix, respectively, in the files textual_literals.npy and numeric_literals.npy. The file and_eval.json contains the evaluation dataset used for evaluating our AND architecture. For the script used to gather this dataset see the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/aminer.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Original dataset: aminer.org and Kaggle: Citation Network Dataset
The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations.
DBLP-Citation-network V12: 4,894,081 papers and 45,564,149 citation relationships (2020-04-09)
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Салятов Юрий Леонидович
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OAGSX is a title generation dataset consisting of 34408509 abstracts and titles from scientific articles. The texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file.
The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license.
This data (OAGSX Title Generation Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).
If using it, please also consider citing the following paper:
Çano Erion, Bojar Ondřej. Two Huge Title and Keyword Generation Corpora of Research Articles. LREC 2020, Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France, May 2020.
https://www.gesis.org/en/institute/data-usage-terms
DBLP (https://dblp.org/) is a comprehensive collection of computer science publications from major and minor journals and conference proceedings. From this dump, we remove arXiv preprints. Our dataset consists of 1.9 million publications from 1970 to 2014 that are authored by 1.1 million authors. We have added citations among publications by combining DBLP with the AMiner dataset (https://www.aminer.org/citation) via publication titles and years. There are 6.6 million citations among publications. Author names in DBLP are disambiguated. To infer the gender of authors, we have used a method that combines the results of name-based and image-based gender detection services. Since the accuracy is very low for Chinese and Korean names, we label their gender as unknown to reduce noise in our analysis.
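Linking two bibliographic sources via publication titles and years amounts to building a shared key from both fields and joining on it. The sketch below illustrates the idea only: the normalisation (lowercasing and collapsing non-alphanumeric characters), the record layout with `id`, `title` and `year` keys, and the sample records are all assumptions, not the procedure actually used to build this dataset.

```python
import re
from typing import Dict, List, Tuple


def match_key(title: str, year: int) -> Tuple[str, int]:
    """Build a join key from a normalised title and the publication year."""
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return (normalized, year)


def link_records(dblp: List[dict], aminer: List[dict]) -> Dict[str, str]:
    """Map DBLP ids to AMiner ids for records sharing the same (title, year) key."""
    aminer_index = {match_key(r["title"], r["year"]): r["id"] for r in aminer}
    links: Dict[str, str] = {}
    for record in dblp:
        key = match_key(record["title"], record["year"])
        if key in aminer_index:
            links[record["id"]] = aminer_index[key]
    return links


# Hypothetical records: d1 matches a9 despite case and punctuation differences,
# d2 does not match a7 because the years differ.
dblp_sample = [
    {"id": "d1", "title": "Mining Graphs.", "year": 2010},
    {"id": "d2", "title": "Other Paper", "year": 2011},
]
aminer_sample = [
    {"id": "a9", "title": "mining graphs", "year": 2010},
    {"id": "a7", "title": "Other Paper", "year": 2012},
]
links = link_records(dblp_sample, aminer_sample)
```

Exact-key matching of this kind misses titles that differ by more than case or punctuation, and two distinct papers with the same title and year would collide on one key.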
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains similarity scores among articles in AMiner's DBLP v10 dataset.
Similarities are calculated using the JoinSim [1] similarity measure on the derived citation network using the following metapaths:
The file ids.csv contains a mapping from AMiner's ids to our internal numeric ids used in the similarities files.
[1] Xiong, Y., Zhu, Y., Yu, P.S.: Top-k similarity join in heterogeneous information networks. IEEE Transactions on Knowledge and Data Engineering 27(6), 1710–1723 (2015)
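As a rough illustration of a metapath-based similarity of this kind, the sketch below divides the number of metapath instances between two articles by the geometric mean of each article's self-path count. This is how JoinSim is commonly summarised, but the formula should be treated as an assumption and checked against [1]. The metapath is also an assumption: the metapaths used for this dataset are not listed above, so the example uses a hypothetical symmetric one (article cites article, which is cited by another article), under which the path count between two articles is the number of references they share.

```python
import math
from typing import Set


def joinsim(refs_x: Set[str], refs_y: Set[str]) -> float:
    """Normalised path count under the assumed metapath 'cites / is cited by'.

    refs_x and refs_y are the sets of articles cited by x and by y.
    """
    paths_xy = len(refs_x & refs_y)  # shared references = paths from x to y
    paths_xx = len(refs_x)           # paths from x back to itself
    paths_yy = len(refs_y)           # paths from y back to itself
    if paths_xx == 0 or paths_yy == 0:
        return 0.0  # convention chosen here for articles without references
    return paths_xy / math.sqrt(paths_xx * paths_yy)


# Hypothetical reference lists: one shared reference, 4 and 1 references in total.
similarity = joinsim({"a", "b", "c", "d"}, {"a"})
```

Under this normalisation an article compared with itself always scores 1.0, and two articles with no shared references score 0.0.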
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from scientific articles. Texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file.
This data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY licence.
This data (OAGK Keyword Generation Dataset) is released under CC-BY licence (https://creativecommons.org/licenses/by/4.0/).
If using it, please cite the following paper:
Çano, Erion and Bojar, Ondřej, 2019, Keyphrase Generation: A Text Summarization Struggle, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, June 2019, Minneapolis, USA
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was created by Abdullah D
Released under CC BY-NC-SA 4.0