27 datasets found
  1. AMiner

    • scidb.cn
    Updated Sep 29, 2020
    Cite
    Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang (2020). AMiner [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00004
    Dataset updated
    Sep 29, 2020
    Dataset provided by
    Science Data Bank
    Authors
    Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AMiner (aminer.org) aims to provide comprehensive search and mining services for researcher social networks. The system focuses on: (1) creating a semantic-based profile for each researcher by extracting information from the distributed Web; (2) integrating academic data (e.g., the bibliographic data and the researcher profiles) from multiple sources; (3) accurately searching the heterogeneous network; (4) analyzing and discovering interesting patterns from the built researcher social network. The main search and analysis functions in AMiner include: profile search, expert finding, conference analysis, course search, sub-graph search, topic browser, academic ranks, and user management.

  2. AMiner

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Sep 22, 2022
    Cite
    Tsinghua University (2022). AMiner [Dataset]. https://opendatalab.com/OpenDataLab/AMiner
    Available download formats: zip
    Dataset updated
    Sep 22, 2022
    Dataset provided by
    Tsinghua University
    IBM
    Description

    The AMiner Dataset is a collection of different relational datasets. It consists of a set of relational networks such as citation networks, academic social networks, and topic-paper-author networks, among others.

  3. AMiner-534K - Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, json, txt
    Updated Oct 14, 2021
    Cite
    Cristian Santini (2021). AMiner-534K - Dataset [Dataset]. http://doi.org/10.5281/zenodo.5565220
    Available download formats: txt, bin, json
    Dataset updated
    Oct 14, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cristian Santini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a knowledge graph extracted from an AMiner benchmark for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graph are split into training, testing and validation sets for applying representation learning methods. Textual literals and numeric literals are stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored as sentence embeddings and a numeric matrix, respectively, in the files textual_literals.npy and numeric_literals.npy. For the script used to gather this dataset, see the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/aminer.

  4. Open Academic Graph 2019

    • academictorrents.com
    bittorrent
    Updated May 20, 2021
    Cite
    (2021). Open Academic Graph 2019 [Dataset]. https://academictorrents.com/details/4398ab05f1f39d12942f0d5e9ddbdd21beea87a7
    Available download formats: bittorrent (96053755904)
    Dataset updated
    May 20, 2021
    License

    https://academictorrents.com/nolicensespecified

    Description

    A copy of the "Open Academic Graph v2" (OAGv2) corpus published by aminer.org and Microsoft Academic Graph in early 2019. Contains roughly 90 GB (compressed) of bibliographic metadata for hundreds of millions of publications.

    Related publications include:

    • Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008). pp. 990-998.
    • Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246.
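The "roughly 90 GB" figure can be checked against the number 96053755904 listed with the bittorrent download format. The sketch below assumes that number is the torrent size in bytes (the listing does not say so explicitly) and converts it to both binary and decimal gigabytes.

```python
# Assumption: the number shown next to "bittorrent" is a size in bytes.
SIZE_BYTES = 96_053_755_904


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes)."""
    return n_bytes / 2**30


def to_gb(n_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes (GB, 10**9 bytes)."""
    return n_bytes / 10**9


if __name__ == "__main__":
    print(f"{to_gib(SIZE_BYTES):.2f} GiB / {to_gb(SIZE_BYTES):.2f} GB")
```

Under that assumption the size is about 89.5 GiB, which is consistent with the "roughly 90 GB (compressed)" stated in the description.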

  5. OAGT Paper Topic Dataset

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated May 24, 2022
    Cite
    Erion Çano (2022). OAGT Paper Topic Dataset [Dataset]. http://doi.org/10.5281/zenodo.6560535
    Available download formats: zip
    Dataset updated
    May 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erion Çano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OAGT is a paper topic dataset consisting of 6942930 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last two fields of each record are the topic id from a taxonomy of 27 topics created from the entire collection and the 20 most significant topic words. Each dataset record (sample) is stored as a JSON line in the text file.

    The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license.

    This data (OAGT Paper Topic Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).

    If using it, please cite the following paper:

    Erion Çano, Benjamin Roth: Topic Segmentation of Research Article Collections. ArXiv 2022, CoRR abs/2205.11249, https://doi.org/10.48550/arXiv.2205.11249
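Because each OAGT record is stored as one JSON line, the file can be streamed record by record instead of being loaded whole. The sketch below counts records per topic. The field name `topic_id` and the sample values are hypothetical; the description only says that the topic id is one of the last two fields of each record.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def topic_histogram(records: Iterable[dict], topic_key: str = "topic_id") -> Counter:
    """Count records per topic. `topic_key` is a hypothetical field name."""
    return Counter(rec[topic_key] for rec in records)


# Tiny in-memory sample standing in for the real text file (made-up values).
SAMPLE = [
    '{"title": "a", "topic_id": 3}',
    '{"title": "b", "topic_id": 3}',
    '{"title": "c", "topic_id": 7}',
]
hist = topic_histogram(iter_records(SAMPLE))
```

For the real file, pass an open file object instead of `SAMPLE`, for example `with open(path, encoding="utf-8") as f: hist = topic_histogram(iter_records(f))`, after checking the actual key names in the first record.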

  6. DBLP Publications Network

    • data.hellenicdataservice.gr
    Updated Jun 20, 2019
    Cite
    (2019). DBLP Publications Network [Dataset]. https://data.hellenicdataservice.gr/dataset/1e3f40d4-3b15-4a69-9bf1-cce77a4a6e80
    Dataset updated
    Jun 20, 2019
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset contains information about academic articles, their authors and venues of publication. The dataset has the form of a graph. It has been produced by the SmartDataLake project (https://smartdatalake.eu), using data collected from Aminer (https://aminer.org).

  7. Data from: AMiner: Search and Mining of Academic Social Networks

    • scidb.cn
    Updated Oct 15, 2020
    Cite
    Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang (2020). AMiner: Search and Mining of Academic Social Networks [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00021
    Dataset updated
    Oct 15, 2020
    Dataset provided by
    Science Data Bank
    Authors
    Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The eight figures of the paper. Figure 1 presents the architecture of AMiner. Figure 2 shows the schema of the researcher profile. Figure 3 is an example of a researcher profile. Figure 4 is an overview of the name disambiguation framework in AMiner. Figure 5 is a graphical representation of the three Author-Conference-Topic (ACT) models. Figure 6 shows an example result of experts found for “Data Mining”. Figure 7 is the model framework of DeepInf. Figure 8 shows an example of researcher ranking by sociability index.

  8. AMiner-534K: Knowledge Graph of AMiner benchmark for Author Name...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 12, 2021
    Cite
    Cristian Santini; Mehwish Alam; Gesese, Genet Asefa; Silvio Peroni; Aldo Gangemi; Harald Sack (2021). AMiner-534K: Knowledge Graph of AMiner benchmark for Author Name Disambiguation [Dataset]. http://doi.org/10.5281/zenodo.5675801
    Available download formats: zip
    Dataset updated
    Nov 12, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cristian Santini; Mehwish Alam; Gesese, Genet Asefa; Silvio Peroni; Aldo Gangemi; Harald Sack
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a knowledge graph extracted from an AMiner benchmark for a research project on knowledge graph embeddings (KGEs) for author disambiguation. Structural triples of the knowledge graph are split into training, testing and validation sets for applying representation learning methods. Textual literals and numeric literals are stored separately in order to implement multimodal approaches for KGEs (see arXiv:1802.00934). For the same reason, textual literals and numeric literals are already stored as sentence embeddings and a numeric matrix, respectively, in the files textual_literals.npy and numeric_literals.npy. The file and_eval.json contains the evaluation dataset used for evaluating our AND architecture. For the script used to gather this dataset, see the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/aminer.

  9. PERSON Dataset V2

    • figshare.com
    zip
    Updated Nov 1, 2018
    Cite
    Shayan A. Tabrizi; Azadeh Shakery; Mohammad Ali Tavallaei; Masoud Asadpour (2018). PERSON Dataset V2 [Dataset]. http://doi.org/10.6084/m9.figshare.6958514.v1
    Available download formats: zip
    Dataset updated
    Nov 1, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shayan A. Tabrizi; Azadeh Shakery; Mohammad Ali Tavallaei; Masoud Asadpour
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PERSON Dataset V2: Dataset created for the paper "Search Personalization Based on Social-Network-Based Interestedness Measures." Please cite the paper for any usage.

    The dataset is produced by data cleaning of AMiner's citation network V2 dataset (https://aminer.org/citation). Anyone who wants to use the PERSON V2 dataset must cite AMiner's dataset (as explained on its homepage: Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2008). pp.990-998) as well as the aforementioned paper.

    It includes two files:

    1- authors_giant.txt: the information of authors and their co-authors. The format is as follows:

    • author ID
    • author name
    • the list of coauthors delimited by "," (each entry contains the ID of the coauthor followed by the number of times they co-authored a paper)
    • ...

    2- papers_giant.txt: the information of papers and references. The format is as follows:

    • paper ID
    • Is paper merged (see the first paper for details)
    • original paper ID (in AMiner's dataset)
    • blank
    • blank
    • blank
    • blank
    • title
    • abstract
    • time (only the year part is important)
    • blank
    • references to papers out of the PERSON dataset (indicated by AMiner's IDs)
    • references to papers inside the PERSON dataset (indicated by PERSON's IDs)
    • author IDs
    • ...
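The coauthor list in authors_giant.txt is described as comma-delimited entries, each holding a coauthor ID followed by a co-authorship count. A minimal parsing sketch follows. It assumes the ID and the count inside an entry are separated by whitespace, which the description does not state, so check a few real lines before relying on it. The function name and sample values are illustrative only.

```python
from typing import Dict


def parse_coauthors(field: str) -> Dict[str, int]:
    """Parse a coauthor field such as '17 2,42 1' into {'17': 2, '42': 1}.

    Assumption: within each comma-separated entry, the coauthor ID and the
    number of co-authored papers are separated by whitespace.
    """
    coauthors: Dict[str, int] = {}
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty fields and trailing commas
        coauthor_id, count = entry.split()
        coauthors[coauthor_id] = int(count)
    return coauthors


# Made-up example values, not taken from the dataset.
example = parse_coauthors("17 2, 42 1")
```

IDs are kept as strings so that leading zeros or non-numeric identifiers, if any, survive unchanged.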

  10. OAGL Paper Length Dataset

    • live.european-language-grid.eu
    binary format
    Updated Jun 29, 2020
    Cite
    (2020). OAGL Paper Length Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1306
    Available download formats: binary format
    Dataset updated
    Jun 29, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OAGL is a paper length prediction dataset consisting of 17528680 records which comprise various scientific publication metadata like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license. This data (OAGL Paper Length Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).

  11. SeminalSurveyDBLP Dataset for Classification of Seminal and Survey...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite
    Christin Katharina Kreutz; Premtim Sahitaj; Ralf Schenkel (2020). SeminalSurveyDBLP Dataset for Classification of Seminal and Survey Publications [Dataset]. http://doi.org/10.5281/zenodo.3258164
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christin Katharina Kreutz; Premtim Sahitaj; Ralf Schenkel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains citation network data for 1320 publications from dblp (https://dblp.uni-trier.de/) enriched with data from AMiner (https://aminer.org/) for classification of seminal and survey publications.

    Citations and references are contained for every publication. For each of the 121,084 papers, dblp key, publication year as well as stemmed and unstemmed concatenations of its title and abstract are given. Seminal papers come from A* conferences, surveys were extracted from venues specialized in publishing reviews.

    For details, see Revaluating Semantometrics from Computer Science Publications, Christin Katharina Kreutz, Premtim Sahitaj, and Ralf Schenkel, 2019, submitted to BIRNDL@SIGIR.

  12. Computer Science (1970-2014)

    • datacatalogue.cessda.eu
    • search.gesis.org
    Updated Dec 14, 2023
    Cite
    Lietz, Haiko (2023). Computer Science (1970-2014) [Dataset]. http://doi.org/10.7802/2642
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    GESIS - Leibniz-Institut für Sozialwissenschaften
    Authors
    Lietz, Haiko
    Measurement technique
    Computer-based observation
    Description

    DBLP (https://dblp.org/) is a comprehensive collection of computer science publications from major and minor journals and conference proceedings. From this dump, we remove arXiv preprints. Our dataset consists of 1.9 million publications from 1970 to 2014 that are authored by 1.1 million authors. We have added citations among publications by combining DBLP with the AMiner dataset (https://www.aminer.org/citation) via publication titles and years. There are 6.6 million citations among publications. Author names in DBLP are disambiguated. To infer the gender of authors, we have used a method that combines the results of name-based and image-based gender detection services. Since the accuracy is very low for Chinese and Korean names, we label their gender as unknown to reduce noise in our analysis.
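Combining DBLP with AMiner "via publication titles and years" amounts to a join on a (title, year) key. The sketch below shows one way such a join could look. The normalisation rule (lowercasing and dropping punctuation), the record layout, and all names are assumptions made for illustration; the description does not say how titles were matched.

```python
import re
from typing import Dict, Iterable, Tuple


def match_key(title: str, year: int) -> Tuple[str, int]:
    """Build a join key from a lowercased, punctuation-free title and the year."""
    normalised = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return (normalised, year)


def link_records(dblp: Iterable[dict], aminer: Iterable[dict]) -> Dict[str, str]:
    """Map each DBLP key to the AMiner id sharing its (title, year) key.

    If several AMiner records share a key, the last one wins in this sketch.
    """
    index = {match_key(r["title"], r["year"]): r["id"] for r in aminer}
    links = {}
    for r in dblp:
        hit = index.get(match_key(r["title"], r["year"]))
        if hit is not None:
            links[r["key"]] = hit
    return links


# Made-up records for illustration.
dblp_sample = [{"key": "conf/x/1", "title": "Graph Mining: A Survey", "year": 2010}]
aminer_sample = [
    {"id": "a1", "title": "graph mining  a survey.", "year": 2010},
    {"id": "a2", "title": "Graph Mining: A Survey", "year": 2011},
]
links = link_records(dblp_sample, aminer_sample)
```

Exact matching on a normalised key is cheap but misses titles that differ by more than case and punctuation; a real pipeline may need fuzzier matching.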

  13. https://www.aminer.cn/cosnet Dataset

    • paperswithcode.com
    Updated Nov 23, 2024
    Cite
    (2024). https://www.aminer.cn/cosnet Dataset [Dataset]. https://paperswithcode.com/dataset/https-www-aminer-cn-cosnet
    Dataset updated
    Nov 23, 2024

  14. Domain expert readability dataset

    • zenodo.org
    csv
    Updated Jan 21, 2020
    Cite
    Thanasis Vergoulis; Ilias Kanellos; Anargiros Tzaferos; Serafeim Chatzopoulos; Theodore Dalamagas; Spiros Skiadopoulos (2020). Domain expert readability dataset [Dataset]. http://doi.org/10.5281/zenodo.2651010
    Available download formats: csv
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Thanasis Vergoulis; Ilias Kanellos; Anargiros Tzaferos; Serafeim Chatzopoulos; Theodore Dalamagas; Spiros Skiadopoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Judgments gathered from 10 experts through a web-based survey on the readability of publication abstracts. The abstracts used were a subset of AMiner's DBLP citation network v10 dataset (https://aminer.org/citation) in the discipline of data and knowledge management. In particular, abstracts containing the following keywords were used: "database", "machine learning", "information retrieval", "data management", "cloud computing", "data mining", "algorithms", "classification", "query processing", "networks", "indexing", "distributed systems".

    After reading the abstract, each expert had to answer the following questions on a 5-point scale:

    • Q1: Please rate how well-written the abstract is.
    • Q2: Does the abstract contain linguistic errors?
    • Q3: Please rate how clear the contribution of the paper is (based on the abstract).

    For each question, the interpretation of the extreme scale values (i.e., 1 and 5) was provided. In particular, 1 = “very poorly written” / “so many linguistic errors that the abstract is incomprehensible” / “not clear at all” (Q1/Q2/Q3) and 5 = “excellently written” / “no errors” / “completely clear” (Q1/Q2/Q3).

    The pairwise correlations (Kendall’s τ) of expert judgments on questions Q1-Q3 are presented in this table.
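For readers who want to recompute such correlations, here is a small pure-Python sketch of Kendall's τ. The source does not say which variant was used; the code below implements tau-b, a common choice for ordinal ratings with many ties such as a 5-point scale, so treat the variant as an assumption.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equally long rating sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1  # pair tied on x (possibly on y as well)
        if dy == 0:
            tied_y += 1  # pair tied on y (possibly on x as well)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        return float("nan")  # undefined when one sequence is constant
    return (concordant - discordant) / denom
```

The pairwise loop is quadratic in the number of ratings, which is fine for a survey of this size but would be slow for very large inputs.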

    The contained dataset is a tsv file that includes the following fields:

    • user_id: expert identifier
    • paper_id: AMiner's identifier from the DBLP citation network v10 dataset
    • rating_1: answer for Q1
    • rating_2: answer for Q2
    • rating_3: answer for Q3
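The five fields listed above map directly onto a tab-separated reader. The sketch below computes each expert's mean rating per question. It assumes the columns appear in the listed order and tolerates an optional header row; the sample rows are made up, and the listing's "csv" label versus the "tsv" description means the delimiter should be verified on the real file.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

# Column order as listed in the description (assumed to match the file).
FIELDS = ["user_id", "paper_id", "rating_1", "rating_2", "rating_3"]


def mean_ratings_per_expert(tsv_file: TextIO) -> Dict[str, List[float]]:
    """Return {user_id: [mean Q1, mean Q2, mean Q3]} from a tab-separated file."""
    reader = csv.DictReader(tsv_file, fieldnames=FIELDS, delimiter="\t")
    sums = defaultdict(lambda: [0, 0, 0])
    counts = defaultdict(int)
    for row in reader:
        if row["user_id"] == "user_id":
            continue  # skip a header row if the file has one
        for i in range(3):
            sums[row["user_id"]][i] += int(row[f"rating_{i + 1}"])
        counts[row["user_id"]] += 1
    return {user: [s / counts[user] for s in sums[user]] for user in sums}


# Made-up rows for illustration.
SAMPLE = "u1\tp1\t4\t5\t3\nu1\tp2\t2\t5\t5\nu2\tp1\t5\t4\t4\n"
means = mean_ratings_per_expert(io.StringIO(SAMPLE))
```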

    Please cite:
    Thanasis Vergoulis, Ilias Kanellos, Anargiros Tzerefos, Serafeim Chatzopoulos, Theodore Dalamagas, Spiros Skiadopoulos. A study on the readability of scientific publications. 23rd International Conference on Theory and Practice of Digital Libraries. Oslo, Norway 2019 (to appear)

  15. DBLP Article Similarities (DBLP-ArtSim) dataset

    • zenodo.org
    csv
    Updated Feb 27, 2021
    Cite
    Serafeim Chatzopoulos; Thanasis Vergoulis; Ilias Kanellos; Theodore Dalamagas; Christos Tryfonopoulos (2021). DBLP Article Similarities (DBLP-ArtSim) dataset [Dataset]. http://doi.org/10.5281/zenodo.3778916
    Available download formats: csv
    Dataset updated
    Feb 27, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Serafeim Chatzopoulos; Thanasis Vergoulis; Ilias Kanellos; Theodore Dalamagas; Christos Tryfonopoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains similarity scores among articles in AMiner's DBLP v10 dataset.

    Similarities are calculated using the JoinSim [1] similarity measure on the derived citation network using the following metapaths:

    • Paper - Author - Paper (PAP_similarities.csv)
    • Paper - Topic - Paper (PTP_similarities.csv)

    The file ids.csv contains a mapping from AMiner's ids to our internal numeric ids used in the similarities files.

    [1] Xiong, Y., Zhu, Y., Yu, P.S.: Top-k similarity join in heterogeneous information networks. IEEE Transactions on Knowledge and Data Engineering 27(6), 1710–1723 (2015)
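Since the similarity files use internal numeric ids, ids.csv is needed to get back to AMiner's identifiers. The sketch below shows that translation step. The column layouts are assumptions: ids.csv is taken to hold "AMiner id, internal id" and a similarities file "internal id, internal id, score", both comma-separated and without a header row. None of this is stated in the description, so inspect the files first.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple


def load_id_map(ids_file: TextIO) -> Dict[str, str]:
    """Return {internal_id: aminer_id}; assumes rows of (AMiner id, internal id)."""
    return {internal: aminer for aminer, internal in csv.reader(ids_file)}


def translate(sim_file: TextIO, id_map: Dict[str, str]) -> Iterator[Tuple[str, str, float]]:
    """Yield (aminer_id_a, aminer_id_b, score); assumes rows of (id_a, id_b, score)."""
    for a, b, score in csv.reader(sim_file):
        yield id_map[a], id_map[b], float(score)


# Made-up contents standing in for ids.csv and PAP_similarities.csv.
ids_csv = io.StringIO("aminer-A,0\naminer-B,1\n")
sims_csv = io.StringIO("0,1,0.75\n")
pairs = list(translate(sims_csv, load_id_map(ids_csv)))
```

Keeping the map in a plain dict is simple, though for the full dataset a more compact structure may be preferable.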

  16. OAGKX Keyword Generation Dataset

    • live.european-language-grid.eu
    binary format
    Updated Oct 20, 2019
    Cite
    (2019). OAGKX Keyword Generation Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1281
    Available download formats: binary format
    Dataset updated
    Oct 20, 2019
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OAGKX is a keyword extraction/generation dataset consisting of 22674436 abstracts, titles and keyword strings from scientific articles. The texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file.

    The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license.

    This data (OAGKX Keyword Generation Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).

    If using it, please cite the following paper:

    Çano Erion, Bojar Ondřej. Keyphrase Generation: A Multi-Aspect Survey. FRUCT 2019, Proceedings of the 25th Conference of the Open Innovations Association FRUCT, Helsinki, Finland, Nov. 2019

    To reproduce the experiments in the above paper, you can use the first 100000 lines of part_0_0.txt file.
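Taking "the first 100000 lines" of a JSON-lines file does not require reading the rest of it. A minimal sketch using `itertools.islice` is shown below; the helper name and the sample records are illustrative, and the real records' field names are not given in the description.

```python
import json
from itertools import islice
from typing import Iterable, List


def head_records(lines: Iterable[str], n: int = 100_000) -> List[dict]:
    """Parse at most the first `n` JSON lines from an iterable of strings."""
    return [json.loads(line) for line in islice(lines, n)]


# Made-up lines standing in for part_0_0.txt.
sample_lines = ['{"title": "t%d"}' % i for i in range(5)]
first_three = head_records(sample_lines, 3)
```

With the real file this would be called as `with open("part_0_0.txt", encoding="utf-8") as f: records = head_records(f)`, which stops reading after 100000 lines.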

  17. Processing of disambiguation data by Aminer authors

    • scidb.cn
    Updated Apr 17, 2024
    Cite
    Xinzhe; Lu (2024). Processing of disambiguation data by Aminer authors [Dataset]. http://doi.org/10.57760/sciencedb.18075
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Xinzhe; Lu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data is derived from Aminer's publicly available disambiguation dataset (https://open.aminer.cn/article?id=55af4228dabfae1ce3ed1253). On that basis, it explores how the characterization of research collaborations affects the novelty and impact of knowledge outcomes.

  18. OAGSX Title Generation Dataset

    • live.european-language-grid.eu
    binary format
    Updated Oct 31, 2019
    Cite
    (2019). OAGSX Title Generation Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1286
    Available download formats: binary format
    Dataset updated
    Oct 31, 2019
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OAGSX is a title generation dataset consisting of 34408509 abstracts and titles from scientific articles. The texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file.

    The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license.

    This data (OAGSX Title Generation Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).

    If using it, please consider citing also the following paper:

    Çano Erion, Bojar Ondřej. Two Huge Title and Keyword Generation Corpora of Research Articles. LREC 2020, Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France, May 2020.

  19. OAGL Paper Metadata Dataset

    • explore.openaire.eu
    Updated Jan 1, 2020
    Cite
    Erion Çano (2020). OAGL Paper Metadata Dataset [Dataset]. https://explore.openaire.eu/search/other?pid=11234%2F1-3257
    Dataset updated
    Jan 1, 2020
    Authors
    Erion Çano
    Description

    OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license. This data (OAGL Paper Metadata Dataset) is released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano Erion, Bojar Ondřej: How Many Pages? Paper Length Prediction from the Metadata. NLPIR 2020, Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, Seoul, Korea, December 2020.

  20. SUSdblp Dataset for Classification of Seminal, Uninfluential and Survey...

    • data.niaid.nih.gov
    Updated Mar 3, 2020
    Cite
    Ralf Schenkel (2020). SUSdblp Dataset for Classification of Seminal, Uninfluential and Survey Publications [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3693938
    Dataset updated
    Mar 3, 2020
    Dataset provided by
    Ralf Schenkel
    Premtim Sahitaj
    Christin Katharina Kreutz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains citation network data for 1980 publications from dblp (https://dblp.uni-trier.de/) enriched with data from AMiner (https://aminer.org/) for classification of seminal and survey publications. It is an extension of the SeminalSurveyDBLP dataset (https://zenodo.org/record/3258164#.XlztuUoxmUm).

    Citations and references are contained for every publication. For each of the 129,442 papers, dblp key, publication year as well as stemmed and unstemmed concatenations of its title and abstract are given. For citing and referenced papers, their number of citations as well as their field and time normalised citation count are contained. Seminal papers come from A* conferences, surveys were extracted from venues specialized in publishing reviews. Uninfluential publications come from C conferences and obtained less than ten citations.

    For details, see Evaluating Semantometrics from Computer Science Publications, Christin Katharina Kreutz, Premtim Sahitaj, and Ralf Schenkel, to appear in Scientometrics.
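The description mentions a "field and time normalised citation count" without defining it. One common bibliometric reading, used here purely as an assumption, divides a paper's citation count by the mean citation count of papers from the same field and year. The record layout and values below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def normalised_counts(papers: Iterable[dict]) -> Dict[str, float]:
    """Divide each paper's citations by the mean of its (field, year) group."""
    papers = list(papers)
    totals = defaultdict(lambda: [0, 0])  # (field, year) -> [citation sum, paper count]
    for p in papers:
        group = totals[(p["field"], p["year"])]
        group[0] += p["citations"]
        group[1] += 1
    result = {}
    for p in papers:
        citation_sum, n = totals[(p["field"], p["year"])]
        mean = citation_sum / n
        # A group with no citations at all gets 0.0 rather than a division error.
        result[p["id"]] = p["citations"] / mean if mean else 0.0
    return result


# Made-up records for illustration.
sample: List[dict] = [
    {"id": "a", "field": "db", "year": 2010, "citations": 10},
    {"id": "b", "field": "db", "year": 2010, "citations": 30},
    {"id": "c", "field": "ir", "year": 2010, "citations": 5},
]
scores = normalised_counts(sample)
```

A value above 1 then means the paper is cited more than the average paper of its field and year in the sample; the dataset's own values may follow a different definition.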

The AMiner dataset (http://doi.org/10.11922/sciencedb.j00104.00004) is cited by 252 scholarly articles.