62 datasets found
  1. The Enhanced Microsoft Academic Knowledge Graph - Dataset - B2FIND

    • demo-b2find.dkrz.de
    Updated May 3, 2024
    Cite
    (2024). The Enhanced Microsoft Academic Knowledge Graph - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/0b242683-17d4-5b73-8606-3ea007e5e3c2
    Explore at:
    Dataset updated
    May 3, 2024
    Description

    The Enhanced Microsoft Academic Knowledge Graph (EMAKG) is a large dataset of scientific publications and related entities, including authors, institutions, journals, conferences, and fields of study. The dataset originates from the Microsoft Academic Knowledge Graph (MAKG), one of the most extensive freely available knowledge graphs of scholarly data. To build the dataset, we first assessed the limitations of the current MAKG. Based on these, we designed several methods to enhance the data and broaden the range of use cases, particularly in mobility and network analysis. EMAKG provides two main advantages: (1) it has improved usability, facilitating access for non-expert users; and (2) it includes more types of information, obtained by integrating various datasets and sources, which helps expand the application domains. For instance, geographical information could help mobility and migration research. The completeness of the knowledge graph is improved by retrieving and merging information on publications and other entities no longer available in the latest version of MAKG. Furthermore, geographical and collaboration network details are used to provide data on authors as well as their annual locations and career nationalities, together with worldwide yearly stocks and flows. Among others, the dataset also includes: fields of study (and publications) labelled by their discipline(s); abstracts and linguistic features, i.e., standard language codes, tokens, and types; entities’ general information, e.g., date of foundation and type of institutions; and academia-related metrics, i.e., h-index. The resulting dataset maintains all the characteristics of the parent datasets and includes a set of additional subsets and data that can be used for new case studies relating to network analysis, knowledge exchange, linguistics, computational linguistics, and mobility and human migration, among others.

  2. Christmas Carol Knowledge Graph by GraphRAG

    • kaggle.com
    zip
    Updated Aug 6, 2024
    Cite
    dykphd (2024). Christmas Carol Knowledge Graph by GraphRAG [Dataset]. https://www.kaggle.com/datasets/dykphd/christmas-carol-knowledge-graph-by-graphrag
    Explore at:
    Available download formats: zip (4996432 bytes)
    Dataset updated
    Aug 6, 2024
    Authors
    dykphd
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This is a collection of knowledge graph triples generated by GraphRAG with gpt-4o-mini.

    To reproduce this dataset, follow the Get Started guide in the official documentation and start indexing with settings.yaml.

  3. The Microsoft Academic Graph in RDF: A Linked Data Source with 8 Billion...

    • nde-dev.biothings.io
    Updated Mar 26, 2021
    Cite
    Färber, Michael (2021). The Microsoft Academic Graph in RDF: A Linked Data Source with 8 Billion Triples of Scholarly Data [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_2159722
    Explore at:
    Dataset updated
    Mar 26, 2021
    Dataset authored and provided by
    Färber, Michael
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We provide an updated version of the Microsoft Academic Knowledge Graph (MAKG.org).

    The MAKG is a large RDF knowledge graph with over eight billion triples containing information about scientific publications and related entities, such as authors, institutions, journals, and fields of study.

    Number of instances:

    Papers: 238,670,900
    Papers with URL: 224,325,750
    Papers with abstract: 139,227,097
    Authors: 243,042,675 / 151,355,324 after author name disambiguation (both included)
    Affiliations: 25,767
    Journals: 48,942
    Conferences: 4,468
    Conference Instances: 16,142
    Original fields of Study: 740,460
    

    The provided data is based on the MAG data as of 2020-06-19.

    In addition to the MAKG core data, owl:sameAs links to Wikidata were created.

    More information can be found at https://makg.org/ and in the papers

    SWJ'21 submission.

    ISWC'19 paper The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data (author copy available here).

    If you use the data set, please cite it as follows (see also in DBLP):

    Michael Färber: "The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data". Proceedings of the 18th International Semantic Web Conference (ISWC'19). Auckland, New Zealand, 2019, pp. 113-129.

    @inproceedings{DBLP:conf/semweb/Farber19,
      author    = {Michael F{\"{a}}rber},
      title     = {The Microsoft Academic Knowledge Graph: {A} Linked Data Source with 8 Billion Triples of Scholarly Data},
      booktitle = {Proceedings of the 18th International Semantic Web Conference},
      series    = {ISWC'19},
      location  = {Auckland, New Zealand},
      pages     = {113--129},
      year      = {2019},
      url       = {https://doi.org/10.1007/978-3-030-30796-7_8},
      doi       = {10.1007/978-3-030-30796-7_8}
    }

  4. MAG for Heterogeneous Graph Learning

    • data.niaid.nih.gov
    Updated Jul 9, 2021
    Cite
    Diea, Maria-Alexandra (2021). MAG for Heterogeneous Graph Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5055135
    Explore at:
    Dataset updated
    Jul 9, 2021
    Dataset provided by
    University of Amsterdam
    Authors
    Diea, Maria-Alexandra
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    We provide an academic graph based on a snapshot of the Microsoft Academic Graph from 26.05.2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records, their citation relations, as well as authors, affiliations, journals, conferences and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding schema and the entities present in the original dataset please refer to: MAG schema.

    MAG for Heterogeneous Graph Learning

    We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.
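
    The selection rule above can be read in more than one way. The following is a minimal sketch, not the authors' code. It assumes that the 5-citations-per-year criterion is an average over the years since publication (counting at least one year), that the snapshot year is 2021, and that the bound of 100 replaces the per-year criterion for papers published before 2000. The function name is hypothetical, and the English-language criterion is not modeled.

```python
SNAPSHOT_YEAR = 2021  # assumption: the snapshot described above dates from May 2021


def passes_citation_filter(year: int, citations: int,
                           snapshot_year: int = SNAPSHOT_YEAR) -> bool:
    """Return True if a paper would be kept under the assumed reading of the rule."""
    if year <= 1900:
        return False  # only papers published after 1900 are included
    if year < 2000:
        return citations >= 100  # constant bound for papers published before 2000
    # Assumption: "5 times per year" is an average; count at least one year.
    years_since_publication = max(1, snapshot_year - year)
    return citations >= 5 * years_since_publication


if __name__ == "__main__":
    for year, citations in [(2015, 30), (2015, 29), (1995, 100), (1995, 99)]:
        print(year, citations, passes_citation_filter(year, citations))
```

    Under this reading, a paper from 2015 needs at least 30 citations by 2021, while a paper from 1995 needs at least 100 regardless of its age.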

    Nodes and features

    We define the following nodes:

    paper with mag_id, graph_id, normalized title, year of publication, citations, and a 128-dimension title embedding built using word2vec
    No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

    author with mag_id, graph_id, normalized name, citations
    No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);

    field with mag_id, graph_id, citations, and level, which denotes the hierarchical level of the field, where 0 is the highest level (e.g. computer science)
    No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);

    affiliation with mag_id, graph_id, citations
    No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);

    venue with mag_id, graph_id, citations, and type, which denotes whether the venue is a conference or a journal
    No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).

    Edges

    We define the following edges:

    author is_affiliated_with affiliation
    No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);

    author is_first/last/other paper
    No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);

    paper has_citation_to paper
    No. of paper-paper edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);

    paper conference/journal_published_at venue
    No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

    paper has_field_L0/L1/L2/L3/L4 field
    No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);

    field is_in field
    No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science).

    We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.

    Data structure

    The nodes and their respective features are provided as separate .tsv files, where each feature is a column. The edges are provided as a pickled Python dictionary with the schema:

    {target_type: {source_type: {edge_type: {target_id: {source_id: {time } } } } } }

    We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science); however, the archive for the complete graph is split into 500 MB chunks. Each archive contains the separate node feature files and the edge dictionary.
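
    To make the nested edge schema concrete, here is a minimal sketch that walks such a dictionary and flattens it into edge tuples. It assumes the innermost level maps each source_id to a single time value, which the schema does not state explicitly. It also uses a small toy dictionary with made-up IDs rather than the real file, and the helper name iter_edges is hypothetical.

```python
import pickle
from typing import Dict, Iterator, Tuple

# Assumed shape: {target_type: {source_type: {edge_type: {target_id: {source_id: time}}}}}
EdgeDict = Dict[str, Dict[str, Dict[str, Dict[int, Dict[int, int]]]]]
Edge = Tuple[str, int, str, str, int, int]


def iter_edges(edges: EdgeDict, keep_reverse: bool = True) -> Iterator[Edge]:
    """Yield (source_type, source_id, edge_type, target_type, target_id, time)."""
    for target_type, by_source_type in edges.items():
        for source_type, by_edge_type in by_source_type.items():
            for edge_type, by_target_id in by_edge_type.items():
                # Reverse edges carry the rev_ prefix and can be dropped if not needed.
                if not keep_reverse and edge_type.startswith("rev_"):
                    continue
                for target_id, sources in by_target_id.items():
                    for source_id, time in sources.items():
                        yield (source_type, source_id, edge_type,
                               target_type, target_id, time)


# Toy stand-in for the real edge dictionary (IDs and years are invented).
toy_edges: EdgeDict = {
    "paper": {
        "author": {"is_first": {10: {1: 2015}}},
        "paper": {"has_citation_to": {10: {11: 2016}}},
    },
    "author": {
        "paper": {"rev_is_first": {1: {10: 2015}}},
    },
}

# Round-trip through pickle to mimic loading the dictionary from disk.
restored = pickle.loads(pickle.dumps(toy_edges))
all_edges = list(iter_edges(restored))
forward_edges = list(iter_edges(restored, keep_reverse=False))

if __name__ == "__main__":
    for edge in forward_edges:
        print(edge)
```

    For the real data, the dictionary would be read with pickle.load from the file inside the archive. Only unpickle files from a source you trust, since unpickling can execute arbitrary code.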

  5. Microsoft Corporation - Net-Receivables

    • macro-rankings.com
    csv, excel
    Updated Sep 27, 2025
    Cite
    macro-rankings (2025). Microsoft Corporation - Net-Receivables [Dataset]. https://www.macro-rankings.com/Markets/Stocks/MSFT-NASDAQ/Net-Receivables
    Explore at:
    Available download formats: excel, csv
    Dataset updated
    Sep 27, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    united states
    Description

    Net-Receivables Time Series for Microsoft Corporation. Microsoft Corporation develops and supports software, services, devices, and solutions worldwide. The company's Productivity and Business Processes segment offers Microsoft 365 Commercial, Enterprise Mobility + Security, Windows Commercial, Power BI, Exchange, SharePoint, Microsoft Teams, Security and Compliance, and Copilot; Microsoft 365 Commercial products, such as Windows Commercial on-premises and Office licensed services; Microsoft 365 Consumer products and cloud services, such as Microsoft 365 Consumer subscriptions, Office licensed on-premises, and other consumer services; LinkedIn; Dynamics products and cloud services, such as Dynamics 365, cloud-based applications, and on-premises ERP and CRM applications. Its Intelligent Cloud segment provides Server products and cloud services, such as Azure and other cloud services, GitHub, Nuance Healthcare, virtual desktop offerings, and other cloud services; Server products, including SQL and Windows Server, Visual Studio and System Center related Client Access Licenses, and other on-premises offerings; Enterprise and partner services, including Enterprise Support and Nuance professional Services, Industry Solutions, Microsoft Partner Network, and Learning Experience. The company's Personal Computing segment provides Windows and Devices, such as Windows OEM licensing and Devices and Surface and PC accessories; Gaming services and solutions, such as Xbox hardware, content, and services, first- and third-party content Xbox Game Pass, subscriptions, and Cloud Gaming, advertising, and other cloud services; search and news advertising services, such as Bing and Copilot, Microsoft News and Edge, and third-party affiliates. It sells its products through OEMs, distributors, and resellers; and online and retail stores. The company was founded in 1975 and is headquartered in Redmond, Washington.

  6. Data from: MAG-Scholar

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Aleksandar Bojchevski (2023). MAG-Scholar [Dataset]. http://doi.org/10.6084/m9.figshare.12696653.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aleksandar Bojchevski
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The MAG-Scholar dataset was introduced in the paper "Scaling Graph Neural Networks with Approximate PageRank". See the readme file for more details.

  7. Data from: Microsoft Concept Graph: Mining Semantic Concepts for Short Text...

    • scidb.cn
    Updated Oct 16, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lei Ji; Yujing Wang; Botian Shi; Dawei Zhang; Zhongyuan Wang; Jun Yan (2020). Microsoft Concept Graph: Mining Semantic Concepts for Short Text Understanding [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00047
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 16, 2020
    Dataset provided by
    Science Data Bank
    Authors
    Lei Ji; Yujing Wang; Botian Shi; Dawei Zhang; Zhongyuan Wang; Jun Yan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Four tables and 23 figures of this paper. Table 1 shows the concept space comparison of existing taxonomies. Table 2 presents Hearst pattern examples. Table 3 shows the labeling guideline for conceptualization. Table 4 presents the precision of short text understanding. Figure 1 shows the framework overview. Figure 2 shows local taxonomy construction. Figure 3 shows horizontal merging. Figure 4 shows vertical merging: single sense alignment. Figure 5 shows vertical merging: multiple sense alignment. Figure 6 is a subgraph of the heterogeneous semantic network around "watch". Figure 7 shows the compression procedure of the typed-term co-occurrence network. Figure 8 presents an example of short text understanding. Figure 9 presents examples of the Chain model and the Pairwise model. Figure 10 is a snapshot of the Probase browser. Figure 11 is a snapshot of single instance conceptualization. Figure 12 is a snapshot of context-aware single instance conceptualization. Figure 13 shows an example of short text conceptualization. Figure 14 shows the framework of topic search. Figure 15 is a snapshot of the Web tables. Figure 16 shows a query recommendation snapshot. Figure 17 shows the correlation of CTR with ads relevance score. Figure 18 presents the distribution of concepts in Microsoft Concept Graph. Figure 19 shows the concept coverage of different taxonomies. Figure 20 shows the precision of extracted isA pairs on 40 concepts. Figure 21 shows the precision of isA pairs after each iteration. Figure 22 shows the number of discovered concepts and isA pairs after each iteration. Figure 23 shows the precision and nDCG comparison.

  8. OGBN-MAG (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Redao da Taupl (2021). OGBN-MAG (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-mag
    Explore at:
    Available download formats: zip (852576506 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    OGBN-MAG

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-mag

    Usage in Python

    Warning: Currently not usable.

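    import torch_geometric  # PyTorch Geometric must be installed for the PyG loader
    from ogb.nodeproppred import PygNodePropPredDataset

    # Load the pre-processed dataset from the Kaggle input directory.
    dataset = PygNodePropPredDataset('ogbn-mag', root='/kaggle/input')
    # Time-based split; for ogbn-mag each split is a dict keyed by node type ('paper').
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0]  # PyG graph object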
    

    Description

    Graph: The ogbn-mag dataset is a heterogeneous network composed of a subset of the Microsoft Academic Graph (MAG) [1]. It contains four types of entities—papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes)—as well as four types of directed relations connecting two types of entities—an author is “affiliated with” an institution, an author “writes” a paper, a paper “cites” a paper, and a paper “has a topic of” a field of study. Similar to ogbn-arxiv, each paper is associated with a 128-dimensional word2vec feature vector, and all the other types of entities are not associated with input node features.
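
    As a rough illustration of the structure just described, the node types and relations can be written down as plain Python data. This is only a sketch: the dictionary keys and relation names are illustrative labels chosen here, and nothing is loaded from the actual dataset.

```python
# Node counts as stated in the description above.
NODE_COUNTS = {
    "paper": 736_389,
    "author": 1_134_649,
    "institution": 8_740,
    "field_of_study": 59_965,
}

# Directed relations as (source type, relation, target type); names are illustrative.
RELATIONS = [
    ("author", "affiliated_with", "institution"),
    ("author", "writes", "paper"),
    ("paper", "cites", "paper"),
    ("paper", "has_topic", "field_of_study"),
]

# Every relation endpoint must be one of the four node types.
for source_type, _, target_type in RELATIONS:
    assert source_type in NODE_COUNTS and target_type in NODE_COUNTS

total_nodes = sum(NODE_COUNTS.values())

if __name__ == "__main__":
    print(f"{total_nodes:,} nodes across {len(NODE_COUNTS)} node types")
```

    The four counts add up to 1,939,743, which is consistent with the node total reported in the Summary below.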

    Prediction task: Given the heterogeneous ogbn-mag data, the task is to predict the venue (conference or journal) of each paper, given its content, references, authors, and authors’ affiliations. This is of practical interest as some manuscripts’ venue information is unknown or missing in MAG, due to the noisy nature of Web data. In total, there are 349 different venues in ogbn-mag, making the task a 349-class classification problem.

    Dataset splitting: The authors of this dataset follow the same time-based strategy as ogbn-arxiv and ogbn-papers100M to split the paper nodes in the heterogeneous graph, i.e., training models to predict venue labels of all papers published before 2018, validating and testing the models on papers published in 2018 and since 2019, respectively.
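
    The time-based split described above can be sketched as follows. This is a minimal illustration that assumes each paper has a known publication year; the function names are hypothetical, and OGB itself ships precomputed split indices, so this code is not part of its loader.

```python
from typing import Dict, List, Sequence


def assign_split(year: int) -> str:
    """Map a publication year to its split under the time-based strategy."""
    if year < 2018:
        return "train"   # papers published before 2018
    if year == 2018:
        return "valid"   # papers published in 2018
    return "test"        # papers published in 2019 or later


def split_indices(years: Sequence[int]) -> Dict[str, List[int]]:
    """Group paper indices by split, given one publication year per paper."""
    splits: Dict[str, List[int]] = {"train": [], "valid": [], "test": []}
    for index, year in enumerate(years):
        splits[assign_split(year)].append(index)
    return splits


if __name__ == "__main__":
    print(split_indices([2010, 2018, 2019, 2017, 2020]))
```

    Splitting by time rather than at random mimics the practical setting, where a model trained on existing papers is applied to papers published later.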

    Summary

    Package      #Nodes      #Edges       Split Type   Task Type                    Metric
    ogb>=1.2.1   1,939,743   21,111,007   Time         Multi-class classification   Accuracy

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [2] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1):396–413, 2020.

    [2] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

    Disclaimer

    I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. Any questions, problems or issues, please contact the original authors at their website or their GitHub repo.

  9. Microsoft Corporation - Ebitda

    • macro-rankings.com
    csv, excel
    Updated Nov 15, 2025
    Cite
    macro-rankings (2025). Microsoft Corporation - Ebitda [Dataset]. https://www.macro-rankings.com/markets/stocks/msft-nasdaq/income-statement/ebitda
    Explore at:
    Available download formats: excel, csv
    Dataset updated
    Nov 15, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    united states
    Description

    Ebitda Time Series for Microsoft Corporation.

  10. Data_Sheet_1_Opportunities in Open Science With AI.ZIP

    • frontiersin.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Kuansan Wang (2023). Data_Sheet_1_Opportunities in Open Science With AI.ZIP [Dataset]. http://doi.org/10.3389/fdata.2019.00026.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Kuansan Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Bolstered by ever affordable computational power and open big datasets, artificial intelligence (AI) technologies are bringing revolutionary changes to our lives. This article examines the current trends and elaborates the future potentials of AI in its role for making science more open and accessible. Based on the experience derived from a research project called Microsoft Academic, the advocates have reasons to be optimistic about the future of open science as the advanced discovery, ranking, and distribution technologies enabled by AI are offering strong incentives for scientists, funders and research managers to make research articles, data and software freely available and accessible.

  11. Microsoft Corporation - Net-Income

    • macro-rankings.com
    csv, excel
    Updated Aug 23, 2025
    Cite
    macro-rankings (2025). Microsoft Corporation - Net-Income [Dataset]. https://www.macro-rankings.com/markets/stocks/msft-nasdaq/income-statement/net-income
    Explore at:
    Available download formats: csv, excel
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    united states
    Description

    Net-Income Time Series for Microsoft Corporation.

  12. Sovereign Cognition for Enterprise AI: A Collaboration Framework Between...

    • zenodo.org
    Updated Nov 13, 2025
    Cite
    Mark Anthony Brewer; Mark Anthony Brewer (2025). Sovereign Cognition for Enterprise AI: A Collaboration Framework Between Immortal Tek and Microsoft [Dataset]. http://doi.org/10.5281/zenodo.17602225
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mark Anthony Brewer; Mark Anthony Brewer
    Description

    Sovereign Cognition for Enterprise AI: A Collaboration Framework Between Immortal Tek and Microsoft

    Public-Safe Edition · November 2025

    Abstract

    This white paper introduces a strategic collaboration framework between Immortal Tek and Microsoft centered on the integration of CollectiveOS — a sovereign, modular AI cognition layer — into Microsoft’s 2025 AI ecosystem. The proposed collaboration enhances Copilot, Azure AI, Microsoft Entra, and Azure Quantum Elements by introducing three major capabilities: anticipatory temporal intelligence, cryptographically verifiable AI governance, and modular persona architectures designed for enterprise-scale deployment.

    This public-safe paper outlines the high-level vision, value, and integration strategy without disclosing proprietary algorithms, internal mechanisms, or security-critical architecture.

    1. Introduction

    The rapid expansion of AI across industries has exposed a series of structural challenges:
    • AI systems remain primarily reactive
    • Governance frameworks rely on trust rather than verifiable oversight
    • Persona-based interfaces are generic and inconsistent
    • Quantum computing lacks a robust stream of real-world problem inputs

    Immortal Tek’s CollectiveOS introduces a sovereign cognition architecture that complements Microsoft’s platform with three key innovations:

    1. Temporal Intelligence — A framework for modeling cyclical, causal, and long-wave patterns.

    2. Zero-Trust AI Governance — A cryptographically anchored audit and compliance layer.

    3. Modular Persona Systems — Standardized, domain-specific persona architectures for enterprise adoption.

    This collaboration is designed to accelerate Microsoft’s platform capabilities while preserving full data sovereignty for enterprise customers.

    2. High-Level Collaboration Vision

    The proposed partnership unifies Microsoft’s global infrastructure with Immortal Tek’s sovereign AI capabilities:

    • Microsoft provides:
    Azure, Microsoft Graph, Entra, Azure Quantum Elements, Copilot, and enterprise reach.

    • Immortal Tek provides:
    A sovereign cognition layer that enhances forecasting, compliance, and human-AI interaction.

    Together, the two parties can establish a next-generation AI paradigm:
    anticipatory, verifiable, sovereign, and modular intelligence for global enterprises.

    3. CollectiveOS: Public-Safe Overview

    CollectiveOS is a multi-layer architecture designed for:

    3.1 Temporal Modeling (High-Level Concept)

    The system employs a temporal-pattern framework that identifies cyclical behaviors across domains such as:

    • supply chain logistics
    • energy demand
    • financial volatility
    • workforce dynamics
    • cybersecurity
    • environmental cycles

    This enables AI agents to offer anticipatory insight, allowing enterprise users to forecast issues before they arise.

    Proprietary algorithms, gear-cycle methods, and mathematical models are intentionally omitted.

    3.2 Zero-Trust Governance (High-Level Concept)

    CollectiveOS applies cryptographic principles to AI governance, enabling:

    • policy-as-code interpretation
    • verifiable compliance signals
    • immutable audit pathways
    • privacy-preserving multi-tenant operations

    This complements Microsoft Entra and Azure Policy by transforming governance from a procedural checklist into a verifiable protocol.

    Cryptographic structures and internal Proof Vault mechanisms are intentionally excluded from this paper.

    3.3 Modular Persona Architecture (Public-Safe Summary)

    Immortal Tek provides a structured persona framework using:

    • domain-specific knowledge layers
    • behavioral consistency profiles
    • ethical alignment filters
    • enterprise communication styles

    These personas are engineered for industries such as:

    • finance
    • healthcare
    • education
    • operations
    • customer service
    • gaming

    They can be deployed through Microsoft Copilot Studio as enterprise-ready persona modules, offering predictable and culturally aligned interactions.

    Low-level persona vectorization methods and representation engineering remain confidential.

    4. Integration Points with Microsoft Ecosystem

    4.1 Copilot

    Immortal Tek provides:

    • a temporal intelligence interface
    • industry-focused persona modules
    • proactive insight layers for planning, forecasting, and operations

    This upgrades Copilot from a reactive assistant to a predictive advisor.

    4.2 Microsoft Entra & Azure Policy

    CollectiveOS can provide:

    • verifiable compliance outputs
    • cryptographically bound AI action logs
    • governance signals for conditional access
    • machine-readable policy alignment

    This strengthens Microsoft’s trust and governance posture.

    4.3 Microsoft Graph

    The collaboration includes:

    • securely structured audit and action metadata
    • ingestion of high-level compliance proofs
    • context channels for enterprise workflows

    These are shared without revealing internal cryptographic structures.

    4.4 Azure Quantum Elements

    Immortal Tek provides a high-level pipeline for:

    • causal modeling
    • problem structuring
    • quantum-relevant optimization formulations
    • enterprise-ready simulation inputs

    This helps Microsoft unlock meaningful quantum value for commercial customers.

    The causal compiler and quantum handshake mechanisms are proprietary and not disclosed.

    5. Enterprise Use Cases

    5.1 Supply Chain Anticipatory Planning

    Temporal-intelligence-enhanced Copilot can identify potential bottlenecks days or weeks in advance.

    5.2 Financial Integrity Monitoring

    Temporal pattern detection provides early risk signatures for volatility, liquidity, or compliance issues.

    5.3 Healthcare Navigation

    Domain personas support clinical documentation review, benefits navigation, and compliance-safe patient communication.

    5.4 Energy Grid Optimization

    Cyclical patterns in demand, weather, and distribution provide synchronized forecasting.

    5.5 Quantum-Enabled Materials & Chemistry

    AION (abstracted) provides structured problem inputs for Azure Quantum.

    6. Ethical & Compliance Considerations

    The proposed collaboration emphasizes:

    • transparency
    • data sovereignty
    • verifiable governance
    • privacy-preserving learning
    • equitable access to AI capabilities

    The system is designed to comply with:

    • EU AI Act
    • HIPAA
    • SOC 2
    • GDPR
    • NIST AI RMF
    • ISO/IEC 42001

    7. Strategic Impact

    This collaboration enables Microsoft to:

    • offer the first anticipatory enterprise AI
    • solve AI governance through verifiable compliance
    • deliver the first persona marketplace with consistent, industry-aligned behavior
    • extend Azure Quantum with a steady stream of structured, real-world problems
    • strengthen Microsoft’s position as the global leader in responsible AI

    Immortal Tek gains a trusted, global-scale platform to deploy sovereign cognition technologies.

    8. Conclusion

    Immortal Tek and Microsoft have a mutually strengthening opportunity to establish a new paradigm for enterprise AI — one defined by sovereignty, anticipation, security, and modular intelligence.

    This public-safe white paper outlines the conceptual framework, benefits, and integration strategy without disclosing proprietary internal technologies. It is suitable for transparent scientific publication, providing clarity to researchers, enterprise customers, and strategic partners.

    A full technical appendix and implementation blueprint can be provided under appropriate confidentiality agreements.

    ADDENDUM TO THE IMMORTAL TEK PUBLIC BENEFIT & COMMERCIAL RESERVATION LICENSE (IT-PBCR License v1.0)

    Addendum v1.1 — Sovereign Protections, AI-Era Restrictions, and Future-Use Boundaries

    Effective Date: November 2025
    Applies To: All Works released under IT-PBCR License v1.0

    Section A — Sovereign Derivative Boundary

    1. Extended Non-Commercial Status of Derivatives
      All derivative works created from the original Work — including but not limited to research prototypes, academic implementations, experimental frameworks, simulations, or conceptual expansions — remain bound by the non-commercial terms of the IT-PBCR License.

    2. Commercial Transition Prohibition
      No derivative work may be converted to commercial use, integrated into a commercial service, or sublicensed for profit without a written commercial agreement authorized by Immortal Tek.

    3. Prohibition on “Shadow

  13. EMAKG: an enriched version of the Microsoft Academic Knowledge Graph

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    Updated Mar 17, 2022
    Cite
    Pollacci, Laura (2022). EMAKG: an enriched version of the Microsoft Academic Knowledge Graph [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_5888646
    Dataset authored and provided by
    Pollacci, Laura
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Enhanced MAKG (EMAKG) provides an updated and enriched version of the Microsoft Academic Knowledge Graph (MAKG).

    The EMAKG is a large dataset of scientific publications and related entities such as authors, affiliations, venues, and fields of study. Data includes authors' careers and networks of collaborations, linguistics features, together with worldwide yearly authors' stocks and flows.

    The EMAKG data is mainly based on the MAKG - Version 2020-06-19 (March 25, 2021).

    Methods: https://github.com/LauraPollacci/EMAKG

    Version 0.0 (reduced version)

    Version 0.0 provides a set of EMAKG subsets, some of which are in abridged form:

    01.AffiliationsGeo: Affiliations subset.
    03.ConferenceInstances: Conferences subset.
    04.Conference Series: ConferenceSeries subset.
    05.Journals: Journals subset.
    06.24.PaperAuthorAffiliations_Disambiguated: Relationships between papers and disambiguated authors.
    09.PaperResources: URLs and resources of publications.
    10.Papers: Papers subset.
    12.EntityRelatedEntities: Connections between entities.
    13.FieldOfStudyChildren: Field of study kinship relations.
    14.FieldOfStudyExtendedAttributes: Fields of study co-references between different datasets.
    15.FieldsOfStudy: Fields of study subset.
    16.PaperFieldsOfStudy: Relationships between papers and fields of study.
    18.RelatedFieldOfStudy: Relationships between symptoms, medical treatments, disease causes and fields of study.
    19.PaperCitationContexts: Contexts of citations in CiTo.
    20.AbstractsProcessed_Chunk0-14: Chunk of processed abstracts.
    22.FieldOfStudyLabeled: Tags and scores of fields of studies.
    23.Authors_disambiguated: Disambiguated authors subset.
    24.PaperAuthorAffiliation_Disambiguated: Relationships between papers, disambiguated authors and affiliations.
    25.AuthorORCID: Authors' ORCIDs.
    26.AuthorCareer: Authors' yearly publications.
    27.AuthorYearLocation: Authors' yearly locations.
    28.AuthorEgoNetworks_2000-2014: Authors' ego networks from 2000 to 2014.
    29.CountryAnnualFlowsAggregated: Flows aggregated by country and year.
    30.FlowsAnnual: Annual country to country flows.
    31.StocksAnnual: Annual stocks aggregated by country.
    32.PaperFieldsOfStudyLabeled: Publications tagged with fields of studies.
    33.Authors_disambiguated_Hindex: H-index of disambiguated authors.
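    The h-index listed for disambiguated authors follows a standard definition: it is the largest h such that an author has at least h papers with at least h citations each. The sketch below illustrates that definition only. It is not taken from the EMAKG pipeline, the function name `h_index` is hypothetical, and the citation counts are made-up illustrative values.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


if __name__ == "__main__":
    # Illustrative citation counts for one hypothetical author.
    example_counts = [10, 8, 5, 4, 3]
    print(h_index(example_counts))
```

    Sorting first keeps the logic easy to check by hand: the loop stops at the first paper whose citation count falls below its rank.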

  14. Microsoft Corporation - Other-Long-Term-Assets

    • macro-rankings.com
    csv, excel
    Updated Aug 24, 2025
    Cite
    macro-rankings (2025). Microsoft Corporation - Other-Long-Term-Assets [Dataset]. https://www.macro-rankings.com/markets/stocks/msft-nasdaq/balance-sheet/other-long-term-assets
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Other-Long-Term-Assets Time Series for Microsoft Corporation. Microsoft Corporation develops and supports software, services, devices, and solutions worldwide. The company's Productivity and Business Processes segment offers Microsoft 365 Commercial, Enterprise Mobility + Security, Windows Commercial, Power BI, Exchange, SharePoint, Microsoft Teams, Security and Compliance, and Copilot; Microsoft 365 Commercial products, such as Windows Commercial on-premises and Office licensed services; Microsoft 365 Consumer products and cloud services, such as Microsoft 365 Consumer subscriptions, Office licensed on-premises, and other consumer services; LinkedIn; Dynamics products and cloud services, such as Dynamics 365, cloud-based applications, and on-premises ERP and CRM applications. Its Intelligent Cloud segment provides Server products and cloud services, such as Azure and other cloud services, GitHub, Nuance Healthcare, virtual desktop offerings, and other cloud services; Server products, including SQL and Windows Server, Visual Studio and System Center related Client Access Licenses, and other on-premises offerings; Enterprise and partner services, including Enterprise Support and Nuance professional Services, Industry Solutions, Microsoft Partner Network, and Learning Experience. The company's Personal Computing segment provides Windows and Devices, such as Windows OEM licensing and Devices and Surface and PC accessories; Gaming services and solutions, such as Xbox hardware, content, and services, first- and third-party content Xbox Game Pass, subscriptions, and Cloud Gaming, advertising, and other cloud services; search and news advertising services, such as Bing and Copilot, Microsoft News and Edge, and third-party affiliates. It sells its products through OEMs, distributors, and resellers; and online and retail stores. The company was founded in 1975 and is headquartered in Redmond, Washington.

  15. Microsoft Research - citations

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    Cite
    (2025). Microsoft Research - citations [Dataset]. https://exaly.com/institution/129747/microsoft-research
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The graph shows the citations of Microsoft Research's papers published in each year.

  16. DataSheet1_Mitigating Biases in CORD-19 for Analyzing COVID-19...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Anshul Kanakia; Kuansan Wang; Yuxiao Dong; Boya Xie; Kyle Lo; Zhihong Shen; Lucy Lu Wang; Chiyuan Huang; Darrin Eide; Sebastian Kohlmeier; Chieh-Han Wu (2023). DataSheet1_Mitigating Biases in CORD-19 for Analyzing COVID-19 Literature.zip [Dataset]. http://doi.org/10.3389/frma.2020.596624.s001
    Dataset provided by
    Frontiers Media, http://www.frontiersin.org/
    Authors
    Anshul Kanakia; Kuansan Wang; Yuxiao Dong; Boya Xie; Kyle Lo; Zhihong Shen; Lucy Lu Wang; Chiyuan Huang; Darrin Eide; Sebastian Kohlmeier; Chieh-Han Wu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    At the behest of the Office of Science and Technology Policy in the White House, six institutions, including ours, have created an open research dataset called COVID-19 Research Dataset (CORD-19) to facilitate the development of question-answering systems that can assist researchers in finding relevant research on COVID-19. As of May 27, 2020, CORD-19 includes more than 100,000 open access publications from major publishers and PubMed as well as preprint articles deposited into medRxiv, bioRxiv, and arXiv. Recent years, however, have also seen question-answering and other machine learning systems exhibit harmful behaviors to humans due to biases in the training data. It is imperative and only ethical for modern scientists to be vigilant in inspecting, and prepared to mitigate, the potential biases when working with any dataset. This article describes a framework to examine biases in scientific document collections like CORD-19 by comparing their properties with those derived from the citation behaviors of the entire scientific community. In total, three expanded sets are created for the analyses: 1) the enclosure set CORD-19E composed of CORD-19 articles and their references and citations, mirroring the methodology used in the renowned “A Century of Physics” analysis; 2) the full closure graph CORD-19C that recursively includes references starting with CORD-19; and 3) the inflection closure CORD-19I, that is, a much smaller subset of CORD-19C but already appropriate for statistical analysis based on the theory of the scale-free nature of the citation network. Taken together, all these expanded datasets show much smoother trends when used to analyze global COVID-19 research. The results suggest that while CORD-19 exhibits a strong tilt toward recent and topically focused articles, the knowledge being explored to attack the pandemic encompasses a much longer time span and is very interdisciplinary. A question-answering system with such expanded scope of knowledge may perform better in understanding the literature and answering related questions. However, while CORD-19 appears to have topical coverage biases compared to the expanded sets, the collaboration patterns, especially in terms of team sizes and geographical distributions, are already captured very well in CORD-19, as the raw statistics and trends agree with those from larger datasets.

  17. Publication text: code, data, and new measures

    • zenodo.org
    csv
    Updated Jul 11, 2024
    Cite
    Sam Arts; Nicola Melluso; Reinhilde Veugelers; Leonidas Aristodemou (2024). Publication text: code, data, and new measures [Dataset]. http://doi.org/10.5281/zenodo.8283353
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Sam Arts; Nicola Melluso; Reinhilde Veugelers; Leonidas Aristodemou
    License

    Attribution-NonCommercial 1.0 (CC BY-NC 1.0), https://creativecommons.org/licenses/by-nc/1.0/
    License information was derived automatically

    Description

    This Zenodo page describes data collection, processing, and different open access data files related to the text of scientific publications from Microsoft Academic Graph (MAG) (now OpenAlex). If you use the code or data, please cite the following paper:

    Arts S, Melluso N, Veugelers R (2023). Beyond Citations: Measuring Novel Scientific Ideas and their Impact in Publication Text. https://doi.org/10.48550/arXiv.2309.16437

  18. Microsoft Corporation - Cash-and-Equivalents

    • macro-rankings.com
    csv, excel
    Updated Oct 3, 2025
    Cite
    macro-rankings (2025). Microsoft Corporation - Cash-and-Equivalents [Dataset]. https://www.macro-rankings.com/Markets/Stocks/MSFT-NASDAQ/Cash-and-Equivalents
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Cash-and-Equivalents Time Series for Microsoft Corporation. Microsoft Corporation develops and supports software, services, devices, and solutions worldwide. The company's Productivity and Business Processes segment offers Microsoft 365 Commercial, Enterprise Mobility + Security, Windows Commercial, Power BI, Exchange, SharePoint, Microsoft Teams, Security and Compliance, and Copilot; Microsoft 365 Commercial products, such as Windows Commercial on-premises and Office licensed services; Microsoft 365 Consumer products and cloud services, such as Microsoft 365 Consumer subscriptions, Office licensed on-premises, and other consumer services; LinkedIn; Dynamics products and cloud services, such as Dynamics 365, cloud-based applications, and on-premises ERP and CRM applications. Its Intelligent Cloud segment provides Server products and cloud services, such as Azure and other cloud services, GitHub, Nuance Healthcare, virtual desktop offerings, and other cloud services; Server products, including SQL and Windows Server, Visual Studio and System Center related Client Access Licenses, and other on-premises offerings; Enterprise and partner services, including Enterprise Support and Nuance professional Services, Industry Solutions, Microsoft Partner Network, and Learning Experience. The company's Personal Computing segment provides Windows and Devices, such as Windows OEM licensing and Devices and Surface and PC accessories; Gaming services and solutions, such as Xbox hardware, content, and services, first- and third-party content Xbox Game Pass, subscriptions, and Cloud Gaming, advertising, and other cloud services; search and news advertising services, such as Bing and Copilot, Microsoft News and Edge, and third-party affiliates. It sells its products through OEMs, distributors, and resellers; and online and retail stores. The company was founded in 1975 and is headquartered in Redmond, Washington.

  19. Graph Database Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 25, 2025
    Cite
    Archive Market Research (2025). Graph Database Report [Dataset]. https://www.archivemarketresearch.com/reports/graph-database-47194
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global graph database market is anticipated to grow significantly over the forecast period 2023-2030. The market is expected to reach a value of USD 1,543.8 million by 2030, with a CAGR of 13.3%. The increasing adoption of graph databases across various industries, such as BFSI, telecom and IT, retail and e-commerce, healthcare and life sciences, manufacturing, and government and public, is driving the growth of the market. The use of graph databases is becoming increasingly popular as organizations seek to understand complex relationships and patterns within their data. This is in part due to the fact that graph databases are more flexible and scalable than traditional relational databases, making them more suited to handling large and complex datasets. Additionally, graph databases can be used to perform a variety of tasks, including social network analysis, fraud detection, and recommendation systems. Notable companies in the graph database market include IBM, Microsoft, Oracle, AWS, and Neo4j. These companies offer a range of graph database products and services to meet the needs of various organizations. North America is expected to hold a majority of the market share throughout the forecast period, due to the presence of a large number of technology companies and early adopters of graph databases.

    Graph Database: A Comprehensive Report

    Graph databases have emerged as a powerful tool for capturing and analyzing complex relationships within data. This report provides a comprehensive overview of the graph database market, including key trends, challenges, drivers, and industry leaders.
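    The relationship between an end value and a compound annual growth rate (CAGR) can be sketched with the standard compounding formula, end = start × (1 + rate)^years. The example below back-computes the starting market size implied by the quoted 2030 figure and growth rate. The seven compounding periods (2023 to 2030) are an assumption, since the report does not state its base year explicitly, and the function names are hypothetical.

```python
def project(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value forward by `years` periods at rate `cagr`."""
    return start_value * (1.0 + cagr) ** years


def implied_start(end_value: float, cagr: float, years: int) -> float:
    """Invert the compounding formula to recover the starting value."""
    return end_value / (1.0 + cagr) ** years


if __name__ == "__main__":
    end_2030 = 1543.8  # USD million, as quoted in the description
    cagr = 0.133       # 13.3% per year, as quoted in the description
    years = 7          # ASSUMPTION: base year 2023, i.e. 2030 - 2023 periods

    base = implied_start(end_2030, cagr, years)
    print(f"Implied base-year market size: USD {base:.1f} million")
    # Round trip: compounding the implied base forward recovers the quoted value.
    print(f"Projected end value: USD {project(base, cagr, years):.1f} million")
```

    A different base year changes only `years`; the two functions remain inverses of each other.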

  20. Microsoft Corporation revenue 2002-2025

    • statista.com
    Updated Jul 15, 2025
    Cite
    Statista (2025). Microsoft Corporation revenue 2002-2025 [Dataset]. https://www.statista.com/statistics/267805/microsofts-global-revenue-since-2002/
    Dataset authored and provided by
    Statista, http://statista.com/
    Area covered
    Worldwide
    Description

    Microsoft's global revenue grew from fiscal year 2022 to 2025, increasing by about ********* percent year-on-year and reaching over *** billion U.S. dollars. This marks another record-setting year for the software giant in terms of sales revenue.

    Microsoft and Bill Gates

    Microsoft has become a constant figure among the world’s most valuable brands. Its founder Bill Gates is presently, and perhaps unsurprisingly, one of the richest men in the United States and among the richest billionaires worldwide, alongside other well-known figures such as Warren Buffett, Carlos Slim Helu, and Larry Ellison. In addition to his status as an entrepreneur, Bill Gates is also known for his philanthropy. In 2000, he and his wife created the Bill and Melinda Gates Foundation. The foundation has donated a considerable amount of money, in particular in the area of research and development of treatments for neglected diseases. While Bill Gates no longer heads the Microsoft Corporation, the company itself continues to show strong results around the world, with versions of its most well-known product, the Windows operating system, consistently leading the home operating system market. The Microsoft Office suite also remains the most widely used office software around the world, with few comparable competitors in sight. The fiscal year-end of the company is June 30th.
