CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each corresponding to a unique edge type. These edge types capture various forms of similarity between connected company pairs. For each edge of a given type, we calculate a real-valued weight that approximates the level of similarity of that type. Note that the constructed edges are not an exhaustive list of all possible edges, because the underlying information is incomplete. As a result, the edges of individual relation/edge types are sparse and occasionally skewed in distribution, which poses additional challenges for downstream learning tasks. Please refer to our paper for detailed definitions of the edge types and weight calculations.
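To make the edge model concrete, here is a minimal sketch of how typed, weighted, undirected edges could be stored and counted per type. The names (`TypedEdge`, `make_edge`, `edges_per_type`, `average_degree`) are hypothetical and do not reflect the file format of the released dataset; edge types are assumed to be integer ids 0 to 14.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedEdge:
    """Hypothetical record for one undirected, typed, weighted edge."""
    u: int          # smaller node id (canonical order)
    v: int          # larger node id
    etype: int      # assumed edge-type id in range(15)
    weight: float   # real-valued similarity approximation for this type


def make_edge(a: int, b: int, etype: int, weight: float) -> TypedEdge:
    """Build an edge in canonical order so (a, b) and (b, a) compare equal."""
    u, v = (a, b) if a <= b else (b, a)
    return TypedEdge(u, v, etype, weight)


def edges_per_type(edges: list[TypedEdge], num_types: int = 15) -> list[int]:
    """Count edges of each type; zeros expose types with sparse coverage."""
    counts = Counter(e.etype for e in edges)
    return [counts.get(t, 0) for t in range(num_types)]


def average_degree(num_edges: int, num_nodes: int) -> float:
    """Average number of incident edges per node in an undirected graph."""
    return 2 * num_edges / num_nodes


edges = [make_edge(2, 1, 0, 0.5), make_edge(1, 3, 0, 0.2), make_edge(3, 4, 14, 1.0)]
print(edges_per_type(edges))
print(round(average_degree(50_815_503, 1_169_931), 1))  # 86.9
```

Applied to the totals stated above, the average is roughly 86.9 incident edges per node, spread unevenly across the 15 types.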
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone of successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company and the complex and diverse relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the private equity (PE) industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions, such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Companies similar to the seed companies can then be searched for in the embedding space using distance metrics such as cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. Such models, e.g. ChatGPT, can be employed to answer questions about similar company discovery and quantification in a Q&A format.
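A minimal sketch of seed-based expansion in embedding space, assuming each company already has an embedding vector (for example, from one of the text encoders listed above). Ranking candidates by their mean cosine similarity to the seed set is one simple aggregation choice among several (maximum similarity is another); the function names and toy vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_seeds(embeddings: dict[str, list[float]], seeds: list[str],
                 k: int) -> list[tuple[str, float]]:
    """Rank non-seed companies by mean cosine similarity to the seed set."""
    seed_set = set(seeds)
    scored = []
    for name, vec in embeddings.items():
        if name in seed_set:
            continue  # seeds are already known; only rank new candidates
        score = sum(cosine(vec, embeddings[s]) for s in seeds) / len(seeds)
        scored.append((name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-d embeddings for illustration only.
toy = {"seed": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
print(expand_seeds(toy, ["seed"], k=2))
```

In practice, an approximate nearest-neighbor index would replace the linear scan for a graph with over a million companies.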
However, a graph remains the most natural choice for representing and learning diverse company relations, due to its ability to model complex relationships among a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns that facilitate similar company analysis. While various company datasets (mostly commercial/proprietary and non-relational) and graph datasets (mostly for single link/node/graph-level predictions) are available, there is a scarcity of datasets and benchmarks that combine both into a large-scale KG dataset expressing rich pairwise company relations.
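As an illustration of graph-based learning on such a KG, the sketch below performs one parameter-free propagation step: each company's feature vector is replaced by the weighted mean of itself and its neighbors, using edge weights as similarity strengths. This is a simplified stand-in for GNN message passing, not a trained GNN (which would add learned weight matrices and nonlinearities); all names are hypothetical.

```python
def propagate(features: dict[int, list[float]],
              edges: list[tuple[int, int, float]]) -> dict[int, list[float]]:
    """One weighted-mean aggregation step over an undirected weighted graph.

    Each node keeps its own vector with weight 1.0 (a self-loop) and adds
    each neighbor's vector scaled by the connecting edge weight.
    """
    acc = {node: list(vec) for node, vec in features.items()}
    total_weight = {node: 1.0 for node in features}
    for u, v, w in edges:
        # Undirected edge: pass a message in both directions.
        for src, dst in ((u, v), (v, u)):
            acc[dst] = [a + w * x for a, x in zip(acc[dst], features[src])]
            total_weight[dst] += w
    # Normalize by the summed weights to obtain a weighted mean.
    return {node: [a / total_weight[node] for a in vec] for node, vec in acc.items()}


feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
print(propagate(feats, [(0, 1, 1.0)]))
```

Stacking several such steps mixes information from multi-hop neighborhoods, which is the intuition behind deeper GNN layers.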
Source Code and Tutorial:
https://github.com/llcresearch/CompanyKG2
Paper: to be published
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This horizontal bar chart displays companies in Dearborn by company, using a count aggregation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This horizontal bar chart displays companies in Athens by company type, using a count aggregation.
https://www.thebusinessresearchcompany.com/privacy-policy
The global graph technology market is expected to reach $14.21 billion by 2029, growing at 22%. It is segmented by software into graph database software, graph analytics software, graph visualization software, and graph query language software.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Business Applications from Corporations: Finance and Insurance in the United States (BACBANAICS52SAUS) from Jul 2004 to Jun 2025 about business applications, finance companies, companies, finance, insurance, financial, business, and USA.
With a market capitalization of 3.12 trillion U.S. dollars as of May 2024, Microsoft was the world’s largest company that year. Rounding out the top five were some of the world’s most recognizable brands: Apple, NVIDIA, Google’s parent company Alphabet, and Amazon. Saudi Aramco led the ranking of the world's most profitable companies in 2023, with a pre-tax income of nearly 250 billion U.S. dollars.
How are market value and market capitalization determined? Market value and market capitalization are two terms that are frequently used, and confused, when discussing the profitability and viability of companies. Strictly speaking, market capitalization (or market cap) is the worth of a company based on the total value of all its shares, an important metric when determining the comparative value of companies for trading opportunities. Accordingly, many stock exchanges, such as the New York and London Stock Exchanges, release market capitalization data on their listed companies. Market value, on the other hand, refers to what a company is worth in a much broader context. It is determined by multiple factors, including profitability, corporate debt, and the market environment as a whole. In this sense, it aims to estimate the overall value of a company, with share price being only one element. Market value is therefore useful for determining whether a company’s shares are over- or undervalued, and for arriving at a price if the company is to be sold. However, such valuations are generally made on a case-by-case basis and are not regularly reported. For this reason, market capitalization is often reported as market value.
What are the top companies in the world? The answer depends on the metric used. Although Microsoft was the largest company by market capitalization, its global revenue did not rank among the top 20 companies. Rather, American multinational retailer Walmart was ranked as the largest company in the world by revenue. Walmart also had the highest number of employees in the world.
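The market capitalization definition above reduces to a simple product of shares outstanding and share price. A minimal sketch with hypothetical figures (not any real company's data):

```python
def market_cap(shares_outstanding: int, share_price: float) -> float:
    """Market capitalization: number of outstanding shares times share price."""
    if shares_outstanding < 0 or share_price < 0:
        raise ValueError("shares and price must be non-negative")
    return shares_outstanding * share_price


# Hypothetical company: 1,000,000 shares trading at $25.50 each.
print(market_cap(1_000_000, 25.50))  # 25500000.0
```

Market value, as described above, also folds in debt, profitability, and market conditions, so it has no equally simple formula.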
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Business Applications from Corporations: Management of Companies in the United States (BACBANAICS55NSAUS) from Jul 2004 to Jun 2025 about management, business applications, companies, corporate, business, and USA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This horizontal bar chart displays companies in Burbank by company type, using a count aggregation.
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Number of Business Failures, Manufacturing Companies for United States (M0930BUSM474NNBR) from Jun 1934 to Dec 1939 about failures, companies, business, manufacturing, and USA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This bar chart displays companies by company, using a count aggregation. The data is filtered to the Diversified Consumer Services industry.
This private company dataset provides an in-depth view of any specific company’s truck-based supply chain and its relationships with other facilities and companies within the continental US.
Using this supply chain data, you can also map US facilities (including factories, warehouses, and retail outlets).
With this private company dataset, it is possible to track the movement of trucks and devices between locations to identify supply chain connections and company data insights.
Our machine learning algorithms ingest 7 to 15 billion events daily to estimate the volume of goods transported between locations. Consequently, we can map supply chain connections between:
• Different companies (expressed as a percentage of volume transported).
• Locations owned by the same company (e.g. warehouse to shop).
With this novel geolocation approach, it is possible to "draw" a knowledge graph of any private or public company’s relations with other companies within the country.
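The company-to-company percentages described above can be sketched as an aggregation over location-level flows. This assumes a hypothetical input format: a mapping from facility id to owning company, and a list of (origin facility, destination facility, estimated volume) tuples; it is not PREDIK's actual schema or method. Flows between facilities of the same company are kept separately, matching the second bullet.

```python
from collections import defaultdict


def company_flows(owner: dict[str, str], flows: list[tuple[str, str, float]]):
    """Aggregate facility-level volumes into company-level relations.

    Returns (inter_pct, intra_volume):
      inter_pct[(a, b)] -> % of company a's outbound inter-company volume going to b
      intra_volume[a]   -> volume moved between a's own facilities
    """
    inter = defaultdict(float)
    intra = defaultdict(float)
    for origin, dest, volume in flows:
        a, b = owner[origin], owner[dest]
        if a == b:
            intra[a] += volume          # e.g. warehouse to own shop
        else:
            inter[(a, b)] += volume     # company-to-company link
    outbound = defaultdict(float)
    for (a, _), volume in inter.items():
        outbound[a] += volume
    inter_pct = {(a, b): 100.0 * vol / outbound[a] for (a, b), vol in inter.items()}
    return inter_pct, dict(intra)


# Hypothetical facilities and volumes for illustration only.
owner = {"F1": "Acme", "W1": "Acme", "S1": "ShopCo", "S2": "MartCo"}
flows = [("F1", "W1", 50.0), ("W1", "S1", 30.0), ("W1", "S2", 10.0)]
print(company_flows(owner, flows))
```

Each `(a, b)` key with its percentage can then be read directly as a weighted edge in a company knowledge graph.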
Use cases:
Identification and understanding of company-to-company relations: identifying and inferring relationships and connections between specific companies or facilities, and between sectors/industries.
Identification and understanding of place-to-place relations: mapping a logistics and domestic distribution supply chain, both nationwide and statewide in the US, and across countries in Europe.
Visualization and mapping of an entire supply chain network.
Tracking of products in any distribution or supply chain.
Risk assessment.
Correlation analysis.
Disruption analysis.
Analysis of illicit networks and tracking of illegal use of corporate assets.
Improvement of casualty risk management.
Optimization of supply chain risk management.
Security and compliance.
Identification not only of first-tier suppliers in the value chain, but also of second- and third-tier suppliers and beyond.
Current largest use case: a global corporation using it to model risk at the facility level (100,000+ locations).
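Identifying second- and third-tier suppliers amounts to a breadth-first search over a supplier graph, where a supplier's tier is its shortest distance from the focal company. A minimal sketch, assuming a hypothetical `suppliers_of` mapping from each company to its direct suppliers:

```python
from collections import deque


def supplier_tiers(suppliers_of: dict[str, list[str]], company: str,
                   max_tier: int = 3) -> dict[str, int]:
    """Assign each reachable supplier its shortest tier (1 = direct supplier)."""
    tiers: dict[str, int] = {}
    seen = {company}
    queue = deque([(company, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_tier:
            continue  # do not expand beyond the requested tier
        for supplier in suppliers_of.get(node, []):
            if supplier not in seen:
                seen.add(supplier)
                tiers[supplier] = depth + 1
                queue.append((supplier, depth + 1))
    return tiers


# Hypothetical supplier relations for illustration only.
suppliers_of = {
    "Retailer": ["Maker"],
    "Maker": ["PartsCo", "Mill"],
    "PartsCo": ["Mine"],
    "Mine": ["Quarry"],
}
print(supplier_tiers(suppliers_of, "Retailer"))
```

Because BFS visits nodes in order of distance, a supplier reachable by several paths is assigned its lowest tier.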
Why should you trust PREDIK Data-Driven? In 2023, we were listed among Datarade's top providers. Why? Our solutions for private company data, supply chain data, and B2B data adapt to the specific needs of each company. In addition, the PREDIK methodology focuses on the client and on the elements necessary for the success of their projects.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Value Added by Industry: Professional and Business Services: Management of Companies and Enterprises was $566.9 billion in January 2025, according to the United States Federal Reserve. Historically, this series reached a record high of $566.9 billion in January 2025 and a record low of $209.0 billion in January 2005. Trading Economics provides the current actual value, a historical data chart, and related indicators for United States - Value Added by Industry: Professional and Business Services: Management of Companies and Enterprises, last updated from the United States Federal Reserve in July 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All Employees: Professional and Business Services: Management of Companies and Enterprises in California was 286.4 thousand persons in June 2025, according to the United States Federal Reserve. Historically, this series reached a record high of 311.8 thousand in April 1999 and a record low of 187.2 thousand in December 1990. Trading Economics provides the current actual value, a historical data chart, and related indicators for All Employees: Professional and Business Services: Management of Companies and Enterprises in California, last updated from the United States Federal Reserve in July 2025.
Graph Database Market Size 2025-2029
The graph database market size is forecast to increase by USD 11.24 billion at a CAGR of 29% between 2024 and 2029.
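The two figures above (an absolute increase of USD 11.24 billion and a 29% CAGR) together imply a starting and ending market size, assuming five annual compounding periods from 2024 to 2029 and that the increase is the difference between the 2029 and 2024 values. The implied values below are back-of-the-envelope arithmetic under those assumptions, not figures from the report:

```python
def implied_sizes(increase: float, cagr: float, years: int) -> tuple[float, float]:
    """Back out start and end values from an absolute increase and a CAGR.

    end = start * (1 + cagr) ** years and end - start = increase, so
    start = increase / ((1 + cagr) ** years - 1).
    """
    growth = (1.0 + cagr) ** years
    start = increase / (growth - 1.0)
    return start, start * growth


start, end = implied_sizes(increase=11.24, cagr=0.29, years=5)
print(round(start, 2), round(end, 2))  # 4.37 15.61
```

Under these assumptions, the implied 2024 base is about USD 4.37 billion and the implied 2029 size about USD 15.61 billion.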
The market is experiencing significant growth, driven by the increasing popularity of open knowledge networks and the rising demand for low-latency query processing. These trends reflect the growing importance of real-time data analytics and the need to manage increasingly complex data relationships effectively. However, the market also faces challenges, including a lack of standardization and limited programming flexibility. These obstacles require innovative solutions from market participants to ensure interoperability and ease of use for businesses looking to adopt graph databases.
Companies seeking to capitalize on market opportunities must focus on addressing these challenges while also offering advanced features and strong performance to differentiate themselves. Effective navigation of these dynamics will be crucial for success in the evolving graph database landscape. Compliance requirements and data privacy regulations drive the need for security access control and data anonymization methods. Graph databases are deployed in both on-premises data centers and cloud regions, providing flexibility for businesses with varying IT infrastructures.
What will be the size of the graph database market during the forecast period?
In this dynamic market, security and data management are increasingly prioritized. Authorization mechanisms and encryption techniques ensure data access control and confidentiality. Query optimization strategies and indexing enhance query performance, while data anonymization methods protect sensitive information. Fault tolerance mechanisms and data governance frameworks maintain data availability and regulatory compliance. Data quality assessment and consistency checks address data integrity issues, and authentication protocols secure concurrent graph updates. The graph data model is particularly well suited to applications in social networks, recommendation engines, and business processes that require real-time analytics and visualization.
Graph database tuning and monitoring optimize hardware resource usage and detect performance bottlenecks. Data recovery procedures and replication methods ensure data availability during disasters and maintain data consistency. Data version control and concurrent graph updates address versioning and conflict resolution challenges. Data anomaly detection and consistency checks maintain data accuracy and reliability. Distributed transactions and data recovery procedures ensure data consistency across nodes in a distributed graph database system.
How is the graph database industry segmented?
The graph database industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
End-user
• Large enterprises
• SMEs
Type
• RDF (Resource Description Framework)
• LPG (labeled property graph)
Solution
• Native graph database
• Knowledge graph engines
• Graph processing engines
• Graph extension
Geography
• North America: US, Canada
• Europe: France, Germany, Italy, Spain, UK
• APAC: China, India, Japan
• Rest of World (ROW)
Insights by End-user
The large enterprises segment is estimated to witness significant growth during the forecast period. In today's business landscape, large enterprises are turning to graph databases to manage intricate data relationships and improve decision-making. Graph databases offer distinct advantages over traditional relational databases, enabling greater agility in modeling and querying interconnected data. These systems are particularly valuable for applications such as fraud detection, supply chain optimization, customer 360 views, and network analysis. Graph databases provide the scalability and performance required to handle large, dynamic datasets and to uncover hidden patterns and insights in real time. Their support for advanced analytics and AI-driven applications further strengthens their role in enterprise digital transformation strategies. Additionally, their flexibility and integration capabilities make them well suited for deployment in hybrid and multi-cloud environments.
Graph databases offer various features that cater to diverse business needs. Data lineage tracking ensures accountability and transparency, while graph analytics engines provide advanced insights. Graph database benchmarking helps organizations evaluate performance, and relationship property indexing streamlines data access. Node relationship management facilitates complex data modeling, an
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All Employees: Professional and Business Services: Management of Companies and Enterprises in Rhode Island was 10.2 thousand persons in June 2025, according to the United States Federal Reserve. Historically, this series reached a record high of 14.2 thousand in August 2017 and a record low of 9.1 thousand in May 2009. Trading Economics provides the current actual value, a historical data chart, and related indicators for All Employees: Professional and Business Services: Management of Companies and Enterprises in Rhode Island, last updated from the United States Federal Reserve in July 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All Employees: Professional and Business Services: Management of Companies and Enterprises in Connecticut was 30.8 thousand persons in March 2025, according to the United States Federal Reserve. Historically, this series reached a record high of 34.1 thousand in June 2016 and a record low of 26.4 thousand in January 2007. Trading Economics provides the current actual value, a historical data chart, and related indicators for All Employees: Professional and Business Services: Management of Companies and Enterprises in Connecticut, last updated from the United States Federal Reserve in July 2025.
https://www.thebusinessresearchcompany.com/privacy-policy
The global graph database market is estimated to reach $9.59 billion by 2029, growing at 24.2%. Growing demand for personalized marketing is driving the growth of the graph database market.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Real Value Added by Industry: Professional and Business Services: Management of Companies and Enterprises was $539.5 billion (chained 2009 dollars) in January 2025, according to the United States Federal Reserve. Historically, this series reached a record high of $548.9 billion in January 2024 and a record low of $238.5 billion in January 2009. Trading Economics provides the current actual value, a historical data chart, and related indicators for United States - Real Value Added by Industry: Professional and Business Services: Management of Companies and Enterprises, last updated from the United States Federal Reserve in July 2025.
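Because the nominal value-added figure reported earlier for this industry ($566.9 billion) and this real figure refer to the same period, January 2025, their ratio gives an implicit price index with the 2009 reference year set to 100, using the standard nominal-over-real construction. A minimal sketch of that arithmetic:

```python
def implicit_price_index(nominal: float, real_chained: float) -> float:
    """Implicit price index (reference year = 100): 100 * nominal / real."""
    return 100.0 * nominal / real_chained


# January 2025 values quoted in this section, in billions of dollars.
print(round(implicit_price_index(566.9, 539.5), 2))  # 105.08
```

By this measure, prices for the industry's output were roughly 5% above their 2009 reference level.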
Business-to-Business marketers are looking to simplify data workflows and increase data portability across their technology stacks. Built on 180byTwo’s AccountLink™ B2B graph, AI, and DAAS solutions, Unifi enables B2B and Account-Based Marketers to seamlessly execute and measure marketing programs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This horizontal bar chart displays companies in Jacksonville Beach by company type, using a count aggregation.