Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
fire
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nowadays
A popular dataset for node classification on heterogeneous graphs.
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. For each edge of a given type, we calculate a real-valued weight that approximates the level of similarity of that type. Note that the constructed edges are not an exhaustive list of all possible edges, because the underlying information is incomplete. As a result, the distribution of edges for individual relation/edge types is sparse and occasionally skewed, which poses additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the private equity (PE) industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
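The embedding-space search described above can be sketched as follows. This is a minimal illustration, not the CompanyKG implementation: the company identifiers, the two-dimensional toy vectors, and the function names `cosine_similarity` and `top_k_similar` are all assumptions made for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def top_k_similar(embeddings, seed_id, k=2):
    """Return the k companies closest to the seed company, best first."""
    seed = embeddings[seed_id]
    scored = [
        (company_id, cosine_similarity(seed, vector))
        for company_id, vector in embeddings.items()
        if company_id != seed_id  # do not return the seed itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings; real description embeddings have far more dimensions.
toy_embeddings = {
    "company_a": [1.0, 0.0],
    "company_b": [0.9, 0.1],
    "company_c": [0.0, 1.0],
    "company_d": [-1.0, 0.0],
}
ranked = top_k_similar(toy_embeddings, "company_a", k=2)
print(ranked)
```

With several seed companies, one simple option is to average the seed vectors first and search with that average; whether this suits a given use case is a design choice rather than something the dataset prescribes.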
However, a graph remains the most natural choice for representing and learning diverse company relations, because it can model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
Source Code and Tutorial:
https://github.com/llcresearch/CompanyKG2
Paper: to be published
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"If you use this dataset, please reference this repository and cite the related works that made its generation possible." This change detection dataset covers different events, satellites, and resolutions, and includes both homogeneous and heterogeneous cases. The main idea of the dataset is to provide a benchmark for semantic change detection in the remote sensing field. This dataset is the outcome of the following publications:
@article{JimenezSierra2022graph,
  author={Jimenez-Sierra, David Alejandro and Quintero-Olaya, David Alfredo and Alvear-Mu{\~n}oz, Juan Carlos and Ben{\'i}tez-Restrepo, Hern{\'a}n Dar{\'i}o and Florez-Ospina, Juan Felipe and Chanussot, Jocelyn},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={Graph Learning Based on Signal Smoothness Representation for Homogeneous and Heterogeneous Change Detection},
  year={2022}, volume={60}, number={}, pages={1-16}, doi={10.1109/TGRS.2022.3168126}}
@article{JimenezSierra2020graph,
  title={Graph-Based Data Fusion Applied to: Change Detection and Biomass Estimation in Rice Crops},
  author={Jimenez-Sierra, David Alejandro and Ben{\'i}tez-Restrepo, Hern{\'a}n Dar{\'i}o and Vargas-Cardona, Hern{\'a}n Dar{\'i}o and Chanussot, Jocelyn},
  journal={Remote Sensing},
  volume={12}, number={17}, pages={2683}, year={2020}, publisher={Multidisciplinary Digital Publishing Institute}, doi={10.3390/rs12172683}}
@inproceedings{jimenez2021blue,
  title={Blue noise sampling and Nystrom extension for graph based change detection},
  author={Jimenez-Sierra, David Alejandro and Ben{\'\i}tez-Restrepo, Hern{\'a}n Dar{\'\i}o and Arce, Gonzalo R and Florez-Ospina, Juan F},
  booktitle={2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS},
  pages={2895--2898}, year={2021}, organization={IEEE}, doi={10.1109/IGARSS47720.2021.9555107}}
@article{florez2023exploiting,
  title={Exploiting variational inequalities for generalized change detection on graphs},
  author={Florez-Ospina, Juan F and Jimenez Sierra, David A and Benitez-Restrepo, Hernan D and Arce, Gonzalo},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2023}, volume={61}, number={}, pages={1-16}, doi={10.1109/TGRS.2023.3322377}}
@article{florez2023exploitingxiv,
  title={Exploiting variational inequalities for generalized change detection on graphs},
  author={Florez-Ospina, Juan F. and Jimenez-Sierra, David A. and Benitez-Restrepo, Hernan D. and Arce, Gonzalo R},
  year={2023}, publisher={TechRxiv}, doi={10.36227/techrxiv.23295866.v1}}
The table in the HTML file (dataset_table.html) lists all the metadata and details for each case in the dataset. Cases with a link were gathered from those sources and authors, so you should cite their work as well. The remaining cases or events (without a link) were obtained from open sources such as:
Copernicus
European Space Agency
Alaska Satellite Facility (Vertex)
Earth Data
In addition, we processed all the images using the SNAP toolbox from the European Space Agency. This processing involves the following:
Data co-registration
Cropping
Apply Orbit (for SAR data)
Calibration (for SAR data)
Speckle Filter (for SAR data)
Terrain Correction (for SAR data)
Lastly, the ground truth was obtained from homogeneous pre-event and post-event images by drawing polygons around the areas where a visible change was present. The images were laid out and synchronized so that they could be zoomed over the same area, giving a better view of the changes. This was exhaustive work, done to be as precise as possible. Feel free to improve and contribute to this dataset.
A popular dataset for node classification on heterogeneous graphs.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
We provide an academic graph based on a snapshot of the Microsoft Academic Graph from 26.05.2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records, their citation relations, as well as authors, affiliations, journals, conferences and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding schema and the entities present in the original dataset please refer to: MAG schema.
MAG for Heterogeneous Graph Learning We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.
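The inclusion rule above can be read in more than one way. The sketch below encodes one possible reading and is not the authors' filtering code: it assumes that "5 times per year" means an average over the years elapsed up to the 2021 snapshot, and that the constant bound of 100 replaces that requirement for papers published before 2000. The function names, the `language` argument, and `SNAPSHOT_YEAR` are hypothetical.

```python
SNAPSHOT_YEAR = 2021  # year of the MAG snapshot mentioned in the description

def min_citations(year, per_year=5, old_cutoff=2000, old_bound=100):
    """Citation threshold under the assumed reading of the rule."""
    if year < old_cutoff:
        return old_bound  # constant bound for older papers
    # Assumption: average of `per_year` citations over the elapsed years.
    return per_year * (SNAPSHOT_YEAR - year)

def is_included(year, citations, language="en"):
    """True if a paper would pass the filter under the assumed reading."""
    return (
        language == "en"
        and year > 1900
        and citations >= min_citations(year)
    )

print(min_citations(1995))    # constant bound applies
print(min_citations(2011))    # 5 citations x 10 elapsed years
print(is_included(2011, 49))
```

Under this reading, a paper from 2000 needs 105 citations, which is close to the constant bound of 100 and may be why that value was chosen; this is only an inference from the numbers.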
Nodes and features We define the following nodes:
paper with mag_id, graph_id, normalized title, year of publication, citations and a 128-dimensional title embedding built using word2vec No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
author with mag_id, graph_id, normalized name, citations No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);
field with mag_id, graph_id, citations, level denoting the hierarchical level of the field where 0 is the highest level (e.g. computer science) No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);
affiliation with mag_id, graph_id, citations No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);
venue with mag_id, graph_id, citations, type denoting whether conference or journal No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).
Edges We define the following edges:
author is_affiliated_with affiliation No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);
author is_first/last/other paper No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);
paper has_citation_to paper No. of paper-paper edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);
paper conference/journal_published_at venue No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
paper has_field_L0/L1/L2/L3/L4 field No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);
field is_in field No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science).
We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.
Data structure The nodes and their respective features are provided as separate .tsv files where each feature represents a column. The edges are provided as a pickled python dictionary with schema:
{target_type: {source_type: {edge_type: {target_id: {source_id: {time } } } } } }
We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science). The archive for the complete graph is split into 500 MB chunks. Each archive contains the separate node feature files and the edge dictionary.
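As an illustration of the nested edge dictionary described above, the sketch below flattens it into one tuple per edge and optionally skips the reverse edge types that carry the rev_ prefix. It assumes the innermost level maps each source_id to a time value; the function name `iter_edges`, the toy ids, and the toy edge-type names are hypothetical, and the real pickle should be inspected before relying on this shape.

```python
def iter_edges(edge_dict, keep_reverse=True):
    """Yield (source_type, source_id, edge_type, target_type, target_id, time)."""
    for target_type, by_source_type in edge_dict.items():
        for source_type, by_edge_type in by_source_type.items():
            for edge_type, by_target_id in by_edge_type.items():
                # Reverse edges are marked with the "rev_" prefix.
                if not keep_reverse and edge_type.startswith("rev_"):
                    continue
                for target_id, by_source_id in by_target_id.items():
                    # Assumption: this level is a dict {source_id: time}.
                    for source_id, time in by_source_id.items():
                        yield (source_type, source_id, edge_type,
                               target_type, target_id, time)

# Toy dictionary following the documented nesting (values are made up).
# A real file would be loaded with pickle.load from the extracted archive.
toy_edges = {
    "paper": {"author": {"is_first": {10: {1: 2015}}}},
    "author": {"paper": {"rev_is_first": {1: {10: 2015}}}},
}

all_edges = list(iter_edges(toy_edges))
forward_edges = list(iter_edges(toy_edges, keep_reverse=False))
print(forward_edges)
```

Flattening to tuples trades memory for convenience; for the complete graph it may be preferable to consume the generator lazily rather than build a list.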
https://www.datamintelligence.com/terms-conditions
Heterogeneous Integration Market reached US$ 0.9 Billion in 2023 and is expected to reach US$ 10.2 Billion by 2031
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset can be used to test deep learning models that combine structured data (stock prices) with unstructured data (stock bar posts). It is therefore a multi-source heterogeneous dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Link prediction and graph classification datasets for heterogeneous graphs in DGL format
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PRONTO heterogeneous benchmark dataset is based on an industrial-scale multiphase flow facility. It includes data from heterogeneous sources: process measurements, alarm records, high-frequency ultrasonic flow and pressure measurements, an operation log, and video recordings. The study collected data under various operational conditions, with and without induced faults, to generate a multi-rate, multi-modal dataset. The dataset is suitable for developing and validating algorithms for fault detection and diagnosis (FDD) and data fusion.
When using the dataset please cite the following publication:
A. Stief, R. Tan, Y. Cao, J. R. Ottewill, N. F. Thornhill, J. Baranowski, A heterogeneous benchmark dataset for data analytics: Multiphase flow facility case study, Journal of Process Control, 79 (2019) 41–55, DOI: https://doi.org/10.1016/j.jprocont.2019.04.009
The dataset has been used in the following works:
A. Stief, R. Tan, Y. Cao, J. R. Ottewill. Analytics of heterogeneous process data: Multiphase flow facility case study. IFAC-PapersOnLine, 51(18):363–368, 2018. DOI: https://doi.org/10.1016/j.ifacol.2018.09.327
A. Stief, J. R. Ottewill, R. Tan, Y. Cao. Process and alarm data integration under a two-stage Bayesian framework for fault diagnostics. IFAC-PapersOnLine, 51(24):1220–1226, 2018. DOI: https://doi.org/10.1016/j.ifacol.2018.09.696
A. Stief, J. R. Ottewill, J. Baranowski. Investigation of the diagnostic properties of sensors and features in a multiphase flow facility case study, in: 12th IFAC Symposium on Dynamics and Control of Process Systems (in press), 2019.
M. Lucke, X. Mei, A. Stief, M. Chioua, N. F. Thornhill. Variable selection for fault detection and identification based on mutual information of multi-valued alarm series, in: 12th IFAC Symposium on Dynamics and Control of Process Systems (in press), 2019.
R. Tan, T. Cong, N. F. Thornhill, J. R. Ottewill, J. Baranowski. Statistical monitoring of processes with multiple operating modes, in: 12th IFAC Symposium on Dynamics and Control of Process Systems (in press), 2019.
A popular dataset for node classification on heterogeneous graphs.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This data release provides the predictions from stream temperature models described in Chen et al. (2021). Briefly, various deep learning and process-guided deep learning models were built to test for improved performance of stream temperature predictions below reservoirs in the Delaware River Basin. The spatial extent of predictions was restricted to streams above the Delaware River at Lordville, NY, and includes the West Branch of the Delaware River below Cannonsville Reservoir and the East Branch of the Delaware River below Pepacton Reservoir. Various model architectures, training schemes, and data assimilation methods were used to generate the table and figures in Chen et al. (2021), and the predictions of each model are captured in this release. For each model, there are test period predictions for 56 river reaches from 2006-10-01 through 2020-09-30. Model input and validation data can be found in Oliver et al. (2021).
The publication associated with this data rele ...
https://dataintelo.com/privacy-and-policy
The global heterogeneous networks market size was valued at approximately $15 billion in 2023 and is projected to reach around $37 billion by 2032, growing at a compound annual growth rate (CAGR) of 10.8% during the forecast period. The primary growth factor for this market is the increasing demand for high-speed internet and improved network coverage, driven by the rapid proliferation of connected devices and the expansion of smart city initiatives worldwide.
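For readers who want to relate the start value, end value, and growth rate quoted above, a compound annual growth rate can be computed with the standard formula (end / start)^(1 / years) - 1. The sketch below is illustrative only; the function name `cagr` is an assumption, and the inputs are the rounded figures from the text, so the result is only close to the quoted rate rather than identical to it.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Rounded figures from the text: about $15 billion (2023) to about $37 billion (2032).
implied_rate = cagr(15.0, 37.0, 2032 - 2023)
print(f"{implied_rate:.1%}")
```

Small differences from a published rate are expected when the endpoints are rounded or when the report measures growth over a slightly different base year.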
The growth of the heterogeneous networks market is significantly influenced by the escalating need for enhanced data capacity and coverage. With the exponential growth in mobile data traffic, largely fueled by the adoption of smartphones, tablets, and other connected devices, traditional cellular networks are struggling to meet the demands. Heterogeneous networks, which combine various types of network technologies such as small cells, Wi-Fi, and macro cells, provide a viable solution to address these challenges by offering seamless connectivity and increased data throughput.
Another major growth factor for the heterogeneous networks market is the advancement in wireless communication technologies, particularly the deployment of 5G networks. 5G technology promises to deliver faster data speeds, lower latency, and more reliable connections, which are essential for supporting the growing number of Internet of Things (IoT) devices and applications. The integration of heterogeneous networks with 5G infrastructure is expected to enhance network performance and coverage, thereby driving the market growth.
Additionally, the market is being propelled by the increasing investments in smart cities and smart infrastructure projects. Governments and municipalities around the world are investing heavily in smart city initiatives to improve urban living conditions and enhance the efficiency of public services. Heterogeneous networks play a crucial role in these projects by providing the necessary connectivity for smart devices and applications, such as smart lighting, traffic management systems, and surveillance cameras, thus driving the market expansion.
From a regional perspective, the Asia Pacific region is anticipated to witness the highest growth in the heterogeneous networks market during the forecast period. This growth can be attributed to the rapid urbanization, increasing population, and the rising adoption of smart devices in countries like China, India, and Japan. In addition, significant investments in infrastructure development and the rollout of 5G networks in these countries are expected to further boost the demand for heterogeneous networks in the region.
In the heterogeneous networks market, the component segment is broadly categorized into hardware, software, and services. The hardware segment includes various physical devices and equipment such as small cells, macro cells, distributed antenna systems (DAS), and Wi-Fi access points, which form the backbone of heterogeneous networks. The growth of this segment is driven by the increasing deployment of small cells and DAS to enhance network capacity and coverage in urban and densely populated areas. Moreover, the rising adoption of 5G technology is further boosting the demand for advanced hardware components capable of supporting higher data speeds and lower latency.
The software segment encompasses various network management and optimization software solutions that enable seamless integration and coordination of different network technologies. These solutions play a critical role in ensuring efficient network performance, minimizing interference, and optimizing resource allocation. The growing complexity of heterogeneous networks necessitates advanced software solutions to manage and control the network infrastructure effectively. Consequently, the software segment is expected to experience robust growth during the forecast period, driven by the increasing need for efficient network management and optimization.
Services in the heterogeneous networks market include planning, deployment, maintenance, and managed services offered by network service providers and system integrators. As the deployment of heterogeneous networks involves significant technical expertise and resources, the demand for professional services is on the rise. Network operators and enterprises are increasingly relying on service providers for the design and implementation of their network infrastructure, as well as for ongoing maintenance and support. This trend is expected to drive the g
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The Risk Commodity Detection Dataset (RCDD) is a large-scale heterogeneous e-commerce network from the paper "Datasets and Interfaces for Benchmarking Heterogeneous Graph Neural Networks". It was extracted from Alibaba's e-commerce platform based on a real risk detection scenario. It consists of 13,806,619 nodes and 157,814,864 edges across 7 node types and 7 edge types, respectively. Risk commodities always deliberately disguise risk information, for example by forging "innocent" relationships through forged devices, addresses, or other means. Because the scenario is sensitive, the names of the other node types (e.g., buyer and seller) and edge types (e.g., buy and sell) are redacted for confidentiality and security and are represented by letters. Each commodity is associated with a 256-dimensional feature vector formed by concatenating the text and image features extracted from the pre-trained models BERT and BYOL, respectively. Each item node carries a binary label indicating whether it is a risk commodity. We follow the official dataset split, in which the test set is obtained over time.
Paper on this topic has been submitted to KDD 2010.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Recent advances in data-driven approaches based on machine learning (ML) have enabled the discovery of high-performance materials. This paper presents a hybrid framework that combines ML models with a metaheuristic optimization algorithm to explore improved heterogeneous catalysts for propane dehydrogenation (PDH). The framework proposes multiple PDH catalysts, utilizing our laboratory-scale database. A unique five-component catalyst, 2.4Ga 2.2Pt 1.7B 1.3Zr/Al2O3, exhibits superior performance, achieving a propylene yield of 58% at 600 °C. This work highlights the excellent predictive capability of the framework and offers a new data-driven approach for developing high-performance materials for heterogeneous catalysis.
https://www.marketresearchintellect.com/privacy-policy
Gain in-depth insights into Heterogeneous Flooring Market Report from Market Research Intellect, valued at USD 12.5 billion in 2024, and projected to grow to USD 20.3 billion by 2033 with a CAGR of 7.2% from 2026 to 2033.
In exploring some of the concepts around Directed Acyclic Graphs and OLab in the assessment of clinical decision making, we have been juggling the ideas around layered and interconnected DAGs. Some of these explorations led us to the concept of heterogeneous graphs
https://www.archivemarketresearch.com/privacy-policy
The global Heterogeneous Parameter Server market is anticipated to reach a value of $1212 million by 2033, expanding at a CAGR of XX% from 2025 to 2033. The growing demand for high-performance computing in various industries, such as artificial intelligence, machine learning, and data analytics, is fueling the market growth. Additionally, the increasing adoption of cloud computing and the need for efficient parameter management in distributed systems are contributing to the market expansion. Some of the key trends shaping the Heterogeneous Parameter Server market include the rise of edge computing, the integration of artificial intelligence and machine learning, and the growing emphasis on data security. Key regions driving the market growth include North America, Europe, and Asia Pacific. The presence of major technology companies and the high demand for high-performance computing in these regions are contributing to the market expansion. Prominent companies operating in the Heterogeneous Parameter Server market include IBM, Google, AWS, Microsoft Innovation, xFusion Digital Technologies, and Huawei, among others.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
fire