100+ datasets found

Homogeneous and Heterogeneous dataset for change detection
ieee-dataport.org
Updated Apr 20, 2022
Cite
David Jimenez Sierra (2022). Homogeneous and Heterogeneous dataset for change detection [Dataset]. https://ieee-dataport.org/documents/homogeneous-and-heterogeneous-dataset-change-detection
Dataset updated
Apr 20, 2022
Authors
David Jimenez Sierra
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
fire
Heterogeneous Datasets for TinyUStaging
ieee-dataport.org
Updated Sep 23, 2022
Cite
Jingyi Lu (2022). Heterogeneous Datasets for TinyUStaging [Dataset]. https://ieee-dataport.org/documents/heterogeneous-datasets-tinyustaging
Dataset updated
Sep 23, 2022
Authors
Jingyi Lu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Nowadays
CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company...
zenodo.org
application/gzip, bin +1
Updated Jun 4, 2024
Cite
Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha (2024). CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification [Dataset]. http://doi.org/10.5281/zenodo.11391315
Available download formats: application/gzip, bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.11391315
Dataset updated
Jun 4, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha
Time period covered
May 29, 2024
Description
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.

Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.

Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.

Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated four evaluation tasks:

Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.

Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each of which has 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for companies similar to A.

Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. It resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.

Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
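
The Competitor Retrieval expectation above (retrieve all N annotated competitors of a target company) can be illustrated with a recall-at-K score. This is only a minimal sketch: the description does not name the metric used, so recall@K is an assumption here, and the company names and rankings below are hypothetical toy data.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the annotated competitors found among the top-k results.

    Assumes `ranked` contains no duplicate company identifiers.
    """
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = sum(1 for company in ranked[:k] if company in relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    rankings: Dict[str, List[str]],
    competitors: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average recall@k over all target companies that have annotations."""
    scores = [recall_at_k(rankings[t], competitors[t], k) for t in competitors]
    return sum(scores) / len(scores)


# Hypothetical annotations: target company -> its direct competitors.
competitors = {"A": {"B", "C"}, "D": {"E"}}
# Hypothetical output of some similarity search, most similar first.
rankings = {"A": ["B", "X", "C", "Y"], "D": ["X", "Y", "E", "B"]}

print(mean_recall_at_k(rankings, competitors, k=2))
print(mean_recall_at_k(rankings, competitors, k=3))
```

A method that meets the stated expectation would reach a recall of 1.0 for each target once K is at least N.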

Background and Motivation

In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.

While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.

In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
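
As a rough illustration of the embedding-space search mentioned above, the following sketch ranks companies by cosine similarity to a seed company. The company names and three-dimensional vectors are hypothetical; real description embeddings would come from a text embedding model and have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(
    seed: str, embeddings: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k companies most similar to the seed, best first."""
    seed_vec = embeddings[seed]
    scored = [
        (name, cosine_similarity(seed_vec, vec))
        for name, vec in embeddings.items()
        if name != seed  # do not return the seed itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings.
embeddings = {
    "seed_co": [1.0, 0.0, 0.0],
    "near_co": [0.9, 0.1, 0.0],
    "far_co": [0.0, 1.0, 0.0],
    "opposite_co": [-1.0, 0.0, 0.0],
}

for name, score in most_similar("seed_co", embeddings, top_k=2):
    print(name, round(score, 3))
```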

However, a graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.

Source Code and Tutorial:
https://github.com/llcresearch/CompanyKG2

Paper: to be published
Enhanced Stock Price Prediction with Optimized Ensemble Modeling Using...
figshare.com
xlsx
Updated Nov 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hongjiu Liu (2024). Enhanced Stock Price Prediction with Optimized Ensemble Modeling Using Multi-source Heterogeneous Data [Dataset]. http://doi.org/10.6084/m9.figshare.27328590.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.27328590.v2
Dataset updated
Nov 5, 2024
Dataset provided by
Figshare http://figshare.com/
Authors
Hongjiu Liu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset can be used to test deep learning models. It includes structured data (stock prices) and unstructured data (stock bar posts), so it constitutes multi-source heterogeneous data.
MAG for Heterogeneous Graph Learning
data.niaid.nih.gov
Updated Jul 9, 2021
Cite
Diea, Maria-Alexandra (2021). MAG for Heterogeneous Graph Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5055135
Dataset updated
Jul 9, 2021
Dataset authored and provided by
Diea, Maria-Alexandra
License
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
We provide an academic graph based on a snapshot of the Microsoft Academic Graph from 26.05.2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records, their citation relations, as well as authors, affiliations, journals, conferences and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding schema and the entities present in the original dataset please refer to: MAG schema.

MAG for Heterogeneous Graph Learning

We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.

Nodes and features

We define the following nodes:

paper, with mag_id, graph_id, normalized title, year of publication, citations, and a 128-dimensional title embedding built using word2vec. No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

author, with mag_id, graph_id, normalized name, citations. No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);

field, with mag_id, graph_id, level (denoting the hierarchical level of the field, where 0 is the highest level, e.g. computer science), citations. No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);

affiliation, with mag_id, graph_id, citations. No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);

venue, with mag_id, graph_id, citations, type (denoting whether conference or journal). No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).

Edges

We define the following edges:

author is_affiliated_with affiliation. No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);

author is_first/last/other paper. No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);

paper has_citation_to paper. No. of paper-paper edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);

paper conference/journal_published_at venue. No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

paper has_field_L0/L1/L2/L3/L4 field. No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);

field is_in field. No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science);

We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.

Data structure

The nodes and their respective features are provided as separate .tsv files where each feature represents a column. The edges are provided as a pickled Python dictionary with schema:

{target_type: {source_type: {edge_type: {target_id: {source_id: {time } } } } } }
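
The nested schema above can be flattened into one tuple per edge. The sketch below is a hedged illustration rather than the dataset's own loader: it assumes the innermost level maps each source_id to a single time value, the file path passed to `load_edge_dict` is up to the reader, and the toy dictionary with its edge-type keys is hypothetical. Unpickling executes arbitrary code, so it should only be done with files from a trusted source.

```python
import pickle
from typing import Any, Dict, Iterator, Tuple

# (source_type, source_id, edge_type, target_type, target_id, time)
Edge = Tuple[str, Any, str, str, Any, Any]


def load_edge_dict(path: str) -> Dict[str, Any]:
    """Load the pickled edge dictionary (only unpickle files you trust)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def iter_edges(edge_dict: Dict[str, Any], keep_reverse: bool = True) -> Iterator[Edge]:
    """Walk the nested schema and yield one flat tuple per edge.

    Assumption: the innermost mapping is {source_id: time}.
    """
    for target_type, by_source_type in edge_dict.items():
        for source_type, by_edge_type in by_source_type.items():
            for edge_type, by_target_id in by_edge_type.items():
                # Reverse edges carry the rev_ prefix and can be skipped.
                if not keep_reverse and edge_type.startswith("rev_"):
                    continue
                for target_id, by_source_id in by_target_id.items():
                    for source_id, time in by_source_id.items():
                        yield (source_type, source_id, edge_type,
                               target_type, target_id, time)


# Hypothetical toy dictionary following the same nesting.
toy_edges = {
    "paper": {
        "author": {"is_first": {10: {1: 2020}}},
        "paper": {"has_citation_to": {10: {11: 2021, 12: 2021}}},
    },
    "author": {
        "paper": {"rev_is_first": {1: {10: 2020}}},
    },
}

for edge in iter_edges(toy_edges, keep_reverse=False):
    print(edge)
```

Passing `keep_reverse=False` drops the `rev_` edge types, which mirrors the note that reverse edges can be removed depending on the downstream task.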

We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science); however, we split the file for the complete graph into 500 MB chunks. Each archive contains the separate node features and edge dictionary.
Data from: Ensemble Learning for Multi-type Classification in Heterogeneous...
figshare.com
zip
Updated Jan 30, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Francesco Serafino; Gianvito Pio (2018). Ensemble Learning for Multi-type Classification in Heterogeneous Networks [Dataset]. http://doi.org/10.6084/m9.figshare.4334048.v7
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.4334048.v7
Dataset updated
Jan 30, 2018
Dataset provided by
Figshare http://figshare.com/
Authors
Francesco Serafino; Gianvito Pio
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Ensemble Learning for Multi-type Classification in Heterogeneous Networks

In this project you can find the following files:

a) EnsembleMRSBC.zip
This file contains the systems (Mr-SBC, ST-MrSBC and MT-MrSBC), the datasets used for the experimental evaluation (they are dump databases generated with PostgreSQL 9.5) and, for each dataset, the 10 folds used for the 10-fold cross validation. Moreover, an example of configuration file for the execution of the system is included in the zip file.

b) README.txt
This file contains the full instructions for the execution of the system.

c) Results_EnsembleMT-MrSBC.xls
This Excel file contains the results in terms of accuracy obtained on all datasets (according to the selected target types and their target attributes) by the systems: Mr-SBC, ST-MrSBC, MT-MrSBC (Lexicographic ordering), MT-MrSBC (Random ordering), RelIBk (RelWEKA), RelSMO (RelWEKA), HENPC and GNetMine. Results are reported for each fold and for each iteration in the case of our ensemble-based systems ST-MrSBC and MT-MrSBC (both Lexicographic and Random versions).

For more details, please refer to the manuscript:
F. Serafino, G. Pio, M. Ceci, "Ensemble Learning for Multi-type Classification in Heterogeneous Networks"
Heterogeneous/Homogeneous Change Detection dataset
data.niaid.nih.gov
Updated Nov 21, 2023
Cite
Hernán Darío Benítez Restrepo (2023). Heterogeneous/Homogeneous Change Detection dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8269854
Dataset updated
Nov 21, 2023
Dataset provided by
David Alejandro Jimenez Sierra
Hernán Darío Benítez Restrepo
Juan Felipe Florez Ospina
Behnood Rasti
Jocelyn Chanussot
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
"If you use this dataset, please reference this repository and cite the related works that made the generation of this dataset possible."

This change detection dataset covers different events, satellites, and resolutions, and includes both homogeneous and heterogeneous cases. The main idea of the dataset is to provide a benchmark for semantic change detection in the remote sensing field. This dataset is the outcome of the following publications:

@article{JimenezSierra2022graph,
  author={Jimenez-Sierra, David Alejandro and Quintero-Olaya, David Alfredo and Alvear-Mu{\~n}oz, Juan Carlos and Ben{\'i}tez-Restrepo, Hern{\'a}n Dar{\'i}o and Florez-Ospina, Juan Felipe and Chanussot, Jocelyn},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={Graph Learning Based on Signal Smoothness Representation for Homogeneous and Heterogeneous Change Detection},
  year={2022},
  volume={60},
  number={},
  pages={1-16},
  doi={10.1109/TGRS.2022.3168126}
}

@article{JimenezSierra2020graph,
  title={Graph-Based Data Fusion Applied to: Change Detection and Biomass Estimation in Rice Crops},
  author={Jimenez-Sierra, David Alejandro and Ben{\'i}tez-Restrepo, Hern{\'a}n Dar{\'i}o and Vargas-Cardona, Hern{\'a}n Dar{\'i}o and Chanussot, Jocelyn},
  journal={Remote Sensing},
  volume={12},
  number={17},
  pages={2683},
  year={2020},
  publisher={Multidisciplinary Digital Publishing Institute},
  doi={10.3390/rs12172683}
}

@inproceedings{jimenez2021blue,
  title={Blue noise sampling and Nystrom extension for graph based change detection},
  author={Jimenez-Sierra, David Alejandro and Ben{\'\i}tez-Restrepo, Hern{\'a}n Dar{\'\i}o and Arce, Gonzalo R and Florez-Ospina, Juan F},
  booktitle={2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS},
  pages={2895--2898},
  year={2021},
  organization={IEEE},
  doi={10.1109/IGARSS47720.2021.9555107}
}

@article{florez2023exploiting,
  title={Exploiting variational inequalities for generalized change detection on graphs},
  author={Florez-Ospina, Juan F and Jimenez Sierra, David A and Benitez-Restrepo, Hernan D and Arce, Gonzalo},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2023},
  volume={61},
  number={},
  pages={1-16},
  doi={10.1109/TGRS.2023.3322377}
}

@article{florez2023exploitingxiv,
  title={Exploiting variational inequalities for generalized change detection on graphs},
  author={Florez-Ospina, Juan F. and Jimenez-Sierra, David A. and Benitez-Restrepo, Hernan D. and Arce, Gonzalo R},
  year={2023},
  publisher={TechRxiv},
  doi={10.36227/techrxiv.23295866.v1}
}

All the metadata and details related to each case within the dataset are tabulated in the table in the HTML file (dataset_table.html). The cases with a link were gathered from those sources and authors, so you should refer to their work as well. The rest of the cases or events (without a link) were obtained from open sources such as:

· Copernicus
· European Space Agency
· Alaska Satellite Facility (Vertex)
· Earth Data

In addition, we carried out all the processing of the images using the SNAP toolbox from the European Space Agency. This processing involves the following:

· Data co-registration
· Cropping
· Apply Orbit (for SAR data)
· Calibration (for SAR data)
· Speckle Filter (for SAR data)
· Terrain Correction (for SAR data)

Lastly, the ground truth was obtained from homogeneous images for pre/post events by drawing polygons to highlight the areas where a visible change was present. The images were laid out and synchronized so that they could be zoomed over the same area for a better view of the changes. This was exhaustive work, done to be as precise as possible. Feel free to improve and contribute to this dataset.
Heterogeneous Software Size Measurement Dataset and Model Evaluation Results...
zenodo.org
csv
Updated May 20, 2025
Cite
Hüseyin Ünlü; Samet Tenekeci (2025). Heterogeneous Software Size Measurement Dataset and Model Evaluation Results [Dataset]. http://doi.org/10.5281/zenodo.15469884
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.15469884
Dataset updated
May 20, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Hüseyin Ünlü; Samet Tenekeci
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
May 20, 2025
Description
The heterogeneous dataset contains 2041 use case descriptions from 26 different software specification documents written in English. Each requirement is manually measured by domain experts using COSMIC Function Point (CFP) and MicroM metrics. The extended results include evaluations with six metrics: MAE, NMAE, MSE, MMRE, PRED(30), and exact-match accuracy.
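
For readers unfamiliar with the effort-estimation metrics listed above, the following sketch computes MAE, MSE, MMRE, PRED(30), and exact-match accuracy using their commonly used definitions; whether the dataset authors used exactly these formulas is an assumption. NMAE is left out because its normalization varies between papers. The size values are hypothetical.

```python
from typing import List, Sequence


def _relative_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Magnitude of relative error per item; actual values must be non-zero."""
    return [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean magnitude of relative error."""
    errors = _relative_errors(actual, predicted)
    return sum(errors) / len(errors)


def pred(actual: Sequence[float], predicted: Sequence[float], level: float = 0.30) -> float:
    """PRED(level): share of items whose relative error is at most `level`."""
    errors = _relative_errors(actual, predicted)
    return sum(1 for e in errors if e <= level) / len(errors)


def exact_match_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of items predicted exactly."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


# Hypothetical measured and predicted sizes for four use cases.
actual = [10, 20, 5, 8]
predicted = [12, 20, 4, 4]

print(mae(actual, predicted), mse(actual, predicted))
print(mmre(actual, predicted), pred(actual, predicted))
print(exact_match_accuracy(actual, predicted))
```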
Clustering performance on benchmark datasets.
plos.figshare.com
xls
Updated Jul 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xiyang Sun; Fumiyasu Komaki (2025). Clustering performance on benchmark datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0326756.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0326756.t002
Dataset updated
Jul 1, 2025
Dataset provided by
PLOS ONE
Authors
Xiyang Sun; Fumiyasu Komaki
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Graph neural networks (GNNs) have shown great promise for representation learning on complex graph-structured data, but existing models often fall short when applied to directed heterogeneous graphs. In this study, we proposed a novel embedding method, a bidirectional heterogeneous graph neural network with random teleport (BHGNN-RT) that leverages the bidirectional message-passing process and network heterogeneity, for directed heterogeneous graphs. Our method captures both incoming and outgoing message flows, integrates heterogeneous edge types through relation-specific transformations, and introduces a teleportation mechanism to mitigate the oversmoothing effect in deep GNNs. Extensive experiments were conducted on various datasets to verify the efficacy and efficiency of BHGNN-RT. BHGNN-RT consistently outperforms state-of-the-art baselines, achieving up to 11.5% improvement in classification accuracy and 19.3% in entity clustering. Additional analyses confirm that optimizing message components, model layer and teleportation proportion further enhances the model performance. These results demonstrate the effectiveness and robustness of BHGNN-RT in capturing structural, directional information in directed heterogeneous graphs.
Data from: SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music...
investigacion.ujaen.es
Updated 2024
Cite
Garcia-Martinez, Jaime; Diaz-Guerra, David; Politis, Archontis; Virtanen, Tuomas; Carabias-Orti, Julio J.; Vera-Candeas, Pedro (2024). SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation [Dataset]. https://investigacion.ujaen.es/documentos/67321d1aaea56d4af04840e5
Dataset updated
2024
Authors
Garcia-Martinez, Jaime; Diaz-Guerra, David; Politis, Archontis; Virtanen, Tuomas; Carabias-Orti, Julio J.; Vera-Candeas, Pedro
Description
The SynthSOD dataset contains more than 47 hours of multitrack music obtained by synthesizing orchestra and ensemble pieces from the Symbolic Orchestral Database (SOD) using the Spitfire BBC Symphony Orchestra Professional library. To synthesize the MIDI files from the SOD, we needed to fix the original files to conform to the General MIDI standard, select a subset of files that met our requirements (e.g., containing only instruments that we could synthesize), and develop a new system to generate musically motivated random annotations for tempo, dynamics, and articulation. The code to replicate this process is available in our repository and all the details can be read in our paper. We have also published the code to train and evaluate the baseline and the pre-trained models in a GitHub repository.

We have also published the aligned score information for most of the pieces here.
OAG Dataset for H2GB
kaggle.com
Updated Jun 11, 2024
Cite
Junhong Lin (2024). OAG Dataset for H2GB [Dataset]. https://www.kaggle.com/datasets/junhonglin/oag-dataset-for-h2gb/versions/1
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 11, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
Junhong Lin
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
oag-cs, oag-eng, and oag-chem are new heterogeneous networks composed of subsets of the Open Academic Graph (OAG). The datasets contain papers from three different subject domains -- computer science, engineering, and chemistry. These datasets also contain four types of entities -- papers, authors, institutions, and fields of study. Each paper is associated with a 768-dimensional feature vector generated by applying a pre-trained XLNet to the paper titles. The representation of each word in the title is weighted by that word's attention to obtain the title representation for each paper. Each paper node is labeled with its published venue (paper or conference). We use the papers published up to 2016 as the training set, papers published in 2017 as the validation set, and papers published in 2018 and 2019 as the test set. The publication year of each paper is also included in these datasets, which means the datasets can also be converted to use the publication year as class labels.
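
The year-based split described above can be sketched as follows. This is an illustration only: the paper identifiers and years are hypothetical, and the handling of papers published after 2019 (left unassigned here) is an assumption, since the description does not cover them.

```python
from typing import Dict, List


def split_by_year(paper_years: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign each paper to train/valid/test from its publication year."""
    splits: Dict[str, List[str]] = {"train": [], "valid": [], "test": []}
    for paper_id, year in paper_years.items():
        if year <= 2016:
            splits["train"].append(paper_id)
        elif year == 2017:
            splits["valid"].append(paper_id)
        elif year in (2018, 2019):
            splits["test"].append(paper_id)
        # Years after 2019 are not described in the source and are skipped.
    return splits


# Hypothetical paper ids and publication years.
paper_years = {"p1": 2009, "p2": 2016, "p3": 2017, "p4": 2018, "p5": 2019}

splits = split_by_year(paper_years)
print({name: len(ids) for name, ids in splits.items()})
```

Splitting by time rather than at random keeps later papers out of training, so the test set reflects prediction on unseen, more recent publications.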
Data from: Heterogeneous Multi-Source Data Fusion Through Input Mapping And...
zenodo.org
bin, csv
Updated Jan 23, 2025
Cite
Yigitcan Comlek; Sandipp Krishnan Ravi; Piyush Pandita; Sayan Ghosh; Liping Wang; Wei Chen (2025). Heterogeneous Multi-Source Data Fusion Through Input Mapping And Latent Variable Gaussian Process [Dataset]. http://doi.org/10.5281/zenodo.14681801
Available download formats: csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.14681801
Dataset updated
Jan 23, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Yigitcan Comlek; Sandipp Krishnan Ravi; Piyush Pandita; Sayan Ghosh; Liping Wang; Wei Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the data used for “Heterogeneous Multi-Source Data Fusion Through Input Mapping And Latent Variable Gaussian Process” paper by Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, and Wei Chen. For all correspondence, please contact Dr. Wei Chen (weichen@northwestern.edu) or Dr. Sandipp Krishnan Ravi (sandippk@umich.edu).

Please use the below BibTex format to cite this work:

@article{comlek2024heterogenous,

title={Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process},

author={Comlek, Yigitcan and Ravi, Sandipp Krishnan and Pandita, Piyush and Ghosh, Sayan and Wang, Liping and Chen, Wei},

journal={arXiv preprint arXiv:2407.11268},

year={2024}

}

The repository consists of data used in three case studies. All the data available is in .csv format. Each csv file contains the data for the specific source used in the case study. Below is a summary of the files for each of the three case studies.

Case Study 1 (Cantilever Beam)

· Source1_RectangularBeam.csv

· Source2_RectangularHollowBeam.csv

· Source3_CircularHollowBeam.csv

Case Study 2 (Ellipsoidal Void)

· Source1_2DEllipse.csv

· Source2_3DEllipse.csv

· Source3_3DEllipseRot.csv

Case Study 3 (Ti6Al4V Alloys)

· Source1_LBPF.csv [1,2]

· Source2_EBM.csv [3]

· Source3_FSW.csv [4]

For this case study the data is collected from the below papers:

[1] Q. Luo, L. Yin, T. W. Simpson, and A. M. Beese, “Effect of processing parameters on pore structures, grain features, and mechanical properties in Ti-6Al-4V by laser powder bed fusion,” Additive Manufacturing, vol. 56, p. 102915, 2022.

[2] Q. Luo, L. Yin, T. W. Simpson, and A. M. Beese, “Dataset of process-structure-property feature relationship for laser powder bed fusion additive manufactured Ti-6Al-4V material,” Data in Brief, vol. 46, p. 108911, 2023.

[3] J. Ran, F. Jiang, X. Sun, Z. Chen, C. Tian, and H. Zhao, “Microstructure and mechanical properties of Ti-6Al-4V fabricated by electron beam melting,” Crystals, vol. 10, no. 11, p. 972, 2020.

[4] A. Fall, M. Jahazi, A. Khdabandeh, and M. Fesharaki, “Effect of process parameters on microstructure and mechanical properties of friction stir-welded Ti–6Al–4V joints,” The International Journal of Advanced Manufacturing Technology, vol. 91, pp. 2919–2931, 2017.
Multiple Kernel Learning based Heterogeneous Algorithm
catalog.data.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Aug 23, 2025
Cite
Dashlink (2025). Multiple Kernel Learning based Heterogeneous Algorithm [Dataset]. https://catalog.data.gov/dataset/multiple-kernel-learning-based-heterogeneous-algorithm
Dataset updated
Aug 23, 2025
Dataset provided by
Dashlink
Description
Paper on this topic has been submitted to KDD 2010.
muxGNN Heterogeneous Graphs
ieee-dataport.org
Updated Mar 18, 2023
Cite
Joshua Melton (2023). muxGNN Heterogeneous Graphs [Dataset]. https://ieee-dataport.org/documents/muxgnn-heterogeneous-graphs
Dataset updated
Mar 18, 2023
Authors
Joshua Melton
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Link prediction and graph classification datasets for heterogeneous graphs in DGL format
Heterogeneous graphs and Graph Neural Networks
borealisdata.ca
Updated Apr 22, 2022
Cite
David Topps; Corey Wirun; Rachel Ellaway (2022). Heterogeneous graphs and Graph Neural Networks [Dataset]. http://doi.org/10.5683/SP3/HQCC0D
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/HQCC0D
Dataset updated
Apr 22, 2022
Dataset provided by
Borealis
Authors
David Topps; Corey Wirun; Rachel Ellaway
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In exploring some of the concepts around Directed Acyclic Graphs and OLab in the assessment of clinical decision making, we have been juggling the ideas around layered and interconnected DAGs. Some of these explorations led us to the concept of heterogeneous graphs
Heterogeneous Flooring Report
datainsightsmarket.com
doc, pdf, ppt
Updated Feb 8, 2025
Cite
Data Insights Market (2025). Heterogeneous Flooring Report [Dataset]. https://www.datainsightsmarket.com/reports/heterogeneous-flooring-1815886
Available download formats: ppt, doc, pdf
Dataset updated
Feb 8, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global heterogeneous flooring market is anticipated to register a CAGR of 8.5% during the forecast period, from 2025 to 2033. The market is expected to witness robust demand from the commercial and residential flooring segments. The increasing construction activities in emerging economies and renovation & remodeling projects in developed economies are expected to fuel the market growth. The growing adoption of sustainable and eco-friendly flooring solutions is also expected to drive the demand for heterogeneous flooring. Key trends that are shaping the heterogeneous flooring market include the rise of e-commerce, the growing popularity of luxury vinyl tiles (LVTs), and the increasing focus on sustainability. E-commerce is making it easier for consumers to purchase flooring products, which is expected to boost the market growth. LVTs are gaining popularity due to their durability, water resistance, and ease of installation. The growing focus on sustainability is driving the demand for flooring products that are made from recycled materials and are eco-friendly.
Data from: Carrying capacity in a heterogeneous environment with habitat...
catalog.data.gov
data.usgs.gov
+1 more
Updated Jul 6, 2024
U.S. Geological Survey (2024). Carrying capacity in a heterogeneous environment with habitat connectivity [Dataset]. https://catalog.data.gov/dataset/carrying-capacity-in-a-heterogeneous-environment-with-habitat-connectivity
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Description
The data are population sizes of the yeast Saccharomyces cerevisiae grown in laboratory cultures over a period of several days with different levels of the growth inhibitor cycloheximide. Our results provide rigorous experimental tests of new and old theory, demonstrating how the traditional notion of carrying capacity is ambiguous for populations diffusing in spatially heterogeneous environments.
Global Heterogeneous Network Market Size, Share, Growth Analysis - Industry...
skyquestt.com
Updated Jan 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SkyQuest Technology (2025). Global Heterogeneous Network Market Size, Share, Growth Analysis - Industry Forecast 2024-2031 [Dataset]. https://www.skyquestt.com/report/heterogeneous-network-market
Dataset updated
Jan 11, 2025
Dataset authored and provided by
SkyQuest Technology
License
https://www.skyquestt.com/privacy/
Time period covered
2023 - 2030
Area covered
Global
Description
Global Heterogeneous Network Market size was valued at USD 28.66 billion in 2021 and is poised to grow from USD 32.53 billion in 2022 to USD 101.6 billion by 2030, growing at a CAGR of 13.49% in the forecast period (2023-2030).
Heterogeneous Networks Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 22, 2024
Dataintelo (2024). Heterogeneous Networks Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-heterogeneous-networks-market
Available download formats: pptx, csv, pdf
Dataset updated
Sep 22, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Heterogeneous Networks Market Outlook

The global heterogeneous networks market size was valued at approximately $15 billion in 2023 and is projected to reach around $37 billion by 2032, growing at a compound annual growth rate (CAGR) of 10.8% during the forecast period. The primary growth factor for this market is the increasing demand for high-speed internet and improved network coverage, driven by the rapid proliferation of connected devices and the expansion of smart city initiatives worldwide.

The growth of the heterogeneous networks market is significantly influenced by the escalating need for enhanced data capacity and coverage. With the exponential growth in mobile data traffic, largely fueled by the adoption of smartphones, tablets, and other connected devices, traditional cellular networks are struggling to meet the demands. Heterogeneous networks, which combine various types of network technologies such as small cells, Wi-Fi, and macro cells, provide a viable solution to address these challenges by offering seamless connectivity and increased data throughput.

Another major growth factor for the heterogeneous networks market is the advancement in wireless communication technologies, particularly the deployment of 5G networks. 5G technology promises to deliver faster data speeds, lower latency, and more reliable connections, which are essential for supporting the growing number of Internet of Things (IoT) devices and applications. The integration of heterogeneous networks with 5G infrastructure is expected to enhance network performance and coverage, thereby driving the market growth.

Additionally, the market is being propelled by the increasing investments in smart cities and smart infrastructure projects. Governments and municipalities around the world are investing heavily in smart city initiatives to improve urban living conditions and enhance the efficiency of public services. Heterogeneous networks play a crucial role in these projects by providing the necessary connectivity for smart devices and applications, such as smart lighting, traffic management systems, and surveillance cameras, thus driving the market expansion.

From a regional perspective, the Asia Pacific region is anticipated to witness the highest growth in the heterogeneous networks market during the forecast period. This growth can be attributed to the rapid urbanization, increasing population, and the rising adoption of smart devices in countries like China, India, and Japan. In addition, significant investments in infrastructure development and the rollout of 5G networks in these countries are expected to further boost the demand for heterogeneous networks in the region.

Component Analysis

In the heterogeneous networks market, the component segment is broadly categorized into hardware, software, and services. The hardware segment includes various physical devices and equipment such as small cells, macro cells, distributed antenna systems (DAS), and Wi-Fi access points, which form the backbone of heterogeneous networks. The growth of this segment is driven by the increasing deployment of small cells and DAS to enhance network capacity and coverage in urban and densely populated areas. Moreover, the rising adoption of 5G technology is further boosting the demand for advanced hardware components capable of supporting higher data speeds and lower latency.

The software segment encompasses various network management and optimization software solutions that enable seamless integration and coordination of different network technologies. These solutions play a critical role in ensuring efficient network performance, minimizing interference, and optimizing resource allocation. The growing complexity of heterogeneous networks necessitates advanced software solutions to manage and control the network infrastructure effectively. Consequently, the software segment is expected to experience robust growth during the forecast period, driven by the increasing need for efficient network management and optimization.

Services in the heterogeneous networks market include planning, deployment, maintenance, and managed services offered by network service providers and system integrators. As the deployment of heterogeneous networks involves significant technical expertise and resources, the demand for professional services is on the rise. Network operators and enterprises are increasingly relying on service providers for the design and implementation of their network infrastructure, as well as for ongoing maintenance and support. This trend is expected to drive the g
Heterogeneous Integration Global Market Report 2025
thebusinessresearchcompany.com
pdf, excel, csv, ppt
Updated Mar 25, 2025
The Business Research Company (2025). Heterogeneous Integration Global Market Report 2025 [Dataset]. https://www.thebusinessresearchcompany.com/report/heterogeneous-integration-global-market-report
Available download formats: pdf, excel, csv, ppt
Dataset updated
Mar 25, 2025
Dataset authored and provided by
The Business Research Company
License
https://www.thebusinessresearchcompany.com/privacy-policy
Description
The global heterogeneous integration market is expected to reach $3.01 billion by 2029 at a growth rate of 30.7%. A surge in IoT adoption is fueling the market's growth through increased demand for connectivity and automation.
